EDC—State Partnership

Statistical Terms Dictionary

Categorical Variable
A variable is categorical if its values fall into a distinct set of categories that do not overlap. For example, patient sex can take on the values of male or female. First treatment provided might have the values of ‘IV line’ and ‘Airway inserted,’ among others. Also called a nominal variable.

Confidence Intervals
The lower and upper limits of a range that one is X percent confident contains the true population value being estimated (as in 95% confidence limits). See Confidence Intervals in Advanced Statistical Topics.
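As a rough illustration, 95% confidence limits for a sample mean might be computed as below. This is a minimal sketch that assumes a normal approximation (for small samples a t-based interval is usually preferred); the function name and the response-time values are hypothetical and are not taken from this dictionary.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (lower, upper) confidence limits for the mean (normal approximation)."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation divided by sqrt(n).
    std_error = statistics.stdev(values) / math.sqrt(n)
    # Critical value of the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error


# Hypothetical response times in minutes (made-up illustration data).
response_times = [8, 12, 9, 11, 10, 14, 7, 13, 10, 11]
lower, upper = mean_confidence_interval(response_times)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

Asking for more confidence (for example 99%) widens the interval, because a wider range is needed to be more sure it contains the true value.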

Continuous Variable
A variable that can take on any value within a range. For example, height, weight, temperature, the amount of sugar in an orange, and the time required to run a mile are all continuous variables.

Cross Tabulations
Tables that show how many observations fall into each combination of values of two or more variables. For example, you might want to see how many observations occur by age, gender, city, etc.
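A cross tabulation of two variables can be sketched in a few lines. The records below are made-up illustration data, and the variable names (age group and sex) are assumptions chosen only for the example.

```python
from collections import Counter

# Hypothetical records as (age group, sex) pairs (made-up illustration data).
records = [
    ("0-4", "male"), ("0-4", "female"), ("5-9", "male"),
    ("0-4", "male"), ("5-9", "female"), ("5-9", "female"),
]

# Count how many observations fall into each (age group, sex) combination.
cell_counts = Counter(records)

age_groups = sorted({age for age, _ in records})
sexes = sorted({sex for _, sex in records})

# Print one row per age group and one column per sex.
print("age".ljust(8) + "".join(s.ljust(8) for s in sexes))
for age in age_groups:
    print(age.ljust(8) + "".join(str(cell_counts[(age, s)]).ljust(8) for s in sexes))
```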

Descriptive Statistics
Statistics used to summarize a body of data.

Dispersion
Numerical designations of how closely data cluster about the mean or other measure of central tendency.

Frequency Checks
Creating a table that shows a body of data grouped according to numerical values.

Histogram
A bar graph representing a frequency distribution.
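A frequency table and a simple text bar graph of it can be sketched as follows. The ages are made-up illustration data; for continuous data the values would normally be grouped into intervals first, which this sketch does not do.

```python
from collections import Counter

# Hypothetical patient ages in years (made-up illustration data).
ages = [3, 7, 7, 2, 5, 7, 3, 5, 5, 7]

# Frequency table: how many times each value occurs.
frequency = Counter(ages)

# Text bar graph: one '#' per observation, followed by the count.
for value in sorted(frequency):
    print(f"{value:>3} | {'#' * frequency[value]} ({frequency[value]})")
```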

Hypothesis Testing
See Hypothesis Testing in Advanced Statistical Topics.

Independent Samples
Independent samples are two or more samples selected from the same population, or different populations, that have no effect on one another. The outcome for one sample is assumed to be unrelated to the outcomes for each of the other samples. Or to restate the same principle, if you know the outcome for one sample, it will provide you with no information about the outcome for the other sample. Examples range from comparing males and females as two independent samples within a population to comparing a treatment group to a control group in an interventional study.

Independent Observations
Two observations are independent if the occurrence of one observation provides no information about the occurrence of the other observation. A simple example is measuring the height of everyone in your sample at a single point in time. These should be unrelated observations. However, if you were to measure one child’s height over time, these observations would be dependent because the height at each time point would affect the height at future time points.

Independent Variable
The variable that causes or predicts the dependent variable. Also called the explanatory variable.

Inferential Statistics
Using sample statistics to infer characteristics about the population.

Matched Pairs and Repeated Measures
Matched samples can arise in the following situations:

  • Two samples in which the members are clearly paired or are matched explicitly by the researcher. For example, IQ measurements on pairs of identical twins.
  • Those samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures. Examples include the times of a group of athletes for 1500m before and after a week of special training; or the milk yields of cows before and after being fed a particular diet.

Sometimes, the difference in the value of the measurement of interest for each matched pair is calculated, for example, the difference between before and after measurements, and these figures then form a single sample for an appropriate statistical analysis.
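The before-and-after differences described above can be sketched as follows. The 1500 m times are made-up illustration data, not measurements from any study.

```python
import statistics

# Hypothetical 1500 m times in seconds for five athletes (made-up data).
before = [262, 275, 258, 290, 281]
after = [258, 270, 259, 284, 276]

# One difference per athlete: each matched pair contributes a single value,
# so the differences form one sample that can be analyzed on its own.
differences = [b - a for b, a in zip(before, after)]
mean_difference = statistics.mean(differences)

print("differences:", differences)
print("mean improvement (seconds):", mean_difference)
```

A positive difference here means the athlete was faster after training; a negative one means the athlete was slower.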

Measures of Center
Statistics designed to represent the average or middle in a distribution of data.

Minimum value
The smallest observation in a set of data.

Maximum value
The largest observation in a set of data.

Mean
The arithmetic average for a group of data.

Median
The middle item in a group of data when the data are ranked in order of magnitude.

Mode
The most common value in any distribution.
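The minimum, maximum, arithmetic average (mean), middle item (median), and most common value (mode) can all be computed with Python's standard library. The scene times below are made-up illustration data.

```python
import statistics

# Hypothetical scene times in minutes (made-up illustration data).
scene_times = [12, 15, 9, 15, 20, 11, 15]

smallest = min(scene_times)
largest = max(scene_times)
mean = statistics.mean(scene_times)      # arithmetic average
median = statistics.median(scene_times)  # middle item after sorting
mode = statistics.mode(scene_times)      # most common value

print("minimum:", smallest)
print("maximum:", largest)
print("mean:", round(mean, 2))
print("median:", median)
print("mode:", mode)
```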

Normal Distribution
A bell-shaped curve or distribution indicating that observations at or close to the mean occur with highest probability, and that the probability of occurrence progressively decreases as observations deviate from the mean.

Observations
Data points in a given data set.

Ordinal Variable
An ordinal variable has categories that can be ranked or ordered. However, the difference between levels may not be the same. For example, if you are administering a survey and ask the question, “How important do you think a primary seat belt law is?” you might have the following responses: ‘Very important,’ ‘Somewhat important,’ and ‘Not very important.’ These responses have an obvious order; however, the difference between very important and somewhat important may not be the same as the difference between somewhat important and not very important.

Outlier
An extreme value in a frequency distribution; can have a disproportionate influence on the mean.
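The disproportionate influence of one extreme value on the mean can be seen in a small sketch. The numbers are made-up illustration data.

```python
import statistics

# Hypothetical values (made-up illustration data).
typical = [10, 11, 12, 13, 14]
with_extreme = typical + [60]  # the same data plus one extreme value

mean_typical = statistics.mean(typical)
mean_with_extreme = statistics.mean(with_extreme)
median_typical = statistics.median(typical)
median_with_extreme = statistics.median(with_extreme)

# The mean moves a long way; the median barely changes.
print("mean:  ", mean_typical, "->", mean_with_extreme)
print("median:", median_typical, "->", median_with_extreme)
```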

Parameter
A measure used to summarize characteristics of a population based on all items in the population (such as a population mean).

Population
The total set of items that one wants to analyze (all children, all citizens of a city, etc.).

Probabilistic Linkage
See Probabilistic Linkage in Advanced Statistical Topics.

Probability
Expresses the likelihood of a given event occurring over the long term.
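The "long term" idea can be illustrated with a small simulation: the proportion of heads in repeated fair coin flips settles near 0.5 as the number of flips grows. This is a sketch under the assumption of a fair coin; the function name and seed are hypothetical choices made only for the example.

```python
import random


def long_run_proportion(trials, seed=0):
    """Simulate fair coin flips and return the proportion that came up heads."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    heads = sum(1 for _ in range(trials) if rng.random() < 0.5)
    return heads / trials


# With few flips the proportion can be far from 0.5; with many it is close.
for trials in (10, 1_000, 100_000):
    print(trials, long_run_proportion(trials))
```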

Random
Each subject has an equal chance of being selected. In research design, random assignment is a procedure that gives each subject an equal chance of placement in the experimental group or the control group so that no systematic difference exists between the groups prior to administration of the treatment.
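Random assignment to an experimental group and a control group can be sketched by shuffling the subjects and splitting the shuffled list in half. The function name, subject identifiers, and seed are hypothetical and serve only as an illustration.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into (experimental, control) groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)      # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject identifiers 1 through 10.
subject_ids = list(range(1, 11))
experimental, control = randomly_assign(subject_ids, seed=42)
print("experimental:", sorted(experimental))
print("control:     ", sorted(control))
```

Because placement depends only on the shuffle, no characteristic of a subject can systematically steer that subject into one group or the other.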

Range
A measure of dispersion calculated by subtracting the smallest value in a distribution from the largest value.

Repeated Measures
See Matched Pairs and Repeated Measures.

Response Variable (outcome, dependent)
The variable that is caused or predicted by the independent variable.

Sample
A subset of the population, usually selected randomly. Measures that summarize a sample are called sample statistics.

Skewed Data
Data are considered skewed when the distribution is not symmetric: most of the data values fall to the left or the right of the mean.

Spread
How widely the data are scattered about the mean.

Single Sample Statistic
A single sample statistic is one in which you are only interested in describing one population. You are not interested in comparing different sub-populations within the single sample. For example, you want to describe the intubation rate for all pediatric patients.

Standard Deviation
A measure of dispersion; the square root of the average squared deviation from the mean. See Standard Deviation under Basic Statistics.

Statistic
A measure that is used to summarize a sample of data.

Survival Outcome
Survival refers to time-to-event data, for example, the time from an injury to normal functioning or the time from onset of disease to death. An important characteristic of survival outcomes is that they generally include censored observations: cases in which you do not observe the outcome of interest because the study ends or because people move, are discharged from the hospital, or are otherwise lost to follow-up before their outcome occurs. Rather than excluding these data, survival analysis uses censored observations in the analysis of your outcome.
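One common way to use censored observations is the Kaplan-Meier estimator, which this dictionary does not name; it is shown here only as an illustration. In the sketch below, a censored subject still counts as "at risk" up to the time follow-up ended. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, observed):
    """Return [(time, survival probability)] at each time with an observed event.

    times:    follow-up time for each subject
    observed: True if the event was seen, False if the subject was censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, seen in zip(times, observed) if seen})
    for t in event_times:
        # Subjects still being followed at time t, censored or not.
        at_risk = sum(1 for u in times if u >= t)
        # Subjects whose event was observed exactly at time t.
        events = sum(1 for u, seen in zip(times, observed) if u == t and seen)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in months; False marks a censored subject.
times = [2, 3, 3, 5, 8]
observed = [True, False, True, True, False]
curve = kaplan_meier(times, observed)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.2f}")
```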

Target Population
The population a given study is intended to reach.

Variance
The average squared deviation from the mean; the square of the standard deviation.
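The definitions of the average squared deviation from the mean, its square root (the standard deviation), and the range can be checked step by step. This sketch divides by the number of values, matching the "average squared deviation" wording; when estimating from a sample, the sum is often divided by one less than the number of values instead. The data are made-up illustration values.

```python
import math
import statistics

# Hypothetical values (made-up illustration data).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Variance: the average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", std_dev)
print("range:", value_range)

# The standard library's population versions use the same definitions.
print(statistics.pvariance(values), statistics.pstdev(values))
```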