Recently, I had to brush up on statistics terms for a data analyst exam. I had trouble pulling together old course notes to create a quick, cohesive study guide. Below, overarching concepts are from the test’s public posting and my notes are derived from a quantitative statistics textbook.

**Central Tendency**

*Mean* = the average of a distribution

*Median* = a distribution’s midpoint

*Mode* = the value which occurs most often in a distribution
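As a quick check on these three definitions, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up sample data, not something from the exam or the textbook.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of the values divided by how many there are
median = statistics.median(scores)  # midpoint; with an even count, the average of the two middle values
mode = statistics.mode(scores)      # the value that occurs most often

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

Note how the single large value (10) pulls the mean above the median, which is why the median is often preferred for skewed data.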

**Variability**

- The distribution of data, also known as spread

*Five-number summary*: Minimum, Q1, Median (Q2), Q3, Maximum

- Represented graphically through boxplots

- Summarized through quartiles:

*Q1*: median of all values to the left of Q2

*Q2*: median (50th percentile) of all values in the distribution

*Q3*: median of all values to the right of Q2
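The quartile definitions above can be sketched directly in Python. The function name `five_number_summary` and the sample data are my own, and this follows the "median of each half" rule from these notes; other tools may use slightly different quartile conventions and return slightly different values.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values to the left of Q2
    upper = data[half + n % 2:]  # values to the right of Q2 (the median itself is skipped when n is odd)
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )

# Hypothetical sample data with at least two values.
summary = five_number_summary([1, 3, 4, 6, 8, 9, 12])
print(summary)
```

These five numbers are exactly what a boxplot draws: the box spans Q1 to Q3, with a line at the median.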

**Variance**: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $x_i$ is each value, $\bar{x}$ is the mean, and $n$ is the number of values

**Standard deviation**: square root of the variance (s^2)
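To make the arithmetic concrete, here is a small sketch that computes the sample variance and standard deviation step by step. The `values` list is invented for illustration.

```python
import math

# Hypothetical sample data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Squared distance of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in values]

# Sample variance: divide by (n - 1), not n.
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print("variance:", variance)
print("standard deviation:", std_dev)
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n - 1 denominator, so they should agree with the manual calculation.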

**Normal Distribution**: bell-curve distribution of data

**Hypothesis Testing**: Examine evidence against a null hypothesis. Hypotheses refer to populations or models, not to a certain outcome.

- Compare claims

*Null hypothesis*: statement challenged in significance testing

*Example*: There is no difference between the means.

*Alternative hypothesis*: statement suspected as true instead of the null hypothesis

*Example*: The means are not the same.

- Reject or fail to reject the null hypothesis based on the p-value.

*p-value*: the probability, assuming the null hypothesis is true, that the test statistic would take a value as extreme as or more extreme than the one observed

- Smaller p-values signify stronger evidence against the null hypothesis in question. Often, an alpha value of 0.05 is used: the evidence must be strong enough that a result this extreme would occur no more than 5 out of every 100 times if the null hypothesis were true.
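One way to see a p-value in action is a one-sided z-test, sketched below. This is only an illustration under strong assumptions that are mine, not the textbook's: the population standard deviation `sigma` is treated as known, the sample mean is treated as normally distributed, and all the numbers are invented. The helper name `one_sided_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p_value(sample_mean, null_mean, sigma, n):
    """Probability of a z statistic at least this large if the null hypothesis is true."""
    # Test statistic: how many standard errors the sample mean sits above the null mean.
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # Upper-tail area of the standard normal distribution.
    return 1 - NormalDist().cdf(z)

alpha = 0.05
# Hypothetical numbers: null hypothesis says the mean is 100.
p = one_sided_p_value(sample_mean=103, null_mean=100, sigma=10, n=36)
reject_null = p <= alpha

print("p-value:", round(p, 4))
print("reject null at alpha = 0.05:", reject_null)
```

Here z works out to 1.8, which leaves about 3.6% of the normal curve in the upper tail, so the result is statistically significant at the 0.05 level but would not be at 0.01.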

**Statistical Significance Testing**: Achieved at the level where the p-value is equal to or less than alpha.

**Probability**: The proportion of times an outcome would occur over many repeated trials.

**Correlation**

- A measure of the linear relationship between two quantitative variables, based on direction and strength.

- Examples: strong, weak, or no correlation; positive or negative

- Represented by r

- $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$, where $s_x$ and $s_y$ are the standard deviations of the x-values and y-values
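The correlation formula can be checked with a short sketch: standardize each x and y, multiply the pairs, sum, and divide by n - 1. The function name `pearson_r` and the data are my own choices for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation r: sum of products of standardized x and y values, divided by (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)

# Hypothetical data: one perfectly increasing line, one perfectly decreasing line.
r_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_neg = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

print("positive example:", r_pos)
print("negative example:", r_neg)
```

Values of r always fall between -1 and 1: the sign gives the direction and the distance from 0 gives the strength of the linear relationship.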

**Regression**

*Simple linear*: statistical model where the means of y occur on a line when plotted against x for one explanatory variable

*Multiple linear*: statistical model with more than one explanatory variable
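For the simple linear case, a line can be fitted with least squares, which is the usual method even though these notes do not name one. The sketch below is a bare-bones version with invented data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print("intercept:", intercept)
print("slope:", slope)
```

With real data the points will not sit exactly on the line; the fitted line then estimates the mean of y at each value of x.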

**Parametric Statistics**: Used with numerical data, because these methods assume the data have a normal distribution.

**Nonparametric Statistics**: Used with ordinal or categorical data, because these methods do not assume a normal distribution.

**Analysis of Variance (ANOVA)**

*One-way*: Compare population means based on 1 independent variable

*Two-way*: Compare population means classified based on 2 independent variables
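A one-way ANOVA boils down to an F statistic: the variation between group means divided by the variation within groups. The sketch below computes only that statistic for made-up groups; turning it into a p-value requires the F distribution, which Python's standard library does not provide, so that step is left out. The name `one_way_f` is my own.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA: mean square between groups / mean square within groups."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: three groups defined by one independent variable.
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print("F statistic:", f_stat)
```

A large F means the group means differ by more than the within-group scatter would suggest, which is evidence against the null hypothesis that all population means are equal.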

Source:

Moore, D. S., McCabe, G. P., & Craig, B. A. (2012). *Introduction to the practice of statistics*. Seventh edition/Student edition. New York: W.H. Freeman and Company, a Macmillan Higher Education Company.

*See **here** for an updated version of the textbook.*