AP Statistics

Midterm 2 Review

Complete study guide covering Units 1–4: Data Analysis, Regression, Data Collection, and Probability

📊 Unit 1 — Exploring Data
Comparing Distributions (SOCS)

🔠 The SOCS Framework

  • Shape — symmetric, skewed left/right, unimodal/bimodal
  • Outliers — mention if present; use IQR rule to identify
  • Center — median (or mean) with value and units
  • Spread — IQR, range, or standard deviation with value and units

📦 Boxplot Anatomy

  • Whiskers: min & max (excluding outliers)
  • Box edges: Q1 (25th %ile) and Q3 (75th %ile)
  • Line in box: Median (Q2)
  • IQR = Q3 − Q1
  • Outlier rule: value < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
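The outlier rule above can be sketched in Python. This is a minimal illustration, not a calculator routine: the function names and the travel-time values are hypothetical, and Q1 and Q3 are passed in directly because different tools compute quartiles slightly differently.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) fences from the 1.5*IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical travel times (minutes) with Q1 = 10 and Q3 = 20, so IQR = 10
print(iqr_fences(10, 20))                                   # (-5.0, 35.0)
print(find_outliers([5, 10, 12, 15, 20, 22, 40], 10, 20))   # [40]
```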

⚖️ Mean vs. Median

  • Symmetric distribution → mean ≈ median
  • Skewed right → mean > median (pulled toward tail)
  • Skewed left → mean < median
  • Outliers pull the mean but not the median
  • Median is resistant; mean is not
⚠️ AP Exam Language

When comparing distributions, always use comparative language: "The median travel time for elementary students (≈ 25 min) is greater than the median for middle school students (≈ 15 min)." Never just state each group's value; you must compare them.
Normal Distribution

📐 Z-Score Formula

Z-Score
z = (x − μ) / σ

z tells you how many standard deviations above (+) or below (−) the mean a value falls.

🖩 Calculator Commands

Finding Area (Probability)
normalcdf(lower, upper, μ, σ)

Finding Value from %ile
invNorm(area, μ, σ)
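For checking calculator answers, the same quantities can be sketched with Python's standard-library `statistics.NormalDist`. The wrapper names below mirror the calculator commands but are my own, and the Normal(70, 10) test-score distribution is a hypothetical example.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area (as a decimal) to its left."""
    return NormalDist(mu, sigma).inv_cdf(area)


# Hypothetical: test scores follow Normal(mu=70, sigma=10)
print(z_score(85, 70, 10))                   # 1.5
print(round(normalcdf(60, 80, 70, 10), 4))   # 0.6827
print(round(inv_norm(0.90, 70, 10), 2))      # 82.82
```

For a one-sided area such as P(X < x), pass `float("-inf")` as the lower bound.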

🎯 Empirical Rule (68-95-99.7)

  • μ ± 1σ contains ≈ 68% of data
  • μ ± 2σ contains ≈ 95% of data
  • μ ± 3σ contains ≈ 99.7% of data
💡 Normal Distribution Problem Types
  • Find probability/proportion: convert to z-score, then use normalcdf or table → P(X < x)
  • Find value from percentile: use invNorm(percentile as decimal, μ, σ)
  • Find middle X%: the area in each tail = (1 − X%)/2; use invNorm with the left-tail area for the lower bound and with 1 − that area for the upper bound
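The "middle X%" problem type can be sketched as follows. The function name and the Normal(70, 10) distribution are hypothetical; the tail-area step is the one described in the list above.

```python
from statistics import NormalDist


def middle_interval(percent, mu, sigma):
    """Return (low, high) bounding the middle `percent` of Normal(mu, sigma)."""
    tail = (1 - percent / 100) / 2        # area left over in EACH tail
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# Hypothetical: middle 95% of Normal(70, 10)
low, high = middle_interval(95, 70, 10)
print(round(low, 1), round(high, 1))      # 50.4 89.6
```

The result is close to μ ± 2σ, which agrees with the Empirical Rule.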
📈 Unit 2 — Bivariate Data & Linear Regression
The Least-Squares Regression Line (LSRL)
Equation of LSRL
ŷ = a + bx    or    ŷ = b₀ + b₁x

where b = slope (from computer output "Coef" of x-variable)
       a = y-intercept (from computer output "Constant" Coef)

📝 Interpreting the Slope

For each additional [one unit of x], the predicted [y variable] increases/decreases by [|b|] [y units] on average.

📝 Interpreting the Y-Intercept

When [x variable] is 0, the predicted [y variable] is [a] [y units].

Ask: does x = 0 make sense in context? If not, the y-intercept has limited practical meaning.

📊 Residuals

Residual = Observed − Predicted
e = y − ŷ
  • Positive residual → point is above the line (underestimated)
  • Negative residual → point is below the line (overestimated)
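A quick sketch of a prediction and its residual, assuming a hypothetical LSRL of ŷ = 10 + 2x (the coefficients and data points are made up for illustration):

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x."""
    return a + b * x


def residual(a, b, x, y_observed):
    """Residual = observed - predicted."""
    return y_observed - predict(a, b, x)


# Hypothetical LSRL: y-hat = 10 + 2x
print(predict(10, 2, 5))         # 20
print(residual(10, 2, 5, 23))    # 3  -> point above the line (model underestimated)
print(residual(10, 2, 5, 18))    # -2 -> point below the line (model overestimated)
```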

🔗 Correlation Coefficient (r)

  • Always between −1 and +1
  • r > 0: positive association
  • r < 0: negative association
  • |r| close to 1: strong linear relationship
  • r = ±√(R²); the sign of r matches the sign of the slope
  • r has no units; not affected by changing units

📐 Coefficient of Determination (R²)

Interpretation: [R², written as a percent] of the variation in [y] is explained by the linear relationship with [x].

The remaining percentage (100% minus R²) is due to other factors or random variation.

📏 Standard Deviation of Residuals (s)

Interpretation: The actual [y] values typically differ from the predicted values by about s [y units].

Measures typical prediction error of the model.
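To see where the numbers in computer output come from, here is a sketch that computes the slope, intercept, r, R², and s by hand. The function name and the five data points are hypothetical. It uses the standard least-squares formulas, with s computed as √(Σ residual² / (n − 2)).

```python
from math import sqrt


def least_squares(xs, ys):
    """Return intercept a, slope b, r, R^2, and residual SD s for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # y-intercept
    r = sxy / sqrt(sxx * syy)          # correlation; same sign as the slope
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return {"a": a, "b": b, "r": r, "r_squared": r ** 2, "s": s}


# Hypothetical data
fit = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 3) for name, value in fit.items()})
# {'a': 2.2, 'b': 0.6, 'r': 0.775, 'r_squared': 0.6, 's': 0.894}
```

Here about 60% of the variation in y is explained by the linear relationship with x, and actual y values typically differ from predictions by about 0.89 units.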

Residual Plots & Model Appropriateness

✅ Linear Model IS Appropriate If:

  • Residual plot shows random scatter with no pattern
  • No curved (U-shaped or arch-shaped) pattern in residuals
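  • R² is reasonably high (this shows strength of fit; it does not by itself show the form is linear, so check the residual plot first)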
  • Scatterplot shows roughly linear pattern

❌ Linear Model NOT Appropriate If:

  • Residual plot shows a curved pattern
  • Residual plot shows a fan shape (increasing spread)
  • Scatterplot is clearly curved (exponential, quadratic)

⚠️ Outliers & Extrapolation

  • Outlier in regression: point far from the line (large residual)
  • High leverage: point with extreme x-value
  • Influential point: removing it changes the LSRL significantly
  • Extrapolation: predicting outside the range of x-data — unreliable, avoid!
💡 Computer Output Quick-Read
  • Find slope in "Coef" column, row of the x-variable name
  • Find y-intercept in "Coef" column, row labeled "Constant"
  • Find R² from "R-Sq"; then r = ±√(R²), with the sign matching the slope
  • Find s labeled directly as "s = …"
🔬 Unit 3 — Collecting Data
Sampling Methods
Method | How It Works | Key Feature
Simple Random Sample (SRS) | Every individual & every group of n individuals has an equal chance of selection | Gold standard; unbiased if done correctly
Stratified Random Sample | Divide population into strata (homogeneous groups); take SRS from each stratum | More precise when strata differ on response variable
Cluster Sample | Divide into clusters (heterogeneous groups); randomly select entire clusters | Practical when population is spread out
Systematic Sample | Select every k-th individual from a list after random start | Easy to implement
Convenience Sample | Select whoever is easiest to reach | Very prone to bias; avoid!
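The four random methods can be sketched with Python's `random` module. All function names, group labels, and ID numbers here are hypothetical, and a seeded `random.Random` is used only so the example is repeatable.

```python
import random


def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely."""
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of the same size from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


def systematic_sample(population, k, rng):
    """Random start among the first k, then every k-th individual."""
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(0)                       # seeded for repeatability
students = list(range(1, 101))               # hypothetical ID numbers 1-100
strata = {"north": students[:50], "south": students[50:]}
clusters = {f"class{i}": students[i * 10:(i + 1) * 10] for i in range(10)}

print(len(simple_random_sample(students, 10, rng)))   # 10
print(len(stratified_sample(strata, 5, rng)))         # 10 (5 from each stratum)
print(len(cluster_sample(clusters, 2, rng)))          # 20 (2 whole classes)
print(len(systematic_sample(students, 10, rng)))      # 10
```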
Sources of Bias

🎭 Voluntary Response Bias

People choose to respond; those with strong opinions are overrepresented.

Example: online poll where only motivated people participate

🚪 Convenience Bias

Sampling whoever is nearby; sample may not represent the population.

Example: surveying only football game attendees

💬 Question Wording Bias

Leading or loaded questions push respondents toward a particular answer.

Example: "Do you support the dangerously high-crime prison construction?"

📭 Non-Response Bias

People who don't respond differ systematically from those who do.

🕵️ Undercoverage

Some groups in the population have a lower probability of being included in the sample.

🗣️ Response Bias

People give inaccurate answers (social desirability, interviewer effect).

Experimental Design

🧪 Key Vocabulary

  • Experimental units: the individuals being studied
  • Factor: an explanatory variable (manipulated)
  • Level: specific value of a factor
  • Treatment: specific combination of factor levels applied
  • Response variable: the outcome measured
  • Placebo: fake treatment that looks real

🎯 Principles of Experiment Design

  • Randomization: randomly assign units to treatments to reduce confounding
  • Replication: apply each treatment to enough units to detect real effects
  • Control: control for extraneous variables (use a control group or placebo)

🚧 Control Group

A group that receives no treatment (or placebo). Advantage: shows what changes occur without the treatment, giving a baseline for comparison.

🧱 Blocking

Group similar experimental units into blocks, then randomly assign treatments within each block.

Use the blocking variable that is most strongly related to the response variable. This reduces variability and makes comparisons more precise.
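A randomized block design can be sketched as below. The function name, the block labels, and the unit names are hypothetical; the point is only that randomization happens separately inside each block.

```python
import random


def randomized_block_assignment(blocks, treatments, rng):
    """Shuffle the units within each block, then deal out treatments in turn."""
    assignment = {}
    for units in blocks.values():
        shuffled = units[:]                # copy so the input is not modified
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


rng = random.Random(0)                     # seeded for repeatability
blocks = {                                 # hypothetical blocks by age group
    "younger": ["A", "B", "C", "D"],
    "older": ["E", "F", "G", "H"],
}
plan = randomized_block_assignment(blocks, ["drug", "placebo"], rng)
print(plan)
```

Each block ends up with two units on each treatment, so the treatment groups are balanced on the blocking variable.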

💡 Stratification vs. Blocking

Stratification is used in sampling (observational). Blocking is used in experiments. Both involve grouping similar individuals; the goal is to reduce variability within groups.

✅ Stratified Sampling: When is campus better than gender?

Stratify by campus when students' satisfaction with buildings differs more by campus than by gender; that is, campus explains more variation in the response than gender does.
🎲 Unit 4 — Probability & Random Variables
Basic Probability Rules
Core Rules
0 ≤ P(A) ≤ 1                     (probability is between 0 and 1)
P(Aᶜ) = 1 − P(A)               (complement rule)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)   (general addition rule)
P(A ∩ B) = P(A) · P(B | A)       (general multiplication rule)
P(B | A) = P(A ∩ B) / P(A)       (conditional probability)

🔒 Mutually Exclusive (Disjoint)

  • Events A and B cannot both occur
  • P(A ∩ B) = 0
  • P(A ∪ B) = P(A) + P(B)
  • Mutually exclusive events are NOT independent (unless one of the events has probability 0)

🔀 Independence

  • Knowing A occurred doesn't change probability of B
  • P(A | B) = P(A) — or equivalently:
  • P(A ∩ B) = P(A) · P(B)
  • Check: does P(A|B) = P(A)? If yes → independent

📋 Two-Way Tables

  • Joint probability: P(A and B) = cell / table total
  • Marginal probability: P(A) = row or column total / table total
  • Conditional probability: P(A | B) = cell / total of the row or column for B (the given condition)
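The three table calculations, plus the independence check, can be sketched with exact fractions. The counts, labels, and helper names below are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts: rows = grade, columns = plays a sport
table = {
    ("junior", "sport"): 30, ("junior", "no sport"): 20,
    ("senior", "sport"): 15, ("senior", "no sport"): 35,
}
total = sum(table.values())


def joint(row, col):
    """P(row and col) = cell / table total."""
    return Fraction(table[(row, col)], total)


def marginal_row(row):
    """P(row) = row total / table total."""
    return Fraction(sum(v for (r, _), v in table.items() if r == row), total)


def marginal_col(col):
    """P(col) = column total / table total."""
    return Fraction(sum(v for (_, c), v in table.items() if c == col), total)


def conditional_col_given_row(col, row):
    """P(col | row) = P(row and col) / P(row)."""
    return joint(row, col) / marginal_row(row)


print(joint("junior", "sport"))                      # 3/10
print(marginal_col("sport"))                         # 9/20
print(conditional_col_given_row("sport", "junior"))  # 3/5
# Independence check: is P(sport | junior) equal to P(sport)?
print(conditional_col_given_row("sport", "junior") == marginal_col("sport"))  # False
```

Because 3/5 ≠ 9/20, playing a sport and being a junior are not independent in this made-up table.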
❌ Common Mistake

"Mutually exclusive" and "independent" are NOT the same. If two events are mutually exclusive and both have positive probability, they are DEPENDENT (knowing one occurred means the other definitely did not).
Discrete Random Variables
Expected Value and Standard Deviation
E(X) = μ_X = Σ [x · P(x)]
Var(X) = σ²_X = Σ [(x − μ)² · P(x)]
SD(X) = σ_X = √Var(X)
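A short sketch of these formulas applied to a hypothetical probability distribution (the values and probabilities are made up; the dictionary maps each x to P(x)):

```python
from math import sqrt


def expected_value(dist):
    """E(X) = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Hypothetical distribution: x -> P(x); probabilities sum to 1
dist = {0: 0.5, 1: 0.3, 2: 0.2}
print(round(expected_value(dist), 3))     # 0.7
print(round(variance(dist), 3))           # 0.61
print(round(sqrt(variance(dist)), 3))     # 0.781
```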
Combining Random Variables
Rules for Combining (X and Y independent)
E(X ± Y) = E(X) ± E(Y)
Var(X ± Y) = Var(X) + Var(Y)     ← variances ADD even for differences!

E(aX + b) = a · E(X) + b
Var(aX + b) = a² · Var(X)
SD(aX + b) = |a| · SD(X)
⚠️ Critical Rule

When combining independent random variables, VARIANCES always add (even for X − Y). Standard deviations do NOT add; only variances do. Always add variances first, then take the square root.
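The combining rules can be sketched numerically. The function names and the input values are hypothetical, and X and Y are assumed to be independent.

```python
from math import sqrt


def sd_of_sum_or_difference(sd_x, sd_y):
    """SD of X + Y or X - Y for independent X and Y: add variances, then root."""
    return sqrt(sd_x ** 2 + sd_y ** 2)


def linear_transform(mean_x, sd_x, a, b):
    """Mean and SD of aX + b."""
    return a * mean_x + b, abs(a) * sd_x


# Hypothetical: SD(X) = 3 and SD(Y) = 4
print(sd_of_sum_or_difference(3, 4))     # 5.0, not 3 + 4 = 7 and not 4 - 3 = 1
# Hypothetical: E(X) = 10, SD(X) = 2, transform -3X + 5
print(linear_transform(10, 2, -3, 5))    # (-25, 6)
```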
Binomial Distribution B(n, p)

✅ BINS Conditions

  • Binary — two outcomes (success/failure)
  • Independent — trials are independent
  • Number — fixed number of trials (n)
  • Success — constant probability p each trial

📐 Binomial Formulas

P(X = k) = C(n,k) · pᵏ · (1−p)^(n−k)

μ_X = np
σ_X = √(np(1−p))

🖩 Calculator: Binomial

Exactly k successes
binompdf(n, p, k)

At most k successes: P(X ≤ k)
binomcdf(n, p, k)

At least k successes: P(X ≥ k)
1 − binomcdf(n, p, k−1)
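The calculator commands can be mirrored in Python for checking work. This is a sketch: the function names copy the calculator's, the argument order (n, p, k) matches the commands above, and n = 10, p = 0.5 is a hypothetical setting.

```python
from math import comb, sqrt


def binompdf(n, p, k):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomcdf(n, p, k):
    """P(X <= k): add up P(X = 0) through P(X = k)."""
    return sum(binompdf(n, p, i) for i in range(k + 1))


n, p = 10, 0.5                        # hypothetical: 10 fair coin flips
print(binompdf(n, p, 3))              # 0.1171875  exactly 3 successes
print(binomcdf(n, p, 3))              # 0.171875   at most 3 successes
print(1 - binomcdf(n, p, 3))          # 0.828125   at least 4 successes
print(n * p, round(sqrt(n * p * (1 - p)), 3))   # 5.0 1.581
```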
Geometric Distribution G(p)

🎯 When to Use Geometric

Count the number of trials until the first success. Same BINS conditions except no fixed n.

📐 Geometric Formulas

P(X = k) = (1−p)^(k−1) · p

μ_X = 1/p    (expected # of trials)

P(X > k) = (1−p)^k

🖩 Calculator: Geometric

Exactly k trials until first success
geometpdf(p, k)

At most k trials: P(X ≤ k)
geometcdf(p, k)
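A matching sketch for the geometric commands, with function names copied from the calculator and a hypothetical success probability of p = 0.2:

```python
def geometpdf(p, k):
    """P(X = k): k - 1 failures followed by the first success."""
    return (1 - p) ** (k - 1) * p


def geometcdf(p, k):
    """P(X <= k) = 1 - P(first k trials are all failures)."""
    return 1 - (1 - p) ** k


p = 0.2                                   # hypothetical success probability
print(round(geometpdf(p, 3), 3))          # 0.128  first success on trial 3
print(round(geometcdf(p, 3), 3))          # 0.488  first success within 3 trials
print(round(1 - geometcdf(p, 3), 3))      # 0.512  more than 3 trials needed
print(1 / p)                              # 5.0    expected number of trials
```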
Sampling Distributions & Central Limit Theorem

📊 Sampling Distribution of x̄

μ_x̄ = μ
σ_x̄ = σ / √n    (standard error of the mean)

By CLT: for large n (≥ 30), x̄ is approximately Normal regardless of population shape.

💡 Effect of Sample Size

  • Larger n → smaller σ_x̄ → x̄ is less variable
  • Larger n → sampling distribution is more Normal
  • Averaging n observations reduces the standard deviation by a factor of √n
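A sketch of a sample-mean probability, assuming a hypothetical population with μ = 70 and σ = 10 and a sample of n = 100 (large enough for the CLT rule of thumb). The function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_above(c, mu, sigma, n):
    """P(x-bar > c), using the Normal model with SD sigma / sqrt(n)."""
    sd_of_mean = sigma / sqrt(n)
    return 1 - NormalDist(mu, sd_of_mean).cdf(c)


# Hypothetical population: mu = 70, sigma = 10
print(round(prob_sample_mean_above(72, 70, 10, 100), 4))   # 0.0228  mean of 100
print(round(prob_sample_mean_above(72, 70, 10, 1), 4))     # 0.4207  single value
```

The single-value line treats the population itself as Normal; the contrast shows why using σ instead of σ/√n for x̄ gives a badly wrong answer.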
💡 Binomial vs. Geometric — Quick Check
  • Binomial: "How many successes in n trials?" → fixed n, count successes
  • Geometric: "How many trials until the first success?" → no fixed n, count trials
🃏 Flashcards

SOCS
Shape, Outliers, Center, Spread: the four things to describe/compare when analyzing a distribution.

Outlier Rule (IQR)
A value is an outlier if it falls below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.

Residual
Residual = Observed − Predicted (y − ŷ). Positive → point is above the line (the model underestimates). Negative → point is below the line (the model overestimates).

Interpreting R²
[R², written as a percent] of the variation in [y] is explained by the linear relationship with [x]. The rest is due to other factors.

Interpreting s (residual SD)
The actual [y] values typically differ from the predicted values by about s [y units]. It measures the typical prediction error of the LSRL.

SRS vs. Stratified Sample
SRS: every group of n has equal chance. Stratified: divide into homogeneous strata, take SRS from each. Stratified is more precise when strata differ on the response.

Convenience Sample Bias
Using whoever is easiest to reach. Biased because the sample may systematically differ from the population (e.g., football fans ≠ all students).

Purpose of Blocking
Grouping similar experimental units to reduce variability within groups. Use the variable most related to the response as the blocking variable for maximum benefit.

Mutually Exclusive vs. Independent
Mutually exclusive: P(A∩B) = 0, can't both happen. Independent: P(A|B) = P(A), knowing B doesn't change P(A). If A and B are mutually exclusive and both have positive probability, they are dependent.

BINS (Binomial Conditions)
Binary outcomes, Independent trials, Number of trials fixed, Same probability p each trial.

Geometric Distribution Mean (Expected Value)
E(X) = 1/p. On average, it takes 1/p trials to get the first success in a geometric setting.

Combining Variances Rule
Var(X ± Y) = Var(X) + Var(Y). Variances always ADD (even for subtraction) when X and Y are independent. Never subtract variances!

Central Limit Theorem
For large samples (n ≥ 30), the sampling distribution of x̄ is approximately Normal with mean μ and standard deviation σ/√n, regardless of the population's shape.

Conditional Probability Formula
P(A | B) = P(A ∩ B) / P(B). The probability of A given that B has already occurred.

Extrapolation Warning
Extrapolation means predicting y for x-values outside the range of the data. The LSRL may not apply there, so predictions are unreliable and potentially misleading.

Binomial Mean & SD
Mean: μ = np. Standard deviation: σ = √(np(1−p)). Where n = number of trials, p = probability of success.

⚡ Rapid Review Sheet

📊 Unit 1 — Distributions

  • Use SOCS to compare distributions
  • Always use comparative language when comparing
  • Outlier: < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
  • Skewed right → mean > median
  • Z-score = (x − μ) / σ
  • normalcdf(L, U, μ, σ) for P(L < X < U)
  • invNorm(area, μ, σ) for value at percentile
  • 68% / 95% / 99.7% within 1/2/3σ of mean

📈 Unit 2 — Regression

  • Slope: for each +1 x-unit, ŷ changes by b y-units on avg
  • Y-int: predicted y when x = 0
  • Residual = Observed − Predicted
  • r from output: r = ±√(R²), sign matches slope
  • R²: % variation in y explained by linear model with x
  • s: typical distance actual y is from predicted ŷ
  • Linear model OK if residual plot is random scatter
  • Don't extrapolate outside range of data

🔬 Unit 3 — Data Collection

  • SRS: every individual & group equally likely to be chosen
  • Stratified: SRS within homogeneous groups
  • Cluster: select entire groups at random
  • Convenience sample → biased!
  • Experiment: researcher assigns treatments
  • 3 Principles: Randomization, Replication, Control
  • Control group gives a baseline: what happens without the treatment
  • Block: group similar units; assign treatments within blocks

🎲 Unit 4 — Probability

  • P(Aᶜ) = 1 − P(A)
  • P(A∪B) = P(A) + P(B) − P(A∩B)
  • P(A|B) = P(A∩B) / P(B)
  • Independent: P(A|B) = P(A) or P(A∩B) = P(A)·P(B)
  • Mutually exclusive: P(A∩B) = 0
  • E(X) = Σ[x·P(x)]
  • Var(X±Y) = Var(X) + Var(Y) [if independent]
  • E(aX+b) = a·E(X)+b; Var(aX+b) = a²·Var(X)

📊 Binomial B(n, p)

  • BINS: Binary, Independent, Number fixed, Same p
  • P(X=k) = C(n,k)·pᵏ·(1−p)^(n−k)
  • μ = np
  • σ = √(np(1−p))
  • binompdf(n,p,k) → P(X=k)
  • binomcdf(n,p,k) → P(X≤k)
  • P(X≥k) = 1 − binomcdf(n,p,k−1)

🎯 Geometric G(p)

  • Count trials until FIRST success
  • P(X=k) = (1−p)^(k−1)·p
  • E(X) = 1/p
  • P(X > k) = (1−p)^k
  • geometpdf(p,k) → P(X=k)
  • geometcdf(p,k) → P(X≤k)
  • "At least k trials" = 1 − geometcdf(p, k−1)

🔔 Sampling Distributions

  • μ_x̄ = μ (mean of x̄ = population mean)
  • σ_x̄ = σ/√n (standard error)
  • CLT: large n → x̄ ~ Normal(μ, σ/√n)
  • Larger n → less variability in x̄
  • Larger n → less likely to get extreme x̄
  • P(x̄ > c): use normalcdf with σ/√n

⚠️ Most Common AP Mistakes

  • Subtracting variances (ALWAYS add variances)
  • Not including context in interpretations
  • Confusing correlation (r) with slope, or treating correlation as causation
  • Extrapolating beyond data range
  • Confusing mutually exclusive with independent
  • Forgetting "on average" when interpreting slope
  • Using SD of single obs instead of σ/√n for x̄