AP Statistics — Midterm 2 Review

📊 Unit 1 — Exploring Data

Comparing Distributions (SOCS)

🔠 The SOCS Framework

Shape — symmetric, skewed left/right, unimodal/bimodal
Outliers — mention if present; use IQR rule to identify
Center — median (or mean) with value and units
Spread — IQR, range, or standard deviation with value and units

📦 Boxplot Anatomy

Whiskers: min & max (excluding outliers)
Box edges: Q1 (25th %ile) and Q3 (75th %ile)
Line in box: Median (Q2)
IQR = Q3 − Q1
Outlier rule: value < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
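
A quick sketch of the outlier rule in Python, in case seeing the arithmetic helps. The function names and the travel-time numbers are made up for illustration. Q1 and Q3 are passed in directly because calculators and software can compute quartiles slightly differently.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical travel times (minutes) with Q1 = 10 and Q3 = 20, so IQR = 10
print(outlier_fences(10, 20))                       # (-5.0, 35.0)
print(find_outliers([5, 12, 18, 36, 40], 10, 20))   # [36, 40]
```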

⚖️ Mean vs. Median

Symmetric distribution → mean ≈ median
Skewed right → mean > median (pulled toward tail)
Skewed left → mean < median
Outliers pull the mean but not the median
Median is resistant; mean is not

⚠️ AP Exam Language

When comparing distributions, always use comparative language: "The median travel time for elementary students (≈ 25 min) is greater than the median for middle school students (≈ 15 min)." Never just state each group's value; you must compare them.

Normal Distribution

📐 Z-Score Formula

Z-Score

z = (x − μ) / σ

z tells you how many standard deviations above (+) or below (−) the mean a value falls.

🖩 Calculator Commands

Finding Area (Probability)

normalcdf(lower, upper, μ, σ)

Finding Value from %ile

invNorm(area, μ, σ)

🎯 Empirical Rule (68-95-99.7)

μ ± 1σ contains ≈ 68% of data
μ ± 2σ contains ≈ 95% of data
μ ± 3σ contains ≈ 99.7% of data
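
The 68-95-99.7 values are rounded Normal areas. One way to check them is with Python's standard-library `statistics.NormalDist`; the helper name `area_within` is just an illustrative choice.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_within(k):
    """Proportion of a Normal distribution within k SDs of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```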

💡 Normal Distribution Problem Types

Find probability/proportion: convert to z-score, then use normalcdf or table → P(X < x)
Find value from percentile: use invNorm(percentile as decimal, μ, σ)
Find middle X%: the area in each tail = (1 − X%)/2; use invNorm with that area for the lower cutoff and with 1 − that area for the upper cutoff
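
The three problem types can be sketched in Python. The helpers `normalcdf` and `inv_norm` below are stand-ins written to mimic the calculator commands, not real calculator code, and the values μ = 100, σ = 15 are assumed purely for illustration.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under Normal(mu, sigma) between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area to its LEFT under Normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(area)


mu, sigma = 100, 15  # hypothetical population values

# 1) Probability: P(X < 115)
print(f"{normalcdf(float('-inf'), 115, mu, sigma):.4f}")  # 0.8413

# 2) Value at the 90th percentile
print(f"{inv_norm(0.90, mu, sigma):.2f}")                 # 119.22

# 3) Middle 95%: each tail has area (1 - 0.95) / 2 = 0.025
tail = (1 - 0.95) / 2
low, high = inv_norm(tail, mu, sigma), inv_norm(1 - tail, mu, sigma)
print(f"{low:.2f} to {high:.2f}")                          # 70.60 to 129.40
```

On the calculator the lower bound for "less than" problems is usually typed as a very negative number such as −1E99; `float('-inf')` plays that role here.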

📈 Unit 2 — Bivariate Data & Linear Regression

The Least-Squares Regression Line (LSRL)

Equation of LSRL

ŷ = a + bx or ŷ = b₀ + b₁x

where b = slope (from computer output "Coef" of x-variable)
a = y-intercept (from computer output "Constant" Coef)

📝 Interpreting the Slope

For each additional [one unit of x], the predicted [y variable] increases/decreases by [|b|] [y units] on average.

📝 Interpreting the Y-Intercept

When [x variable] is 0, the predicted [y variable] is [a] [y units].

Ask: does x = 0 make sense in context? If not, the y-intercept has limited practical meaning.

📊 Residuals

Residual = Observed − Predicted
e = y − ŷ

Positive residual → point is above the line (underestimated)
Negative residual → point is below the line (overestimated)

🔗 Correlation Coefficient (r)

Always between −1 and +1
r > 0: positive association
r < 0: negative association
|r| close to 1: strong linear relationship
r = ±√(R²); the sign matches the sign of the slope
r has no units; not affected by changing units

📐 Coefficient of Determination (R²)

Interpretation: R²% of the variation in [y] is explained by the linear relationship with [x].

The remaining (100 − R²)% is due to other factors or random variation.

📏 Standard Deviation of Residuals (s)

Interpretation: The actual [y] values typically differ from the predicted values by about s [y units].

Measures typical prediction error of the model.
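
To see how slope, intercept, residuals, r, R², and s fit together, here is a minimal sketch that computes them by hand. The five data points are invented for illustration, and the function name `lsrl` is hypothetical; on the exam these values normally come from computer output.

```python
from math import sqrt


def lsrl(xs, ys):
    """Least-squares line y-hat = a + b*x plus r, R^2, residuals, and s."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # y-intercept
    r = sxy / sqrt(sxx * syy)          # correlation (same sign as slope)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed - predicted
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))     # SD of residuals
    return {"a": a, "b": b, "r": r, "r_sq": r ** 2, "residuals": residuals, "s": s}


# Hypothetical data
fit = lsrl([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y-hat = {fit['a']:.2f} + {fit['b']:.2f}x")   # y-hat = 2.20 + 0.60x
print(f"r = {fit['r']:.3f}, R^2 = {fit['r_sq']:.3f}, s = {fit['s']:.3f}")
# r = 0.775, R^2 = 0.600, s = 0.894
```

With these made-up numbers, the first point has residual 2 − 2.8 = −0.8, so it sits below the line and the model overestimates it.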

Residual Plots & Model Appropriateness

✅ Linear Model IS Appropriate If:

Residual plot shows random scatter with no pattern
No curved (U-shaped or arch-shaped) pattern in residuals
R² is reasonably high (supporting evidence only; check the residual plot first)
Scatterplot shows roughly linear pattern

❌ Linear Model NOT Appropriate If:

Residual plot shows a curved pattern
Residual plot shows a fan shape (increasing spread)
Scatterplot is clearly curved (exponential, quadratic)

⚠️ Outliers & Extrapolation

Outlier in regression: point far from the line (large residual)
High leverage: point with extreme x-value
Influential point: removing it changes the LSRL significantly
Extrapolation: predicting outside the range of x-data — unreliable, avoid!

💡 Computer Output Quick-Read

Find slope in "Coef" column, row of the x-variable name
Find y-intercept in "Coef" column, row labeled "Constant"
Find R² from "R-Sq"; then r = ±√(R²), with the sign matching the slope
Find s labeled directly as "s = …"

🔬 Unit 3 — Collecting Data

Sampling Methods

Method	How It Works	Key Feature
Simple Random Sample (SRS)	Every individual & every group of n individuals has an equal chance of selection	Gold standard; unbiased if done correctly
Stratified Random Sample	Divide population into strata (homogeneous groups); take SRS from each stratum	More precise when strata differ on response variable
Cluster Sample	Divide into clusters (heterogeneous groups); randomly select entire clusters	Practical when population is spread out
Systematic Sample	Select every k-th individual from a list after random start	Easy to implement
Convenience Sample	Select whoever is easiest to reach	Very prone to bias — avoid!

Sources of Bias

🎭 Voluntary Response Bias

People choose to respond; those with strong opinions are overrepresented.

Example: online poll where only motivated people participate

🚪 Convenience Bias

Sampling whoever is nearby; sample may not represent the population.

Example: surveying only football game attendees

💬 Question Wording Bias

Leading or loaded questions push respondents toward a particular answer.

Example: "Do you support the dangerously high-crime prison construction?"

📭 Non-Response Bias

People who don't respond differ systematically from those who do.

🕵️ Undercoverage

Some groups in the population have a lower probability of being included in the sample.

🗣️ Response Bias

People give inaccurate answers (social desirability, interviewer effect).

Experimental Design

🧪 Key Vocabulary

Experimental units: the individuals (people, animals, objects) to which treatments are assigned
Factor: an explanatory variable (manipulated)
Level: specific value of a factor
Treatment: specific combination of factor levels applied
Response variable: the outcome measured
Placebo: fake treatment that looks real

🎯 Principles of Experiment Design

Randomization: randomly assign units to treatments to reduce confounding
Replication: apply each treatment to enough units to detect real effects
Control: control for extraneous variables (use a control group or placebo)

🚧 Control Group

A group that receives no treatment (or placebo). Advantage: shows what changes occur without the treatment, giving a baseline for comparison.

🧱 Blocking

Group similar experimental units into blocks, then randomly assign treatments within each block.

Use blocking variable that is most related to the response variable — it reduces variability and makes comparisons more precise.

💡 Stratification vs. Blocking

Stratification is used in sampling (observational). Blocking is used in experiments. Both involve grouping similar individuals; the goal is to reduce variability within groups.

✅ Stratified Sampling: When is campus better than gender?

Stratify by campus when students' satisfaction with buildings differs more by campus than by gender, i.e., campus explains more variation in the response than gender does.

🎲 Unit 4 — Probability & Random Variables

Basic Probability Rules

Core Rules

0 ≤ P(A) ≤ 1                     (probability is between 0 and 1)
P(Aᶜ) = 1 − P(A)               (complement rule)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (general addition rule)
P(A ∩ B) = P(A) · P(B | A)       (general multiplication rule)
P(B | A) = P(A ∩ B) / P(A)       (conditional probability)

🔒 Mutually Exclusive (Disjoint)

Events A and B cannot both occur
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
Mutually exclusive events are NOT independent (unless one of them has probability 0)

🔀 Independence

Knowing A occurred doesn't change probability of B
P(A | B) = P(A) — or equivalently:
P(A ∩ B) = P(A) · P(B)
Check: does P(A|B) = P(A)? If yes → independent

📋 Two-Way Tables

Joint probability: P(A and B) = cell / table total
Marginal probability: P(A) = row or column total / table total
Conditional probability: P(A | B) = cell / row or column total
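
A small worked example of the three table probabilities and the independence check. The counts below are invented (grade level by whether a student has a job); only the method matters.

```python
# Hypothetical two-way table of counts: (grade, job status) -> count
table = {
    ("junior", "job"): 30, ("junior", "no job"): 70,
    ("senior", "job"): 50, ("senior", "no job"): 50,
}
total = sum(table.values())  # 200

# Joint: cell / table total
p_senior_and_job = table[("senior", "job")] / total

# Marginal: row or column total / table total
p_job = sum(n for (grade, job), n in table.items() if job == "job") / total
p_senior = sum(n for (grade, job), n in table.items() if grade == "senior") / total

# Conditional: P(job | senior) = P(senior and job) / P(senior)
p_job_given_senior = p_senior_and_job / p_senior

print(p_senior_and_job, p_job, p_job_given_senior)  # 0.25 0.4 0.5

# Independence check: is P(job | senior) equal to P(job)?
independent = abs(p_job_given_senior - p_job) < 1e-9
print(independent)  # False
```

Here P(job | senior) = 0.5 differs from P(job) = 0.4, so in this made-up table having a job and being a senior are not independent.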

❌ Common Mistake

"Mutually exclusive" and "independent" are NOT the same. If two events are mutually exclusive and both have positive probability, they are DEPENDENT (knowing one occurred means the other definitely did not).

Discrete Random Variables

Expected Value and Standard Deviation

E(X) = μ_X = Σ [x · P(x)]
Var(X) = σ²_X = Σ [(x − μ)² · P(x)]
SD(X) = σ_X = √Var(X)
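
The formulas above applied to a made-up probability distribution. The helper name `mean_var_sd` and the table of values are assumptions for illustration.

```python
from math import sqrt


def mean_var_sd(dist):
    """dist maps each value x to P(x). Returns (E(X), Var(X), SD(X))."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, sqrt(var)


# Hypothetical distribution; probabilities must sum to 1
dist = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
mu, var, sd = mean_var_sd(dist)
print(f"E(X) = {mu:.2f}, Var(X) = {var:.2f}, SD(X) = {sd:.2f}")
# E(X) = 1.70, Var(X) = 0.81, SD(X) = 0.90
```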

Combining Random Variables

Rules for Combining (X and Y independent)

E(X ± Y) = E(X) ± E(Y)
Var(X ± Y) = Var(X) + Var(Y) ← variances ADD even for differences!

E(aX + b) = a · E(X) + b
Var(aX + b) = a² · Var(X)
SD(aX + b) = |a| · SD(X)

⚠️ Critical Rule

When combining independent random variables, VARIANCES always add (even for X − Y). Standard deviations do NOT add; only variances do. Always add variances first, then take the square root.
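
A short numeric check of the combining rules, using assumed values E(X) = 10, SD(X) = 3, E(Y) = 6, SD(Y) = 4 for two independent variables. The function names are illustrative only.

```python
from math import sqrt


def sd_of_sum_or_difference(sd_x, sd_y):
    """SD of X + Y or X - Y for INDEPENDENT X and Y: add variances, then sqrt."""
    return sqrt(sd_x ** 2 + sd_y ** 2)


def linear_transform(mean_x, sd_x, a, b):
    """Mean and SD of aX + b."""
    return a * mean_x + b, abs(a) * sd_x


mean_x, sd_x = 10, 3   # hypothetical
mean_y, sd_y = 6, 4    # hypothetical

print(mean_x - mean_y)                              # 4    -> E(X - Y)
print(sd_of_sum_or_difference(sd_x, sd_y))          # 5.0  -> SD(X - Y), not 3 - 4 or 3 + 4
print(linear_transform(mean_x, sd_x, a=-2, b=5))    # (-15, 6)
```

Note that SD(X − Y) = √(9 + 16) = 5, which is neither 7 (adding SDs) nor −1 (subtracting them).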

Binomial Distribution B(n, p)

✅ BINS Conditions

Binary — two outcomes (success/failure)
Independent — trials are independent
Number — fixed number of trials (n)
Success — constant probability p each trial

📐 Binomial Formulas

P(X = k) = C(n,k) · pᵏ · (1−p)^(n−k)

μ_X = np
σ_X = √(np(1−p))

🖩 Calculator: Binomial

Exactly k successes

binompdf(n, p, k)

At most k successes: P(X ≤ k)

binomcdf(n, p, k)

At least k successes: P(X ≥ k)

1 − binomcdf(n, p, k−1)
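
For checking calculator answers, the binomial commands can be imitated in Python. The functions below are hand-written stand-ins named after the calculator commands, and n = 10, p = 0.5 is an assumed example.

```python
from math import comb, sqrt


def binompdf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomcdf(n, p, k):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binompdf(n, p, i) for i in range(k + 1))


n, p = 10, 0.5  # hypothetical setting
print(binompdf(n, p, 3))          # 0.1171875  -> exactly 3
print(binomcdf(n, p, 3))          # 0.171875   -> at most 3
print(1 - binomcdf(n, p, 3))      # 0.828125   -> at least 4 (uses k - 1 = 3)
print(n * p, sqrt(n * p * (1 - p)))  # mean 5.0 and SD about 1.58
```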

Geometric Distribution G(p)

🎯 When to Use Geometric

Count the number of trials until the first success. Same BINS conditions except no fixed n.

📐 Geometric Formulas

P(X = k) = (1−p)^(k−1) · p

μ_X = 1/p (expected # of trials)

P(X > k) = (1−p)^k

🖩 Calculator: Geometric

Exactly k trials until first success

geometpdf(p, k)

At most k trials: P(X ≤ k)

geometcdf(p, k)
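
The geometric commands can be imitated the same way. These are hand-written stand-ins named after the calculator commands, with p = 0.25 assumed for the example.

```python
def geometpdf(p, k):
    """P(X = k): first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p


def geometcdf(p, k):
    """P(X <= k): first success occurs on or before trial k."""
    return 1 - (1 - p) ** k


p = 0.25  # hypothetical probability of success
print(geometpdf(p, 3))        # 0.140625  -> exactly 3 trials
print(geometcdf(p, 3))        # 0.578125  -> at most 3 trials
print(1 - geometcdf(p, 3))    # 0.421875  -> more than 3 trials, equals (1 - p)^3
print(1 / p)                  # 4.0       -> expected number of trials
```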

Sampling Distributions & Central Limit Theorem

📊 Sampling Distribution of x̄

μ_x̄ = μ
σ_x̄ = σ / √n (standard error of the mean)

By CLT: for large n (≥ 30), x̄ is approximately Normal regardless of population shape.

💡 Effect of Sample Size

Larger n → smaller σ_x̄ → x̄ is less variable
Larger n → sampling distribution is more Normal
Averaging n observations divides the standard deviation by √n
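
A sketch of why the √n matters, comparing a single observation with a sample mean. The values μ = 100, σ = 15, n = 36 are assumed for illustration, and the Normal model for x̄ relies on the CLT (or a Normal population).

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_above(c, mu, sigma, n):
    """P(x-bar > c) using the Normal model with SD sigma / sqrt(n)."""
    sd_of_mean = sigma / sqrt(n)
    return 1 - NormalDist(mu, sd_of_mean).cdf(c)


mu, sigma, n = 100, 15, 36   # hypothetical values; sigma / sqrt(n) = 2.5

p_single = 1 - NormalDist(mu, sigma).cdf(105)   # one observation above 105
p_mean = prob_mean_above(105, mu, sigma, n)     # mean of 36 observations above 105

print(f"single observation: {p_single:.3f}")   # about 0.369
print(f"sample mean (n=36): {p_mean:.4f}")     # about 0.0228
```

Using σ = 15 instead of σ/√n = 2.5 for the sample mean would give the first answer, which is far too large.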

💡 Binomial vs. Geometric — Quick Check

Binomial: "How many successes in n trials?" → fixed n, count successes
Geometric: "How many trials until the first success?" → no fixed n, count trials

🃏 Flashcards

SOCS

Shape, Outliers, Center, Spread — the four things to describe/compare when analyzing a distribution.

Outlier Rule (IQR)

A value is an outlier if it falls below Q1 − 1.5·IQR or above Q3 + 1.5·IQR.

Residual

Residual = Observed − Predicted (y − ŷ). Positive → above the line (underestimate). Negative → below the line (overestimate).

Interpreting R²

R²% of the variation in [y] is explained by the linear relationship with [x]. The rest is due to other factors.

Interpreting s (residual SD)

The actual [y] values typically differ from the predicted values by about s [y units]. It measures the typical prediction error of the LSRL.

SRS vs. Stratified Sample

SRS: every group of n has equal chance. Stratified: divide into homogeneous strata, take SRS from each. Stratified is more precise when strata differ on the response.

Convenience Sample Bias

Using whoever is easiest to reach. Biased because the sample may systematically differ from the population (e.g., football fans ≠ all students).

Purpose of Blocking

Grouping similar experimental units to reduce variability within groups. Use the variable most related to the response as the blocking variable for maximum benefit.

Mutually Exclusive vs. Independent

Mutually exclusive: P(A∩B) = 0, can't both happen. Independent: P(A|B) = P(A), knowing B doesn't change P(A). If ME and P > 0, they are dependent.

BINS (Binomial Conditions)

Binary outcomes, Independent trials, Number of trials fixed, Same probability p each trial.

Geometric Mean (Expected Value)

E(X) = 1/p. On average, it takes 1/p trials to get the first success in a geometric setting.

Combining Variances Rule

Var(X ± Y) = Var(X) + Var(Y) — variances always ADD (even for subtraction) when X and Y are independent. Never subtract variances!

Central Limit Theorem

For large samples (n ≥ 30), the sampling distribution of x̄ is approximately Normal with mean μ and standard deviation σ/√n, regardless of the population's shape.

Conditional Probability Formula

P(A | B) = P(A ∩ B) / P(B). The probability of A given that B has already occurred.

Extrapolation Warning

Extrapolation means predicting y for x-values outside the range of the data. The LSRL may not apply — predictions are unreliable and potentially misleading.

Binomial Mean & SD

Mean: μ = np. Standard deviation: σ = √(np(1−p)). Where n = number of trials, p = probability of success.

⚡ Rapid Review Sheet

📊 Unit 1 — Distributions

Use SOCS to compare distributions
Always use comparative language when comparing
Outlier: < Q1 − 1.5·IQR or > Q3 + 1.5·IQR
Skewed right → mean > median
Z-score = (x − μ) / σ
normalcdf(L, U, μ, σ) for P(L < X < U)
invNorm(area, μ, σ) for value at percentile
68% / 95% / 99.7% within 1/2/3σ of mean

📈 Unit 2 — Regression

Slope: for each +1 x-unit, ŷ changes by b y-units on avg
Y-int: predicted y when x = 0
Residual = Observed − Predicted
r from output: r = ±√(R²), sign matches slope
R²: % variation in y explained by linear model with x
s: typical distance actual y is from predicted ŷ
Linear model OK if residual plot is random scatter
Don't extrapolate outside range of data

🔬 Unit 3 — Data Collection

SRS: every individual & group equally likely to be chosen
Stratified: SRS within homogeneous groups
Cluster: select entire groups at random
Convenience sample → biased!
Experiment: researcher assigns treatments
3 Principles: Randomization, Replication, Control
Control group gives a baseline: what happens without the treatment
Block: group similar units; assign treatments within blocks

🎲 Unit 4 — Probability

P(Aᶜ) = 1 − P(A)
P(A∪B) = P(A) + P(B) − P(A∩B)
P(A|B) = P(A∩B) / P(B)
Independent: P(A|B) = P(A) or P(A∩B) = P(A)·P(B)
Mutually exclusive: P(A∩B) = 0
E(X) = Σ[x·P(x)]
Var(X±Y) = Var(X) + Var(Y) [if independent]
E(aX+b) = a·E(X)+b; Var(aX+b) = a²·Var(X)

📊 Binomial B(n, p)

BINS: Binary, Independent, Number fixed, Same p
P(X=k) = C(n,k)·pᵏ·(1−p)^(n−k)
μ = np
σ = √(np(1−p))
binompdf(n,p,k) → P(X=k)
binomcdf(n,p,k) → P(X≤k)
P(X≥k) = 1 − binomcdf(n,p,k−1)

🎯 Geometric G(p)

Count trials until FIRST success
P(X=k) = (1−p)^(k−1)·p
E(X) = 1/p
P(X > k) = (1−p)^k
geometpdf(p,k) → P(X=k)
geometcdf(p,k) → P(X≤k)
"At least k trials" = 1 − geometcdf(p, k−1)

🔔 Sampling Distributions

μ_x̄ = μ (mean of x̄ = population mean)
σ_x̄ = σ/√n (standard error)
CLT: large n → x̄ ~ Normal(μ, σ/√n)
Larger n → less variability in x̄
Larger n → less likely to get extreme x̄
P(x̄ > c): use normalcdf with σ/√n

⚠️ Most Common AP Mistakes

Subtracting variances (ALWAYS add variances)
Not including context in interpretations
Confusing correlation with slope, or treating correlation as causation
Extrapolating beyond data range
Confusing mutually exclusive with independent
Forgetting "on average" when interpreting slope
Using SD of single obs instead of σ/√n for x̄