ANOVA

Analysis of variance โ€” comparing means across three or more groups by partitioning total variability into between-group and within-group components.

Diagram ยท Group Distributions

Drag the sliders to move group means. ANOVA asks: is the spread between groups larger than the spread within groups?

Group 1: 2.0
Group 2: 5.0
Group 3: 8.0
grand meanฮผ1ฮผ2ฮผ3F โ‰ˆ 56.3
Between-group SS: 180.0Within-group SS: 43.2F = 56.25 โ†’ reject Hโ‚€
Definition

ANOVA (Analysis of Variance) tests whether the means of three or more groups are all equal.

H0H_0: ฮผ1=ฮผ2=โ‹ฏ=ฮผk\mu_1 = \mu_2 = \cdots = \mu_k (all group means are equal)

H1H_1: at least one mean differs

ANOVA decomposes total variability in data into:

  • Between-group variance: how much group means vary from the grand mean
  • Within-group variance: how much individual observations vary within their group

If between-group variance is much larger than within-group variance, we have evidence against H0H_0.

Key properties
  • Total sum of squares always splits exactly into between-group and within-group components โ€” nothing is lost or double-counted
  • Under H0H_0, the F-statistic has expectation close to 1, since both numerator and denominator estimate the same noise variance
  • A significant ANOVA result only establishes that some mean differs, not which pairs
  • Assumes normality within groups, equal variances across groups, and independence of observations
Common mistakes
  • Running multiple pairwise t-tests instead of post-hoc corrected tests: this reintroduces the multiple-testing problem that ANOVA's single omnibus test was meant to avoid
  • Ignoring unequal variances across groups: violating homoscedasticity distorts the F-test's Type I error rate โ€” check with Levene's test, or use Welch's ANOVA as a robust alternative
Three fertilizers

Three fertilizers applied to 5 plots each. Crop yields (kg): A: 23, B: 31, C: 23.

The grand mean is 24.6. Group means differ noticeably. ANOVA tests if these differences are larger than expected from random variation.

Try it

ANOVA uses the F-statistic F=MSbetween/MSwithinF = \text{MS}_{\text{between}} / \text{MS}_{\text{within}}. Why is F the right ratio to use? When would you expect F to be large vs. close to 1?

Solution

Under H0H_0, both MSbetween\text{MS}_{\text{between}} and MSwithin\text{MS}_{\text{within}} estimate the same population variance ฯƒ2\sigma^2, so their ratio is close to 1.

When H0H_0 is false (means differ), the between-group variance picks up the signal (differences between means), while the within-group variance still only estimates noise. So FF grows large. A large FF is evidence against H0H_0.

Related concepts