ENVX2001 Applied Statistical Methods
Apr 2026
Before the break we compared two cattle breeds. But what if there were four breeds instead of two?
The problem: every test carries a risk of a Type I error (a false positive).
With 6 groups there are \(\binom{6}{2} = 15\) pairwise comparisons. The probability that none produce a false positive is \(0.95^{15}\), so:
\[P(\text{at least one false positive}) = 1 - 0.95^{15} = 53.7\%\]
| Groups | Pairwise tests | P(at least one false positive) |
|---|---|---|
| 2 | 1 | 5.0% |
| 4 | 6 | 26.5% |
| 6 | 15 | 53.7% |
| 8 | 28 | 76.2% |
| 10 | 45 | 90.1% |
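The table values can be reproduced directly in R; `choose(n, 2)` counts the pairwise tests:

```r
# Familywise false positive rate for n groups at alpha = 0.05
fwer <- function(n, alpha = 0.05) {
  k <- choose(n, 2)    # number of pairwise comparisons
  1 - (1 - alpha)^k    # P(at least one false positive)
}

round(100 * sapply(c(2, 4, 6, 8, 10), fwer), 1)
# 5.0 26.5 53.7 76.2 90.1
```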
We need a method that handles more than two groups. Consider a new experiment: chicks are assigned to one of four diets (\(r = 5\) per diet) and weight gain is recorded.
What do the boxplots suggest?
| Term | Meaning | In this experiment |
|---|---|---|
| Factor | Categorical variable being tested | Diet |
| Levels | Categories within a factor | Diet 1, Diet 2, Diet 3, Diet 4 |
| Replicates | Observations per level | \(r = 5\) |
This is a one-way ANOVA because there is only one factor. What is ANOVA?
In the \(t\)-test we asked whether two means were equal. ANOVA extends this to any number of groups.
The alternative does not say all means differ. It says at least two do.
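Formally, for \(t\) treatment means:

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_t \qquad H_1: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j\]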
Model equation:
\[y_{ij} = \mu_i + \varepsilon_{ij}\]
where \(i = 1, \ldots, t\) treatments and \(j = 1, \ldots, n_i\) replicates. The same structure as the \(t\)-test model, but \(i\) now ranges over more than two groups.
The same three assumptions from the \(t\)-test apply here. We fit the model first, then check.
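A minimal sketch of that workflow, using made-up stand-in data with the same structure as the lecture's chicks data (four diets, \(r = 5\); the numbers below are illustrative only, not the lecture's):

```r
set.seed(1)
# Illustrative stand-in for the lecture's chicks data: 4 diets, r = 5
chicks <- data.frame(
  diet   = factor(rep(paste0("Diet.", 1:4), each = 5)),
  weight = round(rnorm(20, mean = rep(c(100, 95, 105, 160), each = 5), sd = 25))
)

model <- aov(weight ~ diet, data = chicks)  # fit the one-way ANOVA first
shapiro.test(residuals(model))              # then check normality of residuals
```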
Normality (Shapiro-Wilk on residuals):
Shapiro-Wilk normality test
data: residuals(model)
W = 0.90961, p-value = 0.06265
\(p > 0.05\): no evidence against normality.


R can produce all standard diagnostic plots at once:
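For example (assuming the fitted aov object, model, from the Shapiro-Wilk slide):

```r
par(mfrow = c(2, 2))  # arrange the four diagnostic panels in a 2x2 grid
plot(model)           # residuals vs fitted, Q-Q, scale-location, leverage
```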
We will explore these plots in detail next week. For now, the Shapiro-Wilk and Bartlett’s tests are sufficient.
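Bartlett's test for homogeneity of variances runs the same way (a sketch, assuming the same chicks data as above):

```r
# H0: the variance of weight gain is equal across all diets
bartlett.test(weight ~ diet, data = chicks)
```

As with Shapiro-Wilk, \(p > 0.05\) means no evidence against the assumption.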
library(dplyr)    # group_by(), summarise()
library(ggplot2)

overall_mean <- mean(chicks$weight)
group_means <- chicks |>
  group_by(diet) |>
  summarise(mean_wt = mean(weight))

p1 <- ggplot(chicks, aes(diet, weight)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_hline(data = group_means, aes(yintercept = mean_wt),
             linetype = "dashed", colour = "#e64626", linewidth = 0.8) +
  labs(title = "Four group means", x = "Diet", y = "Weight gain (g)") +
  cowplot::theme_cowplot()

p2 <- ggplot(chicks, aes(diet, weight)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_hline(yintercept = overall_mean, colour = "#e64626",
             linewidth = 0.8, linetype = "dashed") +
  labs(title = "One overall mean", x = "Diet", y = "Weight gain (g)") +
  cowplot::theme_cowplot()

library(patchwork)
p1 + p2

Do four separate means (left) explain the data better than a single overall mean (right)?
ANOVA splits the total variation in the data into two parts:

- **Between-group** (treatment) variation: how far the group means sit from the overall mean.
- **Within-group** (residual) variation: how far individual observations sit from their own group mean.

ANOVA compares these two quantities. If between-group variation is large relative to within-group variation, we have evidence that the groups genuinely differ.
\[F = \frac{MS_{\text{trt}}}{MS_{\text{res}}}\]
For two groups, ANOVA gives the same answer as the \(t\)-test:
\[F = t^2\]
ANOVA extends the \(t\)-test to any number of groups.
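The \(F = t^2\) identity is easy to verify in R with simulated two-group data (not the lecture's data; `var.equal = TRUE` gives the pooled-variance \(t\)-test that matches ANOVA):

```r
set.seed(42)
d <- data.frame(
  grp = factor(rep(c("A", "B"), each = 10)),
  y   = c(rnorm(10, mean = 5), rnorm(10, mean = 6))
)

t_stat <- t.test(y ~ grp, data = d, var.equal = TRUE)$statistic
F_stat <- summary(aov(y ~ grp, data = d))[[1]]$`F value`[1]

c(t_squared = unname(t_stat)^2, F = F_stat)  # the two values match
```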
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Treatment | \(t - 1\) | \(SS_{\text{trt}}\) | \(SS_{\text{trt}} / (t-1)\) | \(MS_{\text{trt}} / MS_{\text{res}}\) |
| Residual | \(N - t\) | \(SS_{\text{res}}\) | \(SS_{\text{res}} / (N-t)\) | |
| Total | \(N - 1\) | \(SS_{\text{total}}\) | | |
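R fills in this table automatically; assuming the fitted aov object, model, from earlier:

```r
summary(model)  # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
```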
Df Sum Sq Mean Sq F value Pr(>F)
diet 3 16467 5489 6.647 0.004 **
Residuals 16 13212 826
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\[\frac{SS_{\text{treatment}}}{SS_{\text{total}}} = \frac{1.6467\times 10^{4}}{2.9679\times 10^{4}} = 55.5\%\]
The diets explain about 55% of the total variability in chick weight gain.
| Source | df | SS | MS | F | p |
|---|---|---|---|---|---|
| Diet | 3 | 16467 | 5489 | 6.65 | 0.004 |
| Residual | 16 | 13212 | 825.8 | | |
There are significant differences in mean weight gain among the four diets (\(F_{3,16} = 6.65\), \(p = 0.004\)).
But ANOVA only tells us that groups differ, not which ones. To identify specific differences, we need post-hoc analysis.
\(H_1\) says “not all means are equal,” but not which ones differ.
Before the break we used a confidence interval for the difference between two cattle breeds. Post-hoc methods apply the same logic to every pair of groups.
The key difference from running separate \(t\)-tests: post-hoc methods widen the CIs to account for multiple comparisons, keeping the overall false positive rate at 5%.
We will cover specific post-hoc methods (Tukey, Bonferroni) in detail next week.
Before the break we used CIs to compare two cattle breeds. The same idea extends to multiple groups. emmeans estimates the mean and standard error for each group from the model.
To find which pairs differ, we construct a CI for the difference between every pair of means.
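These intervals come from the emmeans package (a sketch, assuming the fitted aov object, model, from earlier):

```r
library(emmeans)

em <- emmeans(model, ~ diet)  # estimated marginal mean and SE for each diet
confint(pairs(em))            # Tukey-adjusted 95% CIs for all pairwise differences
```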
contrast estimate SE df lower.CL upper.CL
Diet.1 - Diet.2 8.0 18.2 16 -44.0 60.0
Diet.1 - Diet.3 -2.4 18.2 16 -54.4 49.6
Diet.1 - Diet.4 -63.8 18.2 16 -115.8 -11.8
Diet.2 - Diet.3 -10.4 18.2 16 -62.4 41.6
Diet.2 - Diet.4 -71.8 18.2 16 -123.8 -19.8
Diet.3 - Diet.4 -61.4 18.2 16 -113.4 -9.4
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
The lower.CL and upper.CL columns give the 95% CI for each difference. Intervals that exclude zero, here every comparison involving Diet 4, indicate pairs whose means differ.

In Lab 03, you will fit your own ANOVA with aov() and explore group comparisons with emmeans. Next week: residual diagnostics and post-hoc methods.
This presentation is based on the SOLES Quarto reveal.js template and is licensed under a Creative Commons Attribution 4.0 International License.