ENVX2001 Applied Statistical Methods
Apr 2026
The farmer decides to use stratified random sampling instead, dividing their property by land type.
With two distinct land types, how do we guarantee both are represented in our sample?
If studying plant biodiversity in a national park:
Stratified random sampling addresses three problems at once:
No. With large enough samples, the two methods converge. Simple random sampling is still a good default when strata are unknown or the population is fairly homogeneous.
Once we have our stratified sample, the calculations follow the same logic as simple random sampling, but each step must account for the stratified design.
The farmer collects the same 7 measurements as before. This time, each value is assigned to its land type:
The pooled mean is our best estimate of the overall population mean, taking into account the different stratum sizes.
\[\bar{x}_{s} = \sum_{i=1}^L \bar{x}_i \times w_i\]
We calculate the mean for each stratum (\(\bar{x}_i\)), multiply by its weight (\(w_i\)), and add them together.
We first define the weights \(w_i\) for each stratum based on their area:
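As a minimal sketch of these two steps in R: the areas and readings below are hypothetical placeholders, not the farmer's actual data. The names `landA`, `landB`, `weight` and `weighted_mean` are chosen to match the code used later in this lecture.

```r
# Hypothetical inputs: neither the areas nor the readings come from the lecture
area <- c(A = 60, B = 40)        # assumed stratum areas (ha)
weight <- area / sum(area)       # w_i = stratum area / total area, sums to 1

landA <- c(52, 58, 55, 61)       # assumed soil carbon readings, land type A (t/ha)
landB <- c(88, 92, 85)           # assumed soil carbon readings, land type B (t/ha)

# Mean of each stratum, then the weighted (pooled) mean
stratum_means <- c(A = mean(landA), B = mean(landB))
weighted_mean <- sum(stratum_means * weight)
weighted_mean
```

Because the weights come from area rather than from the number of samples, a stratum that was sampled more heavily does not pull the pooled mean towards itself.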
\[SE(\bar x_{s}) = \sqrt{\color{blue}{{\sum_{i=1}^L w_i^2}} \times \frac{s_i^2}{n_i}}\]
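With only two strata, as in the farmer's example, the sum has just two terms. Writing the land types as A and B (labels borrowed from the R variable names `landA` and `landB`, not from the formula itself):

\[SE(\bar x_{s}) = \sqrt{w_A^2 \frac{s_A^2}{n_A} + w_B^2 \frac{s_B^2}{n_B}}\]

Each stratum contributes the variance of its own mean, scaled by its squared weight.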
What is different from simple random sampling?
\[df = n - L\]
where \(n\) is the total number of samples and \(L\) is the number of strata.
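For the farmer's design, the degrees of freedom and the matching critical value can be sketched as below. The 95% confidence level is an assumption, taken from the `L95` and `u95` labels used in this lecture. `df_strat` is an illustrative name; `t_crit` matches the interval code.

```r
n <- 7                               # total number of samples
L <- 2                               # number of strata (two land types)
df_strat <- n - L                    # degrees of freedom for the stratified design

# Two-sided 95% critical value from the t distribution (assumed confidence level)
t_crit <- qt(0.975, df = df_strat)
t_crit
```

We lose one degree of freedom per stratum because a separate mean is estimated within each one.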
varA <- var(landA) / length(landA) # variance of the mean for A
varB <- var(landB) / length(landB) # variance of the mean for B
weighted_var <- weight[1]^2 * varA + weight[2]^2 * varB
weighted_se <- sqrt(weighted_var)
ci <- c(
  L95 = weighted_mean - t_crit * weighted_se,
  u95 = weighted_mean + t_crit * weighted_se
)
ci
     L95      u95
61.04864 76.68803
The farmer sampled the same 7 locations using a stratified design. How do the results compare to simple random sampling?
| Design | Mean | Var (mean) | L95 | U95 | df |
|---|---|---|---|---|---|
| Simple Random | 67.29 | 50.80 | 49.85 | 84.73 | 6 |
| Stratified Random | 68.87 | 9.25 | 61.05 | 76.69 | 5 |
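As a quick check, both confidence intervals can be rebuilt from the mean, variance of the mean and df in each row. This sketch uses only the rounded values from the table, so the results agree to about two decimal places rather than exactly. The object names are illustrative.

```r
# Values copied from the comparison table (rounded)
design   <- c("Simple Random", "Stratified Random")
est_mean <- c(67.29, 68.87)
var_mean <- c(50.80, 9.25)
df_row   <- c(6, 5)

# Half-width of a 95% CI: t critical value times the standard error of the mean
half_width <- qt(0.975, df = df_row) * sqrt(var_mean)

ci_tab <- data.frame(
  design = design,
  L95 = est_mean - half_width,
  U95 = est_mean + half_width
)
ci_tab
```

The stratified design has one fewer degree of freedom, so its critical value is slightly larger, but the much smaller variance of the mean more than compensates.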
\[\text{Efficiency} = \frac{\text{Variance of SRS}}{\text{Variance of Stratified}}\]
How many SRS samples would we need for the same precision?
We would need about 38 SRS samples to match the precision that 7 stratified samples achieved.
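Plugging in the variances of the mean from the comparison table gives a rough worked version. The second step assumes that the variance of the SRS mean shrinks in proportion to \(1/n\):

\[\text{Efficiency} = \frac{50.80}{9.25} \approx 5.49 \qquad n_{SRS} \approx 7 \times 5.49 \approx 38\]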
The farmer now has a solid baseline. They know their soil carbon is around 69 t/ha with a tight confidence interval. A year passes. They introduce cover cropping and want to know: has soil carbon changed?
This is the second type of observational study we met in the first half, a monitoring study. Instead of estimating a single value, we are estimating change.
To answer this question, the farmer measures the same property again.
We estimate change as the difference between the means of the two sets of measurements.
\[\Delta \bar x = \bar x_2 - \bar x_1\]
where \(\bar x_2\) and \(\bar x_1\) are the means of the second and first set of measurements, respectively.
R handles both approaches through t.test():
t.test(after, before, paired = TRUE)
t.test(after, before, paired = FALSE)
R connection
t.test() with paired = TRUE performs a paired t-test. You will use this in Lab 02 to analyse monitoring data.
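To see how the two calls differ, here is a small sketch with hypothetical before and after readings. It assumes the same 7 sites were revisited, which is what makes pairing valid. The numbers are invented for illustration and are not the farmer's data.

```r
# Hypothetical soil carbon readings (t/ha) at the same 7 sites; not lecture data
before <- c(55, 58, 61, 57, 86, 90, 88)
after  <- c(57, 59, 64, 58, 89, 91, 91)

# Paired: tests whether the mean of the per-site differences is zero
paired <- t.test(after, before, paired = TRUE)

# Unpaired: treats the two sets as independent samples (Welch two-sample test)
unpaired <- t.test(after, before, paired = FALSE)

c(paired = paired$p.value, unpaired = unpaired$p.value)
```

Both tests are looking at the same difference in means, but the paired version removes the large site-to-site variation, so it detects a small, consistent change much more readily.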
Today we followed one farmer through four problems, and each problem needed a new statistical tool:
Each concept built on the last. The same logic (estimate, quantify uncertainty, improve) runs through every sampling study you will encounter in this course.
This presentation is based on the SOLES Quarto reveal.js template and is licensed under a Creative Commons Attribution 4.0 International License.