Two-sample \(t\)-test

In the one-sample tests just considered, we compared a population mean, \(\mu\) (where our experimental sample represents the population), with a fixed numeric value of interest. In practice, this is fairly rare. More frequently, we have samples drawn from two (or more) populations, and we compare the two means to see whether the treatments produce similar results.

The null hypothesis is that the two samples come from populations with the same true mean, \(\mu\). This is commonly expressed as

\[H_0: \mu_1 = \mu_2\ \text{or}\ H_0:\mu_1-\mu_2 =0\]

where \(\mu_1\) and \(\mu_2\) are the means of the two populations. The alternative hypothesis is that the two means are different, i.e.

\[H_1: \mu_1 \neq \mu_2\]

Assumptions

The two-sample \(t\)-test assumes that the data are:

  1. continuous,
  2. at least approximately normally distributed, and
  3. drawn from two populations whose variances are homogeneous (i.e. the same).

Ideally these assumptions should be tested before you carry out the two-sample \(t\)-test.

What if assumptions are not met?

If the data do not meet these assumptions, then you may choose an appropriate transformation or look for another technique, e.g. a non-parametric method (see the non-parametric sections of this book).

If the variances cannot be assumed to be equal (but the data are normally distributed), there is a way of adjusting the two-sample \(t\)-test to compensate for this – it is called Satterthwaite’s approximation – more on this later!

Three Variants of the Two-Sample \(t\)-test

There are 3 different ways in which a two-sample \(t\)-test can be used. They ALL assume that the data are approximately normally distributed.

  1. Independent samples, equal variance - run a two-sample \(t\)-test assuming equal variances.
  2. Independent samples, unequal variance - run a two-sample \(t\)-test assuming unequal variances.
  3. Paired samples - run a paired \(t\)-test.

In choosing which variant of the \(t\)-test is applicable in your situation, you first need to decide whether your two samples are paired or unpaired (independent).

Paired Data

Paired samples arise when we measure pairs of similar experimental units. Within each pair, one unit receives Treatment 1 and the other receives Treatment 2.

In some cases, treatments are applied to the same experimental unit e.g. one half of a piece of fruit receives Trt 1 and the other half Trt 2; two plants of different varieties are grown in the same pot (here variety is the treatment).

In recognising this pairing in our analysis we are taking into account the fact that biological variation between pairs is likely to be larger than within pairs. This way, we get a clearer picture of the difference that is due to the treatment factor.

Examples of paired data

  1. Observations at two times on the same experimental unit
    • Before and after readings of particulate matter in the air at 3 sites near a new power station (before the station was built, and after it became operational). (Dytham 2003, 80)
    • Measurements of water flow on two consecutive days at 6 sites along a river. (Dytham 2003, 83)
  2. Observations on 2 halves/parts of the same experimental unit
    • One half of each (uncut) grapefruit was exposed to sunlight, and the other half was shaded. (McConway et al. 1999, 198)
    • A standard (recommended) variety of wheat is compared with a new variety via 2 similar plots on each of 8 farms. (Clewer and Scarisbrick 2001, 46)

Two-Sample t-Test for Independent Samples

Procedure for the test:

  1. Set up the null and alternate hypotheses
  2. Decide on the level of significance, 5%, 1%, 0.01% etc.
  3. Check the assumptions of normality and equal variance. We use an F-test to formally test for equality of variance.
  4. Calculate the test statistic \[t = \frac{\bar{y}_1 - \bar{y}_2}{SED}\] where \(\bar{y}_1\) and \(\bar{y}_2\) are the sample means, and \(SED\) is the standard error of the difference between the means.
  5. Calculate the degrees of freedom (df).
  6. Find the P-value in printed statistical tables or via GenStat or Excel.
  7. Make a statistical conclusion by comparing this P-value to your chosen level of significance (if P < α, then reject the null hypothesis).
  8. Calculate the confidence interval.
  9. Interpret your results biologically.

Calculating the test statistic

When calculating the test statistic, use \(\bar{y}_1\) for the larger mean. This will give a positive value for \(t\).

Confidence Interval for \(\mu_1 - \mu_2\) (Independent Samples)

A two-sample t-test shows whether there is evidence of a difference in population means. The magnitude of this difference can be estimated with a confidence interval.

A 95% confidence interval for the true difference \(\mu_1 - \mu_2\) is given by \(\bar{y}_1 - \bar{y}_2 \pm t^{\alpha/2}_{df} \times SED\)

where \(\bar{y}_1 - \bar{y}_2\) is the difference between the sample means, \(t^{\alpha/2}_{df}\) is the critical value from the t-distribution for the chosen level of significance and degrees of freedom, and \(SED\) is the standard error of the difference between the means.

The df and SED need to take into account whether or not you are assuming equal variances.

SED and df for Independent Samples with EQUAL Variances

\[ SED = \sqrt{s^2_p \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \] and \[ s^2_p = \frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}\] \(s^2_p\) is the pooled estimate of variance, and \(df = n_1 + n_2 - 2\). We use a pooled estimate in this instance since we are assuming \(\sigma^2_1 = \sigma^2_2\).
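As a minimal sketch of these formulas in Python (the function name and layout are our own, not from this text):

```python
import numpy as np
from scipy import stats

def pooled_t_test(y1, y2):
    """Two-sample t-test assuming equal variances (pooled SED)."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, n2 = len(y1), len(y2)
    # pooled estimate of variance
    s2p = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    sed = np.sqrt(s2p * (1 / n1 + 1 / n2))   # standard error of the difference
    t = (y1.mean() - y2.mean()) / sed
    df = n1 + n2 - 2
    p = 2 * stats.t.sf(abs(t), df)           # two-tailed P-value
    return t, df, p
```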

SED and df for Independent Samples with UNequal Variances

What do you do if, when checking your assumptions for a two-sample t-test, you find that the variances are not equal but the data are normally distributed? You can go ahead with a modified t-test, or you can choose a different test.

This modified t-test used in the case of unequal variances is often called Satterthwaite’s approximate t-test.

Satterthwaite’s Approximate t-Test

The null and alternate hypotheses remain the same, i.e. \[H_0: \mu_1 = \mu_2\] or \[H_0: \mu_1 - \mu_2 = 0\]

The formula for the test statistic, \(t\), in Satterthwaite’s approximate test is a little different to that for the t-test with equal variances. The s.e.d. changes because we can no longer use a pooled estimate of the variance (since the variances cannot be assumed equal). \[ t = \frac{\bar{y}_1 - \bar{y}_2}{SED}\text{, where}\ SED = \sqrt{\frac{s^2_1}{n_1} + \frac{s^2_2}{n_2}} \]

A correction for unequal variance is made to the degrees of freedom. \[ df = \frac{(\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2})^2} {\frac{(\frac{s^2_1}{n_1})^2}{n_1 - 1} + \frac{(\frac{s^2_2}{n_2})^2}{n_2 - 1}} \]

The result of this equation for df is rounded to the nearest integer.

Then proceed as usual through the rest of the test - i.e. find the P-value; draw a statistical conclusion about your hypothesis; interpret your results biologically.
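As a hedged sketch of this procedure in Python (our own code; scipy implements the same test as `stats.ttest_ind(y1, y2, equal_var=False)`, though scipy does not round the df):

```python
import numpy as np
from scipy import stats

def satterthwaite_t_test(y1, y2):
    """Satterthwaite's (Welch's) approximate t-test for unequal variances."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    n1, n2 = len(y1), len(y2)
    v1, v2 = y1.var(ddof=1) / n1, y2.var(ddof=1) / n2   # s^2/n for each sample
    sed = np.sqrt(v1 + v2)
    t = (y1.mean() - y2.mean()) / sed
    # degrees of freedom corrected for unequal variances, rounded as in the text
    df = round((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
    p = 2 * stats.t.sf(abs(t), df)                      # two-tailed P-value
    return t, df, p
```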

F-Test for Equality of Variances

One of the conditions for the independent sample t-test to be valid is that the population variances \(\sigma^2_1\) and \(\sigma^2_2\) are equal.

To test the null hypothesis that \(\sigma^2_1 = \sigma^2_2\), divide the larger \(s^2\) value by the smaller \(s^2\) to obtain the variance ratio, v.r.: \[ v.r. = \frac{\text{larger } s^2}{\text{smaller } s^2} \] To undertake this two-tailed test at the 5% level you need to carry out the one-tailed test at the 2.5% level. You can use the 2.5% F table of critical values, or you can use GenStat via the menus Data>Probability Calculations… to find the P-value.

If you use the printed statistical table, you will need to compare the critical value you find there with the variance ratio you calculated. If the calculated variance ratio > critical value, reject \(H_0\) and conclude that the variances are significantly different.
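For readers working in software rather than from tables, here is a minimal Python sketch of this variance-ratio test (our own code, not from this text):

```python
import numpy as np
from scipy import stats

def f_test_equal_variances(y1, y2):
    """Two-tailed F-test for equality of variances (larger s^2 on top)."""
    s1, s2 = np.var(y1, ddof=1), np.var(y2, ddof=1)
    if s1 >= s2:
        vr, df_num, df_den = s1 / s2, len(y1) - 1, len(y2) - 1
    else:
        vr, df_num, df_den = s2 / s1, len(y2) - 1, len(y1) - 1
    # double the upper-tail probability, since the larger variance is on top
    p = 2 * stats.f.sf(vr, df_num, df_den)
    return vr, p
```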

Example: Two-Sample t-Test

Weights of two breeds of cattle are to be compared: 15 cattle from Breed 1 and 12 cattle from Breed 2 were randomly sampled. Their recorded weights (kg) are shown.

Breed 1   Breed 2
148.1     187.6
146.2     180.3
152.8     198.6
135.3     190.7
151.2     196.3
146.3     203.8
163.5     190.2
146.6     201.0
162.4     194.7
140.2     221.1
159.4     186.7
181.8     203.1
165.1
165.0
141.6

The following descriptive statistics are obtained from these data.

                   Breed 1   Breed 2
Sample mean (kg)   153.700   196.175
Sample s.d. (kg)    12.301    10.616

Is there any systematic difference in their weights?

The question, “Is there any systematic difference in their weights?”, is asking whether or not there is any significant difference between the 2 samples – and in effect if there is any significant difference between the 2 breeds because the factor that distinguishes the 2 samples is breed.

We could answer this question by testing whether or not the population mean weights for the 2 breeds can be assumed to be equal:

\[H_0: \mu_{breed\ 1} = \mu_{breed\ 2}\]
versus
\[H_1: \mu_{breed\ 1} \neq \mu_{breed\ 2}\]

To test this particular hypothesis about the equality of the means (as an avenue for answering our broader question), we would use a two-sample t-test.

However, it is REALLY important to remember that there are other tests and hypotheses that we could use to answer our broader question (about whether or not there is any statistical difference between the weights of the 2 breeds). We’ll consider some non-parametric alternatives in Biometry 2.

To proceed with a two-sample t-test, there are 2 assumptions that we need to check:

  • Normality of Breed 1 data and normality of Breed 2 data
  • Equality of variances of the 2 samples

Both of these checks are hypothesis tests in their own right and contain the usual elements of a hypothesis test i.e. null & alternate hypothesis; test statistic; df; P-value or critical value; conclusion.

Testing the Assumption of Normality

If you are doing the test by hand, you would need to assume that the data are normally distributed (and hope this is true). At least the data in this example are continuous… which is one small step towards normality.
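In software you can check this assumption formally. This text does not prescribe a particular normality test, but as one common choice, a Shapiro-Wilk test on each sample in Python might look like this:

```python
from scipy import stats

breed1 = [148.1, 146.2, 152.8, 135.3, 151.2, 146.3, 163.5, 146.6,
          162.4, 140.2, 159.4, 181.8, 165.1, 165.0, 141.6]
breed2 = [187.6, 180.3, 198.6, 190.7, 196.3, 203.8, 190.2, 201.0,
          194.7, 221.1, 186.7, 203.1]

# H0 for Shapiro-Wilk is that the sample comes from a normal distribution,
# so a large P-value gives no evidence against normality.
for name, y in [("Breed 1", breed1), ("Breed 2", breed2)]:
    W, p = stats.shapiro(y)
    print(f"{name}: W = {W:.3f}, P = {p:.3f}")
```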

Testing the Assumption of Equality of Variance

\[H_0: \sigma^2_{Breed\ 1} = \sigma^2_{Breed\ 2}\ \text{versus}\ H_1: \sigma^2_{Breed\ 1} \neq \sigma^2_{Breed\ 2}\]

Test statistic:

\[F_{Observed} = \text{variance ratio (v.r.)} = \frac{\text{larger } s^2}{\text{smaller } s^2} = \frac{12.301^2}{10.616^2} = \frac{151.315}{112.699} = 1.34\]

There are 2 degrees of freedom to calculate for an F test. They are called the numerator df, \(\nu_1\), and the denominator df, \(\nu_2\). (Remember from school days, numerator is the top half of a fraction, denominator is the bottom half of a fraction.) The \(\nu\) that looks like a curly ‘v’ is the Greek letter ‘nu’.

For an F test used to test equality of variance, the df are: \(\nu_1 = n_1 - 1\), \(\nu_2 = n_2 - 1\).

Here, \(\nu_1 = n_{Breed\ 1} - 1 = 15 - 1 = 14\) and \(\nu_2 = n_{Breed\ 2} - 1 = 12 - 1 = 11\).

Using the 2.5% F table, we can compare (at the upper tail) the \(F_{critical}\) and \(F_{observed}\) values. If \(F_{obs} > F_{crit}\), we reject \(H_0\) and conclude that the variances of the 2 samples are NOT equal.

From these tables, it is not possible to find \(F^{0.025}_{14,11}\) exactly, so we will make do with the closest available value, \(F^{0.025}_{15,11} = 3.33\).

We find \(F_{crit} \approx 3.33\) and \(F_{obs} = 1.34\). Since \(F_{obs} < F_{crit}\), we CAN assume that the variances are equal.
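Software gives the exact critical value and P-value that the printed table cannot; a short Python check of this step (our own computation):

```python
from scipy import stats

vr = 12.301**2 / 10.616**2                   # observed variance ratio, about 1.34
df_num, df_den = 14, 11                      # Breed 1 (larger s^2) on top

f_crit = stats.f.ppf(0.975, df_num, df_den)  # exact 2.5% upper critical value
p = 2 * stats.f.sf(vr, df_num, df_den)       # two-tailed P-value
print(f"v.r. = {vr:.2f}, F_crit = {f_crit:.2f}, P = {p:.2f}")
```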

So… assuming normality and equal variances, we can complete a “pooled” two-sample t-test where \[ SED = \sqrt{s^2_p \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \] and \[ s^2_p = \frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}\] \(s^2_p\) is the pooled estimate of variance and \(df = n_1 + n_2 - 2\).

Testing the null hypothesis \(H_0: \mu_{Breed\ 1} = \mu_{Breed\ 2}\), we find that there is a significant difference in the mean weights of the two breeds of cattle (\(t = 9.46\), df = 25, \(P < 0.001\)). The mean weight of Breed 2 is significantly higher, and we are 95% confident that the mean weight for Breed 2 is between 33.2 and 51.7 kg higher than the mean weight for Breed 1.

This last piece of information, “we are 95% confident that the mean weight for Breed 2 is between 33.2 and 51.7 kg higher than the mean weight for Breed 1”, is obtained from the 95% confidence interval for the true difference (\(\mu_1 - \mu_2\)).

NB. Recall that a 95% confidence interval for the true difference (\(\mu_1 - \mu_2\)) is

\[ (\bar{y}_1 - \bar{y}_2) \pm t^{0.025}_{df} \times SED \]
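The whole example can be reproduced from the raw data; a minimal Python sketch (our own code, using scipy rather than the GenStat menus) returns the same \(t\), df, P-value and interval quoted above:

```python
import numpy as np
from scipy import stats

breed1 = np.array([148.1, 146.2, 152.8, 135.3, 151.2, 146.3, 163.5, 146.6,
                   162.4, 140.2, 159.4, 181.8, 165.1, 165.0, 141.6])
breed2 = np.array([187.6, 180.3, 198.6, 190.7, 196.3, 203.8, 190.2, 201.0,
                   194.7, 221.1, 186.7, 203.1])

# Pooled two-sample t-test (equal variances, as justified by the F-test)
t, p = stats.ttest_ind(breed2, breed1, equal_var=True)

# 95% confidence interval for the difference in means (Breed 2 - Breed 1)
n1, n2 = len(breed1), len(breed2)
df = n1 + n2 - 2                                        # 15 + 12 - 2 = 25
s2p = ((n1 - 1) * breed1.var(ddof=1)
       + (n2 - 1) * breed2.var(ddof=1)) / df            # pooled variance
sed = np.sqrt(s2p * (1 / n1 + 1 / n2))                  # SED
diff = breed2.mean() - breed1.mean()
half = stats.t.ppf(0.975, df) * sed
print(f"t = {t:.2f}, df = {df}, P = {p:.2g}")           # t = 9.46, df = 25
print(f"95% CI: {diff - half:.1f} to {diff + half:.1f} kg")  # 33.2 to 51.7
```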

Paired t-Test

We can re-write the generic null hypothesis for a two-sample test of means, \(H_0: \mu_1 = \mu_2\), as… \[H_0: \mu_1 - \mu_2 = 0, \ \text{or} \ H_0: \mu_d = 0\]

The paired t-test is actually a one-sample t-test of the differences between pairs of observations (from 2 different samples).

Assumption:

  • The differences are approximately normally distributed.

Since we are doing a one-sample t-test on the differences, the assumption of equal variances is not relevant. Firstly, create a single column of data to use in the t test. Each value in the column corresponds to the difference between the 2 values for a particular matched pair.

Farm   Yield Variety A (kg)   Yield Variety B (kg)   Difference
1      17.8                   14.7                    3.1
2      18.5                   15.2                    3.3
3      12.2                   12.9                   -0.7
4      19.7                   18.3                    1.4
5      10.8                   10.1                    0.7
6      11.9                   12.2                   -0.3
7      15.6                   13.5                    2.1
8      12.5                    9.9                    2.6

The difference for Farm 1 = 17.8 - 14.7 = 3.1.

Data Source: Clewer and Scarisbrick (2001, 46)

Secondly, find some summary statistics of the differences so you can complete the t test:

No. of values, n = 8        Sum = 12.2          Mean = 1.525
Variance = 2.299            Std Dev = 1.516

Test statistic: \[ t = \frac{\bar{y}_d - \mu_d}{\sqrt{\frac{s^2_d}{n_d}}} \] and \(df = n_d - 1\), where \(\mu_d\) is usually 0, \(\bar{y}_d\) is the mean of the differences, \(s^2_d\) their variance and \(n_d\) the number of pairs.
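A minimal Python sketch of this calculation on the wheat data (our own code; scipy's `stats.ttest_rel` on the two raw columns gives the identical result):

```python
import numpy as np
from scipy import stats

variety_a = np.array([17.8, 18.5, 12.2, 19.7, 10.8, 11.9, 15.6, 12.5])
variety_b = np.array([14.7, 15.2, 12.9, 18.3, 10.1, 12.2, 13.5, 9.9])

d = variety_a - variety_b               # one difference per farm
t = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
df = len(d) - 1                         # 8 - 1 = 7
p = 2 * stats.t.sf(abs(t), df)          # two-tailed P-value
print(f"t = {t:.3f}, df = {df}, P = {p:.3f}")

# Equivalently, as a paired test on the raw columns:
t_alt, p_alt = stats.ttest_rel(variety_a, variety_b)
```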

Confidence Interval for \(\mu_1 - \mu_2\) (Paired Samples)

A 95% confidence interval for the mean difference is \(\bar{y}_d \pm t^{0.025}_{df} \times s.e.(\bar{y}_d)\), where the standard error of the mean difference is \(s.e.(\bar{y}_d) = \sqrt{s^2_d / n_d}\).
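Continuing the wheat example, a short sketch of this interval (our own computation from the differences above):

```python
import numpy as np
from scipy import stats

d = np.array([3.1, 3.3, -0.7, 1.4, 0.7, -0.3, 2.1, 2.6])   # paired differences

se = np.sqrt(d.var(ddof=1) / len(d))       # standard error of the mean difference
t_crit = stats.t.ppf(0.975, len(d) - 1)    # t critical value for 95%, df = 7
lower = d.mean() - t_crit * se
upper = d.mean() + t_crit * se
print(f"95% CI for the mean difference: {lower:.2f} to {upper:.2f} kg")
# gives roughly 0.26 to 2.79 kg
```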