Inferences for mean/proportion
Let's recap what a confidence interval is.
A confidence interval of level $100(1 - \alpha)\%$ means that we are $100(1 - \alpha)\%$ confident that the true value of the parameter is contained in the interval.
When dealing with confidence intervals, we will often encounter different "types" of situations. Let's review these.
Confidence Interval on the mean of a Normal Distribution, variance known
Suppose we have a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$.
We have thus a random sample $X_1, X_2, \ldots, X_n$, such that, for all $i$, $$ X_i \sim N(\mu, \sigma^2) $$
with $\mu$ unknown and $\sigma$ a known constant.
We would like a confidence interval for $\mu$.
We know that, $$ \bar{X} = \frac{1}{n} \sum_{i = 1}^n X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right) $$
we can standardize this to, $$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \sim N(0, 1) $$
So, because $Z \sim N(0, 1)$, $$ P(-z_{1 - \alpha/2} \leq Z \leq z_{1 - \alpha/2}) = 1 - \alpha $$
Thus, $$ P\left(-z_{1 - \alpha/2} \leq \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \leq z_{1 - \alpha/2}\right) = 1 - \alpha $$
Re-arranging this, we get, $$ P\left(\bar{X} - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{X} + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha $$
Calling the endpoints $L$ (lower) and $U$ (upper), the interval is, $$ [L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right] $$
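As a quick illustration, here is a minimal R sketch of this interval; the data vector and the known $\sigma$ are invented for the example.
alpha<-0.05
x<-c(4.9, 5.1, 4.7, 5.3, 5.0)   # hypothetical sample
sigma<-0.3                      # assumed known standard deviation
zstar<-qnorm(1-alpha/2)
c(mean(x)-zstar*sigma/sqrt(length(x)), mean(x)+zstar*sigma/sqrt(length(x)))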
Confidence Interval on the mean of an arbitrary distribution, variance known
Let us recap the central limit theorem.
The Central Limit Theorem (CLT) implies that, if $n$ is large enough, $$ Z = \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \simeq N(0, 1) $$
Thus, $$ P\left(-z_{1 - \alpha/2} \leq \sqrt{n} \frac{\bar{X} - \mu}{\sigma} \leq z_{1 - \alpha/2}\right) \simeq 1 - \alpha $$
Which yields the (same) interval, $$ [L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right] $$
Confidence interval on the mean of a normal distribution, variance unknown
In the case where we do not know the variance, we need to estimate it from the sample!
Recall the sample variance, $$ S^2 = \frac{1}{n - 1} \sum_{i = 1}^n (X_i - \bar{X})^2 $$
A natural statistic to consider is thus, $$ T = \sqrt{n} \frac{\bar{X} - \mu}{S} $$
In a normal population, the exact distribution of $T$ is $T \sim t_{n - 1}$.
We can write, $$ P(-t_{n - 1;1 - \alpha/2} \leq \sqrt{n} \frac{\bar{X} - \mu}{S} \leq t_{n - 1;1 - \alpha/2}) = 1 - \alpha $$
or, $$ P\left(\bar{X} - t_{n - 1;1 - \alpha/2} \frac{S}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{n - 1;1 - \alpha/2} \frac{S}{\sqrt{n}}\right) = 1 - \alpha $$
Which yields us the interval, $$ [L, U] = \left[\bar{x} - t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}, \bar{x} + t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}\right] $$
In large samples, $$ T \simeq N(0, 1) $$
Consequently, an approximate confidence interval of level $100(1 - \alpha)\%$ for $\mu$ is, $$ [L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{s}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{s}{\sqrt{n}}\right] $$
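A minimal R sketch of the exact $t$-interval, with the same hypothetical data as before; qt() plays the role that qnorm() plays in the $z$-interval.
x<-c(4.9, 5.1, 4.7, 5.3, 5.0)   # hypothetical sample
n<-length(x)
alpha<-0.05
tstar<-qt(1-alpha/2, df=n-1)
c(mean(x)-tstar*sd(x)/sqrt(n), mean(x)+tstar*sd(x)/sqrt(n))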
Confidence Interval on the mean: Summary
Is the population normal?
- If yes, is $\sigma$ known?
  - If yes, use an **exact $z$-confidence interval**:
    - $[L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right]$
  - If no, use an **exact $t$-confidence interval**:
    - $[L, U] = \left[\bar{x} - t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}, \bar{x} + t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}\right]$
- If no (but $n$ is large), is $\sigma$ known?
  - If yes, use an **approximate $z$-confidence interval** with $\sigma$:
    - $[L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right]$
  - If no, use an **approximate $z$-confidence interval** with $s$ in place of $\sigma$:
    - $[L, U] = \left[\bar{x} - z_{1 - \alpha/2} \frac{s}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \frac{s}{\sqrt{n}}\right]$
Confidence Interval on the proportion
The random variable to study is, $$ X = \begin{cases} 1 & \text{If the individual has the characteristic of interest} \newline 0 & \text{If not} \end{cases} $$
$X_1, X_2, \ldots, X_n$ is a set of $n$ independent Bern($p$) random variables.
Thus, $$ Y = \sum_{i = 1}^n X_i \sim B(n, p) $$
and the sample proportion is, $$ \hat{P} = \frac{Y}{n} $$
We also know that, if $n$ is large, $$ \sqrt{n} \frac{\hat{P} - p}{\sqrt{p(1 - p)}} \simeq N(0, 1) $$
An approximate two-sided confidence interval of level $100(1 - \alpha)\%$ for $p$ is given by, $$ \left[\hat{p} - z_{1 - \alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \hat{p} + z_{1 - \alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right] $$
Hypothesis testing for the mean
Let's recall hypothesis testing. The null hypothesis is usually of the form, $$ H_0 : \mu = \mu_0 $$
The alternative hypothesis can be a two-sided alternative, $$ H_a : \mu \neq \mu_0 $$
or one-sided alternatives, $$ H_a : \mu > \mu_0 \quad \text{or} \quad H_a : \mu < \mu_0 $$
Remember, there are two types of error:
- Rejecting $H_0$ when it is true: type I error.
- Failing to reject $H_0$ when it is false: type II error.
$$ P(\text{Type I error}) = P(\text{reject } H_0 | H_0 \text{ is true }) = \alpha \newline P(\text{Type II error}) = P(\text{fail to reject } H_0 | H_0 \text{ is false }) = \beta $$
Note that $\beta$ depends on the (unknown) value of $\mu$ under the alternative.
Assume for the moment that the population is normal with known $\sigma$. $$ \bar{X} \sim N(\mu, \frac{\sigma^2}{n}) $$
At significance level $\alpha$, we are after two constants $\ell$ and $u$ such that, $$ \alpha = P(\bar{X} \notin [\ell, u] \text{ when } \mu = \mu_0) = P\left(Z \notin \left[\sqrt{n} \frac{\ell - \mu_0}{\sigma}, \sqrt{n} \frac{u - \mu_0}{\sigma}\right]\right) $$
Thus, $$ \sqrt{n} \frac{\ell - \mu_0}{\sigma} = z_{\alpha/2} = -z_{1 - \alpha/2} $$
and, $$ \sqrt{n} \frac{u - \mu_0}{\sigma} = z_{1 - \alpha/2} $$
This yields, $$ \ell = \mu_0 - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}} \newline u = \mu_0 + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}. $$
The decision rule is then, $$ \text{Reject } H_0 \text{ if } \bar{x} \notin [\mu_0 - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \mu_0 + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}] $$
Hypothesis testing for the mean: $p$-value
The $p$-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value when $H_0$ is true ("extreme" is to be understood in the direction of the alternative).
When testing $H_0 : \mu = \mu_0$ against $H_a : \mu \neq \mu_0$, the $p$-value will be the probability of finding the random variable $\bar{X}$ further from $\mu_0$ than the observed $\bar{x}$, that is, $$ \begin{align*} p & = P(\bar{X} \notin [\mu_0 \pm |\bar{x} - \mu_0|] \text{ when } \mu = \mu_0) \newline & = 1 - P(\bar{X} \in [\mu_0 \pm |\bar{x} - \mu_0|] \text{ when } \mu = \mu_0) \end{align*} $$
Let us define, $$ z_0 = \sqrt{n} \frac{\bar{x} - \mu_0}{\sigma} $$
as the "observed value of the test statistic".
As we know that $Z = \sqrt{n} \frac{\bar{X} - \mu_0}{\sigma} \sim N(0, 1)$ under $H_0$, we can write, $$ \begin{align*} p & = 1 - P\left(\sqrt{n} \frac{\bar{X} - \mu_0}{\sigma} \in \left[\pm \sqrt{n} \frac{|\bar{x} - \mu_0|}{\sigma}\right]\right) \newline & = 1 - P(Z \in [-|z_0|, |z_0|]) = 2(1 - \Phi(|z_0|)) \end{align*} $$
Operationally, once the $p$-value is computed, we compare it to the predefined significance level $\alpha$ to make a decision: $$ \begin{cases} \text{if } p < \alpha, \text{ reject } H_0 \newline \text{if } p \geq \alpha, \text{ do not reject } H_0 \end{cases} $$
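A minimal R sketch of this two-sided $z$-test; the summary numbers ($\bar{x}$, $\mu_0$, $\sigma$, $n$) are hypothetical.
xbar<-5.08; mu0<-5; sigma<-0.3; n<-25; alpha<-0.05   # hypothetical values
z0<-sqrt(n)*(xbar-mu0)/sigma   # observed test statistic
p<-2*(1-pnorm(abs(z0)))        # two-sided p-value
p < alpha                      # TRUE means reject H0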
Hypothesis testing for the mean: one-sided
With $H_a : \mu > \mu_0$ we are after a constant $u$ such that, $$ P(\bar{X} > u \text{ when } \mu = \mu_0) = \alpha $$
As we know that $Z = \sqrt{n} \frac{\bar{X} - \mu_0}{\sigma} \sim N(0, 1)$, the decision rule is reject $H_0$ if $\bar{x} > \mu_0 + z_{1 - \alpha} \frac{\sigma}{\sqrt{n}}$.
Again with $z_0 = \sqrt{n} \frac{\bar{x} - \mu_0}{\sigma}$, the $p$-value is, $$ p = P(\bar{X} > \bar{x} \text{ when } \mu = \mu_0) = P(Z > \sqrt{n} \frac{\bar{x} - \mu_0}{\sigma}) = 1 - \Phi(z_0) $$
With $H_a : \mu < \mu_0$, we are after a constant $\ell$ such that, $$ P(\bar{X} < \ell \text{ when } \mu = \mu_0) = \alpha $$
The decision rule is reject $H_0$ if $\bar{x} < \mu_0 - z_{1 - \alpha} \frac{\sigma}{\sqrt{n}}$.
The $p$-value is, $$ p = P(\bar{X} < \bar{x} \text{ when } \mu = \mu_0) = P(Z < \sqrt{n} \frac{\bar{x} - \mu_0}{\sigma}) = \Phi(z_0) $$
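In R, these one-sided $p$-values are one-liners; the observed $z_0$ below is hypothetical.
z0<-1.33        # hypothetical observed test statistic
1-pnorm(z0)     # p-value for Ha: mu > mu0
pnorm(z0)       # p-value for Ha: mu < mu0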
Hypothesis testing for the mean: other cases
Say we have a normal population with unknown standard deviation.
Specifically, for the two-sided test $H_0 : \mu = \mu_0$ against $H_a : \mu \neq \mu_0$, the decision rule is, $$ \text{reject } H_0 \text{ if } \bar{x} \notin [\mu_0 - t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}, \mu_0 + t_{n - 1;1 - \alpha/2} \frac{s}{\sqrt{n}}] $$
and from the observed value of the test statistic, $$ t_0 = \sqrt{n} \frac{\bar{x} - \mu_0}{s} $$
we can compute the $p$-value, $$ p = 1 - P(T \in [-|t_0|, |t_0|]) = 2P(T > |t_0|) \text{ where } T \sim t_{n - 1} $$
If we have a non-normal population with known or unknown standard deviation, then, for large $n$, $$ \text{reject } H_0 \text{ if } \bar{x} \notin \left[\mu_0 - z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}, \mu_0 + z_{1 - \alpha/2} \frac{\sigma}{\sqrt{n}}\right] $$
or, $$ \text{reject } H_0 \text{ if } \bar{x} \notin \left[\mu_0 - z_{1 - \alpha/2} \frac{s}{\sqrt{n}}, \mu_0 + z_{1 - \alpha/2} \frac{s}{\sqrt{n}}\right] $$
The associated approximate $p$-value will be given by, $$ p = 2(1 - \Phi(|z_0|)), $$
with $z_0 = \sqrt{n} \frac{\bar{x} - \mu_0}{\sigma}$ or $z_0 = \sqrt{n} \frac{\bar{x} - \mu_0}{s}$.
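As a sketch, here is the $t$-based $p$-value in R, reusing the height data from the later one-sample example; the null value $\mu_0 = 170$ is hypothetical.
x<-c(175, 185, 170, 184, 175)
t0<-sqrt(length(x))*(mean(x)-170)/sd(x)   # observed test statistic, mu0 = 170
2*(1-pt(abs(t0), df=length(x)-1))         # compare with t.test(x, mu=170)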
Hypothesis testing for the proportion
Recall the large-sample confidence interval for $p$, $$ \left[\hat{p} - z_{1 - \alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \hat{p} + z_{1 - \alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right] $$
The decision rule at (approximate) level $\alpha$ is, $$ \text{reject } H_0 \text{ if } \hat{p} \notin \left[p_0 - z_{1 - \alpha/2} \sqrt{\frac{p_0(1 - p_0)}{n}}, p_0 + z_{1 - \alpha/2} \sqrt{\frac{p_0(1 - p_0)}{n}}\right] $$
The (approximate) $p$-value for this test is, $$ p = 2(1 - \Phi(|z_0|)) $$
where $z_0 = \sqrt{n} \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)}}$.
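A minimal R sketch of this test, using the survey numbers from the example below with null value $p_0 = 0.5$; note that prop.test() (shown later) applies a continuity correction, so its $p$-value differs slightly.
n<-100; phat<-53/n; p0<-0.5
z0<-sqrt(n)*(phat-p0)/sqrt(p0*(1-p0))   # observed test statistic
2*(1-pnorm(abs(z0)))                    # approximate two-sided p-value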
R code for finding $z_{1 - \alpha/2}$
alpha<-0.05
zstar<-qnorm(1-alpha/2)
Example
Suppose that 53 people among 100 surveyed are in favor of the proposition. Find the 95% confidence interval for the proportion of people in favor of the proposition.
n<-100
phat<-53/n
SE<-sqrt(phat*(1-phat)/n)
alpha<-0.05
zstar<-qnorm(1 - alpha/2)
> c(phat-zstar*SE, phat+zstar*SE)
[1] 0.4321784 0.6278216
Built-in functions in R for finding the CI of proportion estimate
prop.test(x=53, n=100, conf.level=0.95)
> 1-sample proportions test with continuity correction
>
> data: 53 out of 100, null probability 0.5
> X-squared = 0.25, df = 1, p-value = 0.6171
> alternative hypothesis: true p is not equal to 0.5
> 95 percent confidence interval:
> 0.4280225 0.6296465
> sample estimates:
> p
> 0.53
Built-in functions in R for finding the CI of mean estimate
x<-c(175, 185, 170, 184, 175)
t.test(x, conf.level=0.90)
> One Sample t-test
>
> data: x
> t = 61.567, df = 4, p-value = 4.169e-07
> alternative hypothesis: true mean is not equal to 0
> 90 percent confidence interval:
> 171.6434 183.9566
> sample estimates:
> mean of x
> 177.8
t.test(x, conf.level=0.90, alt="less")
> One Sample t-test
>
> data: x
> t = 61.567, df = 4, p-value = 1
> alternative hypothesis: true mean is less than 0
> 90 percent confidence interval:
> -Inf 182.2278
> sample estimates:
> mean of x
> 177.8
Inferences for difference of means
In many situations it is quite common to be interested in comparing two "populations" in regard to a parameter of interest.
The two "populations" may be:
- Produced items using an existing and a new technique.
- Success rates in two groups of individuals.
- Health test results for patients who received a drug and for patients who received a placebo.
Two-sample test
$X_{11}, X_{12}, \ldots, X_{1n_1}$ is a sample from population 1. $X_{21}, X_{22}, \ldots, X_{2n_2}$ is a sample from population 2.
The samples are independent (i.e., observations in sample 1 are by no means linked to the observations in sample 2, they concern different individuals).
What we would like to know is whether $\mu_1 = \mu_2$ or not.
So, $$ H_0 : \mu_1 = \mu_2 $$
We compute the sample means $\bar{x}_1$ and $\bar{x}_2$.
- If $\bar{x}_1 \simeq \bar{x}_2$, then $H_0$ is probably acceptable.
- If $\bar{x}_1$ is considerably different from $\bar{x}_2$, that is evidence that $H_0$ is not true and we are tempted to reject it.
Note that the alternative hypothesis can be a two-sided alternative, $$ H_1 : \mu_1 \neq \mu_2 $$
or a one-sided alternative, $$ H_1 : \mu_1 > \mu_2 \quad \text{or} \quad H_1 : \mu_1 < \mu_2 $$
We know that, $$ \bar{X_1} = \frac{1}{n_1} \sum_{i = 1}^{n_1} X_{1i} \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) $$
and, $$ \bar{X_2} = \frac{1}{n_2} \sum_{i = 1}^{n_2} X_{2i} \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right) $$
we deduce the sampling distribution of $\bar{X_1} - \bar{X_2}$, $$ \bar{X_1} - \bar{X_2} \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right) $$
Now, testing $H_0 : \mu_1 = \mu_2$ exactly amounts to testing $H_0 : \mu_1 - \mu_2 = 0$, with $\bar{X_1} - \bar{X_2}$ as an estimator for $\mu_1 - \mu_2$.
Two-sample test: known variances
Suppose that $\sigma_1$ and $\sigma_2$ are known.
For the two-sided test (with $H_a : \mu_1 - \mu_2 \neq 0$), at significance level $\alpha$, the decision rule is, $$ \text{Reject } H_0 \text{ if } \bar{x_1} - \bar{x_2} \notin \left[-z_{1 - \alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, z_{1 - \alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] $$
The $p$-value is, $$ p = 2(1 - \Phi(|z_0|)), $$
where $z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$.
Similarly, for the one-sided test $H_a : \mu_1 > \mu_2$, the decision rule is, $$ \text{reject } H_0 \text{ if } \bar{x_1} - \bar{x_2} > z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
and the $p$-value is, $$ p = 1 - \Phi(z_0) $$
while for the one-sided test $H_a : \mu_1 < \mu_2$, the decision rule is, $$ \text{reject } H_0 \text{ if } \bar{x_1} - \bar{x_2} < -z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
and the $p$-value is, $$ p = \Phi(z_0) $$
A $100(1 - \alpha)\%$ two-sided confidence interval for $\mu_1 - \mu_2$ is, $$ \left[(\bar{x_1} - \bar{x_2}) - z_{1 - \alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] $$
The $100(1 - \alpha)\%$ one-sided confidence intervals for $\mu_1 - \mu_2$ are, $$ \left(-\infty, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] $$
and, $$ \left[(\bar{x_1} - \bar{x_2}) - z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, +\infty\right] $$
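A minimal R sketch of the two-sided version, with hypothetical summary statistics and variances assumed known.
x1bar<-88.8; x2bar<-101.7; sigma1<-8; sigma2<-6; n1<-6; n2<-6; alpha<-0.05   # hypothetical
se<-sqrt(sigma1^2/n1 + sigma2^2/n2)
z0<-(x1bar-x2bar)/se
2*(1-pnorm(abs(z0)))   # two-sided p-value
c((x1bar-x2bar)-qnorm(1-alpha/2)*se, (x1bar-x2bar)+qnorm(1-alpha/2)*se)   # CI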
Two-sample test: unknown equal variances
Assume now $\sigma_1 = \sigma_2 = \sigma$, but $\sigma$ is unknown.
We can estimate $\sigma^2$ by the pooled variance estimator, $$ S_{p}^2 = \frac{\sum_{i = 1}^{n_1} \left(X_{1i} - \bar{X_1}\right)^2 + \sum_{i = 1}^{n_2} \left(X_{2i} - \bar{X_2}\right)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} $$
where $S_1^2$ and $S_2^2$ are the sample variances of the two samples, $$ S_1^2 = \frac{1}{n_1 - 1} \sum_{i = 1}^{n_1} (X_{1i} - \bar{X_1})^2 \quad \text{and} \quad S_2^2 = \frac{1}{n_2 - 1} \sum_{i = 1}^{n_2} (X_{2i} - \bar{X_2})^2 $$
For the two-sided test (with $H_a : \mu_1 - \mu_2 \neq 0$), at significance level $\alpha$, the decision rule is, $$ \text{reject } H_0 \text{ if } \bar{x_1} - \bar{x_2} \notin \left[-t_{n_1 + n_2 - 2;1 - \alpha/2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, t_{n_1 + n_2 - 2;1 - \alpha/2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] $$
The $p$-value is given by, $$ p = 2P(T > |t_0|) $$
with $T \sim t_{n_1 + n_2 - 2}$, and where $t_0$ is the observed value of the test statistic, $$ t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
This test is known as the two-sample $t$-test.
A $100(1 - \alpha)\%$ two-sided confidence interval for $\mu_1 - \mu_2$ is, $$ \left[(\bar{x_1} - \bar{x_2}) - t_{n_1 + n_2 - 2;1 - \alpha/2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, (\bar{x_1} - \bar{x_2}) + t_{n_1 + n_2 - 2;1 - \alpha/2} s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right] $$
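The pooled statistic can also be computed by hand in R; the sketch below reuses the reaction-time data from the example that follows (t.test(..., var.equal=TRUE) automates all of this).
x1<-c(91, 87, 99, 77, 88, 91)      # control
x2<-c(101, 110, 103, 93, 99, 104)  # treatment
n1<-length(x1); n2<-length(x2)
sp2<-((n1-1)*var(x1)+(n2-1)*var(x2))/(n1+n2-2)   # pooled variance
t0<-(mean(x1)-mean(x2))/sqrt(sp2*(1/n1+1/n2))    # observed test statistic
2*pt(-abs(t0), df=n1+n2-2)                       # two-sided p-value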
Two-sample test: unknown unequal variances
There is no exact result available. An approximate result can be applied, $$ \frac{(\bar{X_1} - \bar{X_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{\nu} $$
where the number of degrees of freedom is, $$ \nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(\frac{S_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{S_2^2}{n_2}\right)^2}{n_2 - 1}} $$
(rounded down to the nearest integer).
This is called Welch's two-sample $t$-test.
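The degrees of freedom are easy to compute by hand in R; this sketch again reuses the reaction-time data from the example that follows, and should match the df reported by t.test().
x1<-c(91, 87, 99, 77, 88, 91); x2<-c(101, 110, 103, 93, 99, 104)
v1<-var(x1)/length(x1); v2<-var(x2)/length(x2)
(v1+v2)^2/(v1^2/(length(x1)-1)+v2^2/(length(x2)-1))   # Welch degrees of freedom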
Example
6 subjects were given a drug (treatment group, $\mu_2$) and an additional 6 subjects were given a placebo (control group, $\mu_1$). Their reaction time to a stimulus was measured (in ms). Perform a two-sample $t$-test for comparing the means of the treatment and control groups.
Let's use a one-sided test. $$ H_0 : \mu_1 = \mu_2 \quad \text{vs} \quad H_1 : \mu_1 < \mu_2 $$
control<-c(91, 87, 99, 77, 88, 91)
treat<-c(101, 110, 103, 93, 99, 104)
t.test(control, treat, alternative="less", var.equal=TRUE)
> Two Sample t-test
>
> data: control and treat
> t = -3.4456, df = 10, p-value = 0.003136
> alternative hypothesis: true difference in means is less than 0
> 95 percent confidence interval:
> -Inf -6.082744
> sample estimates:
> mean of x mean of y
> 88.83333 101.66667
We can also do a Welch test,
t.test(control, treat, alternative="less")
> Welch Two Sample t-test
>
> data: control and treat
> t = -3.4456, df = 9.4797, p-value = 0.003391
> alternative hypothesis: true difference in means is less than 0
> 95 percent confidence interval:
> -Inf -6.044949
> sample estimates:
> mean of x mean of y
> 88.83333 101.66667
Paired data for difference of means
The two-sample $t$-test cannot be used when we deal with "before and after" data, or in the numerous other situations where the data are naturally paired (and thus not independent).
Let $(X_{11}, X_{21}), (X_{12}, X_{22}), \ldots, (X_{1n}, X_{2n})$ be a random sample of $n$ pairs of observations drawn from two subpopulations $X_1$ and $X_2$, with respective means $\mu_1$ and $\mu_2$.
An easy way out is just to consider the differences, $$ D_i = X_{1i} - X_{2i} $$
We have just a sample $D_1, D_2, \ldots, D_n$ from a distribution with mean, $$ \mu_D = \mu_1 - \mu_2 $$
Testing $H_0 : \mu_1 = \mu_2$ is just testing $H_0 : \mu_D = 0$. This can be accomplished by performing the usual one-sample test for the mean.
So, in R we can,
t.test(x,y, paired=TRUE)
# or
t.test(x-y)
Inferences for variance
Recall the sample variance, $$ S^2 = \frac{1}{n - 1} \sum_{i = 1}^n (X_i - \bar{X})^2, $$
which is a natural estimator for the population variance $\sigma^2$.
In general, little can be said about the distribution of $S^2$. However, when the population is normal, that is $X \sim N(\mu, \sigma^2)$, then, $$ \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n - 1} $$
Let $\chi^2_{\nu; \alpha}$ be the value such that, $$ P(X > \chi^2_{\nu; \alpha}) = \alpha $$
for $X \sim \chi^2_{\nu}$.
As we know that, $$ \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n - 1} $$
we can write, $$ P(\chi^2_{n - 1;1 - \alpha/2} \leq \frac{(n - 1)S^2}{\sigma^2} \leq \chi^2_{n - 1;\alpha/2}) = 1 - \alpha $$
which can be re-arranged to, $$ P\left(\frac{(n - 1)S^2}{\chi^2_{n - 1;\alpha/2}} \leq \sigma^2 \leq \frac{(n - 1)S^2}{\chi^2_{n - 1;1 - \alpha/2}}\right) = 1 - \alpha $$
Which gives us the interval, $$ \left[\frac{(n - 1)S^2}{\chi^2_{n - 1;\alpha/2}}, \frac{(n - 1)S^2}{\chi^2_{n - 1;1 - \alpha/2}}\right] $$
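In R, qchisq() returns lower-tail quantiles, so with our upper-tail convention $\chi^2_{n - 1;a}$ corresponds to qchisq(1-a, df=n-1). A minimal sketch with a hypothetical sample:
x<-c(175, 185, 170, 184, 175); n<-length(x); alpha<-0.05   # hypothetical sample
c((n-1)*var(x)/qchisq(1-alpha/2, df=n-1),   # lower bound: divide by the larger quantile
  (n-1)*var(x)/qchisq(alpha/2, df=n-1))     # upper bound: divide by the smaller quantile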
Now, $H_0 : \sigma^2 = \sigma_0^2$ against $H_a : \sigma^2 \neq \sigma_0^2$. It is natural to reject $H_0$ whenever $s^2$ is too distant from $\sigma_0^2$.
We are after two constants $\ell$ and $u$ such that, $$ \alpha = P(S^2 \notin [\ell, u] \text{ when } \sigma^2 = \sigma_0^2) = P\left(\frac{(n - 1)S^2}{\sigma_0^2} \notin \left[\frac{(n - 1)\ell}{\sigma_0^2}, \frac{(n - 1)u}{\sigma_0^2}\right]\right) $$
This yields (recall that, with our convention, $\chi^2_{n - 1;1 - \alpha/2} < \chi^2_{n - 1;\alpha/2}$), $$ \ell = \frac{\chi^2_{n - 1;1 - \alpha/2} \sigma_0^2}{n - 1} \newline u = \frac{\chi^2_{n - 1;\alpha/2} \sigma_0^2}{n - 1} $$
The decision rule is then, $$ \text{Reject } H_0 \text{ if } s^2 \notin \left[\frac{\chi^2_{n - 1;1 - \alpha/2} \sigma_0^2}{n - 1}, \frac{\chi^2_{n - 1;\alpha/2} \sigma_0^2}{n - 1}\right] $$
One-sided CI, $$ \left[0, \frac{(n - 1)S^2}{\chi^2_{n - 1;1 - \alpha}}\right] \newline \left[\frac{(n - 1)S^2}{\chi^2_{n - 1;\alpha}}, +\infty\right] $$
One-sided test, $$ \text{For } H_a : \sigma^2 > \sigma_0^2, \text{ reject } H_0 \text{ if } s^2 > \frac{\chi^2_{n - 1;\alpha} \sigma_0^2}{n - 1} \newline \text{For } H_a : \sigma^2 < \sigma_0^2, \text{ reject } H_0 \text{ if } s^2 < \frac{\chi^2_{n - 1;1 - \alpha} \sigma_0^2}{n - 1} $$
The $p$-values for these one-sided tests are, respectively, $$ P(S^2 > s^2) = 1 - P\left(\frac{(n - 1) S^2}{\sigma^2_0} \leq \frac{(n - 1) s^2}{\sigma^2_0}\right) = 1 - P\left(\chi^2_{n - 1} \leq \frac{(n - 1) s^2}{\sigma^2_0}\right) \newline P(S^2 < s^2) = P\left(\frac{(n - 1) S^2}{\sigma^2_0} \leq \frac{(n - 1) s^2}{\sigma^2_0}\right) = P\left(\chi^2_{n - 1} \leq \frac{(n - 1) s^2}{\sigma^2_0}\right) $$
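In R, with the same hypothetical sample and a hypothetical null value $\sigma_0^2 = 40$:
x<-c(175, 185, 170, 184, 175); sigma0sq<-40   # hypothetical sample and sigma0^2
stat<-(length(x)-1)*var(x)/sigma0sq           # observed (n-1)s^2/sigma0^2
1-pchisq(stat, df=length(x)-1)                # p-value for Ha: sigma^2 > sigma0^2
pchisq(stat, df=length(x)-1)                  # p-value for Ha: sigma^2 < sigma0^2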
Inferences for ratio of variances/test of equality of variances
Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a sample from population 1 and $X_{21}, X_{22}, \ldots, X_{2n_2}$ be a sample from population 2.
The samples are independent, and we would like to know whether $\sigma_1^2 = \sigma_2^2$ or not.
Define the sample variances, $$ S_1^2 = \frac{1}{n_1 - 1} \sum_{i = 1}^{n_1} (X_{1i} - \bar{X_1})^2 \newline S_2^2 = \frac{1}{n_2 - 1} \sum_{i = 1}^{n_2} (X_{2i} - \bar{X_2})^2 $$
We can use the test statistic, $$ F = \frac{S_1^2}{S_2^2} $$
In general, there is no known exact distribution for $F$. Fortunately, if $X_{1i} \sim N(\mu_1, \sigma_1^2)$ and $X_{2i} \sim N(\mu_2, \sigma_2^2)$, then $F$ has an $F_{n_1 - 1, n_2 - 1}$ distribution when the null hypothesis ($\sigma_1^2 = \sigma_2^2$) is true.
We test $H_0 : \sigma_1^2 = \sigma_2^2$ against $H_a : \sigma_1^2 \neq \sigma_2^2$.
It is natural to reject $H_0$ whenever $F$ is too big or too small.
With $F_{n_1 - 1, n_2 - 1;\alpha}$ denoting the value such that $P(X > F_{n_1 - 1, n_2 - 1;\alpha}) = \alpha$ for $X \sim F_{n_1 - 1, n_2 - 1}$ (the same upper-tail convention as for the $\chi^2$ quantiles), the decision rule is, $$ \text{Reject } H_0 \text{ if } F \notin \left[F_{n_1 - 1, n_2 - 1;1 - \alpha/2}, F_{n_1 - 1, n_2 - 1;\alpha/2}\right] $$
A two-sided CI for $\frac{\sigma_1^2}{\sigma_2^2}$ follows from, $$ P\left(F_{n_1 - 1, n_2 - 1;1 - \alpha/2} \leq \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \leq F_{n_1 - 1, n_2 - 1;\alpha/2}\right) = 1 - \alpha $$
Which yields us the interval, $$ \left[\frac{S_1^2}{S_2^2 F_{n_1 - 1, n_2 - 1;\alpha/2}}, \frac{S_1^2}{S_2^2 F_{n_1 - 1, n_2 - 1;1 - \alpha/2}}\right] $$
One-sided CI, $$ \left[0, \frac{S_1^2}{S_2^2 F_{n_1 - 1, n_2 - 1;1 - \alpha}}\right] \newline \left[\frac{S_1^2}{S_2^2 F_{n_1 - 1, n_2 - 1;\alpha}}, +\infty\right] $$
One-sided test, $$ \text{For } H_a : \sigma_1^2 > \sigma_2^2, \text{ reject } H_0 \text{ if } F > F_{n_1 - 1, n_2 - 1;\alpha} \newline \text{For } H_a : \sigma_1^2 < \sigma_2^2, \text{ reject } H_0 \text{ if } F < F_{n_1 - 1, n_2 - 1;1 - \alpha} $$
The $p$-value is, $$ P(F > f) = 1 - P(F_{n_1 - 1, n_2 - 1} \leq f) \newline P(F < f) = P(F_{n_1 - 1, n_2 - 1} \leq f) $$ where $f = s_1^2/s_2^2$ is the observed value of the test statistic.
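In R, the built-in var.test() performs this $F$ test; here is a sketch with the reaction-time data from the earlier two-sample example.
control<-c(91, 87, 99, 77, 88, 91)
treat<-c(101, 110, 103, 93, 99, 104)
var.test(control, treat)   # reports F, df, a p-value, and a CI for the variance ratio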