
Statistical Inference:
Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 we studied one kind of inference called estimation. In this chapter, we study a second kind of inference called hypothesis testing.
The validity of inference is related to the way the data are obtained, and to the stationarity of the process producing the data.
The Components of a Statistical Hypothesis Testing Problem

1. The Scientific Hypothesis
2. The Statistical Model
3. The Statistical Hypotheses
4. The Test Statistic
5. The P-Value
Example:
One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter of 0.75 inches while still staying within specification limits, which are 0.75 $\pm$ 0.01 inches. To verify their suspicions, they sample 150 within-spec parts. We will use this example to illustrate the components of a statistical hypothesis testing problem.

1. The Scientific Hypothesis. The scientific hypothesis is the hypothesized outcome of the experiment or study. In this example, the scientific hypothesis is that there is a tendency to grind the parts larger than the target diameter.

2. The Statistical Model. We will assume these data were generated by the C+E model:
$\begin{displaymath} Y=\mu+\epsilon,\end{displaymath}$
where the random error, $\epsilon$ , follows a $N(0,\sigma^2)$ distribution model.

3. The Statistical Hypotheses. In terms of the C+E model, management defined ``a tendency to grind the parts larger than the target diameter'' to be a statement about the population mean diameter, $\mu$ , of the ground parts. They then defined the statistical hypotheses to be

H₀: $\mu$ = 0.75

H_a: $\mu$ > 0.75

Notice that H_a states the scientific hypothesis.

4. The Test Statistic. In all one-parameter hypothesis test settings we will consider, the test statistic will be the estimator of the population parameter about which inference is being made. As you know from chapter 5, the estimator of $\mu$ is the sample mean, $\overline{Y}$ , and this is also the test statistic. The observed value of $\overline{Y}$ for these data is $\overline{y}^*=0.7518$ .

5. The P-Value. Think of this as the plausibility value. It measures the probability, given that H₀ is true, that a randomly chosen value of the test statistic will give as much or more evidence against H₀ and in favor of H_a as does the observed test statistic value.
For the grinding problem, since H_a states that $\mu>0.75$ , large values of $\overline{Y}$ will provide evidence against H₀ and in favor of H_a. Therefore any value of $\overline{Y}$ as large or larger than the observed value $\overline{y}^*=0.7518$ will provide as much or more evidence against H₀ and in favor of H_a as does the observed test statistic value. Thus, the p-value is $P_0(\overline{Y}\geq 0.7518)$ , where P₀ is the probability computed under the assumption that H₀ is true: that is, $\mu=0.75$ .
To calculate the p-value, we standardize the test statistic by subtracting its mean (remember we're assuming H₀ is true, so we take $\mu=0.75$ ) and dividing by its estimated standard error:
$\begin{displaymath} \hat{\sigma}(\overline{Y}) = s/\sqrt{n} = 0.0048/\sqrt{150} = 0.0004.\end{displaymath}$

If H₀ is true, the result will have a $t_{n-1}=t_{149}$ distribution.
Putting this all together, the p-value is
$\begin{displaymath} P_0(\overline{Y}\geq 0.7518) = P_0\left(\frac{\overline{Y}-0.75}{0.0004}\geq\frac{0.7518-0.75}{0.0004}\right) = P(t_{149}\geq 4.5) = 6.8\times 10^{-6}.\end{displaymath}$
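The calculation above can be sketched in Python using only the standard library. Since the standard library has no t distribution, the helper `t_sf` (a hypothetical name, not something from these notes) approximates the upper tail probability by Simpson's rule integration of the t density. The standard error is rounded to 0.0004 as in the text; an unrounded standard error would give a slightly larger value of the standardized statistic.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=4000):
    """Approximate P(T >= t) by Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1.0 - upper

n, y_bar, mu_0 = 150, 0.7518, 0.75
se = round(0.0048 / math.sqrt(n), 4)   # rounded to 0.0004, as in the notes
t_star = (y_bar - mu_0) / se           # standardized test statistic
p_value = t_sf(t_star, n - 1)          # upper tail, since H_a is mu > 0.75
print(f"t* = {t_star:.2f}, p-value = {p_value:.2e}")
```

Numerical integration is only one way to get the tail area; a statistics package would normally supply the t distribution directly.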
Two-Sided Tests
In all examples we'll look at, H₀ will be simple (i.e., it will state that the parameter has a single value) as opposed to compound. Alternative hypotheses will be one-sided (that the parameter is larger than the null value, or smaller than the null value) or two-sided (that the parameter does not equal the null value).
In the grinding example, we had

H₀: $\mu$ = 0.75 (simple)

H_a: $\mu$ > 0.75 (compound, one-sided)

Suppose in the grinding problem that management wanted to see if the mean diameter was off target. Then appropriate hypotheses would be:

H₀: $\mu$ = 0.75 (simple)

H_a: $\mu$ $\neq$ 0.75 (compound, two-sided)

In this case, evidence against H₀ and in favor of H_a is provided by both large and small values of $\overline{Y}$ .
To compute the p-value of the two-sided test, we first compute the standardized test statistic $t$ and its observed value $t^*$ :
$\begin{displaymath} t=\frac{\overline{Y}-0.75}{0.0004}, \; t^*=\frac{0.7518-0.75}{0.0004}=4.5.\end{displaymath}$
Recall that under H₀, $t\sim t_{149}$ . By the symmetry of the t distribution about 0, we compute the p-value as $P(\vert t\vert\geq\vert t^*\vert)=P(\vert t\vert\geq 4.5)=13.6\times 10^{-6}.$
The Philosophy of Hypothesis Testing
Statistical hypothesis testing is modeled on scientific investigation. The two hypotheses represent competing scientific hypotheses.
- The alternative hypothesis is the hypothesis that suggests change, difference or an aspect of a new theory.
- The null hypothesis is the hypothesis that represents the accepted scientific view or that, most often, suggests no difference or effect.
For this reason the null hypothesis is given favored treatment.
Other Issues
- Statistical significance
- Cautions
  o Statistical vs. practical significance
  o Exploratory vs. confirmatory
  o Lots of tests means false positives
  o Data suggesting hypotheses
  o Lack of significance $\neq$ failure
One Sample Hypothesis Tests for the Mean in the C+E Model
Check out Appendix 6.1, p. 346, with me!
One Sample Hypothesis Tests for a Population Proportion
First, check out Appendix 6.1, p. 347, with me!
Example:
Back at the grinding operation, management has decided on another characterization of the scientific hypothesis that ``there is a tendency to grind the parts larger than the target diameter.'' They decide to make inference about p, the population proportion of in-spec parts with diameters larger than the target value. The hypotheses are

H₀: p = 0.5

H_a: p > 0.5

The datum is Y, the number of the 150 sampled parts with diameters larger than the target value.
Of the 150 parts, y^*=93 (a proportion 0.62) have diameters greater than the target value 0.75.
We will first perform an exact test of these hypotheses. Under H₀, $Y\sim b(150,0.5)$ , so the p-value is
$\begin{displaymath} p^+=P(b(150,0.5)\geq 93)=0.0021.\end{displaymath}$

Now, for illustration, we will use the large-sample test. This is valid since np₀ and n(1-p₀) both equal 75>10.
The observed standardized test statistic is
$\begin{displaymath} z^*=\frac{93-(0.5)(150)}{\sqrt{(150)(0.5)(1-0.5)}}=2.94.\end{displaymath}$
The approximate p-value is then
$\begin{displaymath} P(N(0,1)\geq 2.94)=0.0016.\end{displaymath}$
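Both p-values in this example can be checked with a short Python sketch using only the standard library. The variable names are illustrative and not part of the notes. The exact test sums binomial probabilities directly, and the large-sample test uses `statistics.NormalDist` for the standard normal tail.

```python
import math
from statistics import NormalDist

n, p0, y_star = 150, 0.5, 93

# Exact test: P(Y >= 93) when Y ~ b(150, 0.5).
# With p0 = 0.5 every outcome sequence has probability 1/2**150.
p_exact = sum(math.comb(n, k) for k in range(y_star, n + 1)) / 2**n

# Large-sample test: standardize Y and use the N(0,1) upper tail.
z_star = (y_star - n * p0) / math.sqrt(n * p0 * (1 - p0))
p_approx = 1 - NormalDist().cdf(z_star)

print(f"exact p-value  = {p_exact:.4f}")
print(f"z* = {z_star:.2f}, approximate p-value = {p_approx:.4f}")
```

The approximate p-value comes out somewhat smaller than the exact one because no continuity correction is applied.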
The Two Population C+E Model
We assume that there are n₁ measurements from population 1 generated by the C+E model
$\begin{displaymath} Y_{1,i}=\mu_1+\epsilon_{1,i},\; i=1, \ldots, n_1,\end{displaymath}$
and n₂ measurements from population 2 generated by the C+E model
$\begin{displaymath} Y_{2,i}=\mu_2+\epsilon_{2,i},\; i=1, \ldots, n_2.\end{displaymath}$

We want to compare $\mu_1$ and $\mu_2$ .
Hypothesis Test for Paired Comparisons
Sometimes each observation from population 1 is paired with another observation from population 2. For example, each student may take a pre- and post-test. In this case n₁=n₂ and, by looking at the pairwise differences $D_i=Y_{1,i}-Y_{2,i}$ , we transform the two population problem to a one population problem for the C+E model $D=\mu_D+\epsilon_D$ , where $\mu_D=\mu_1-\mu_2$ and $\epsilon_D=\epsilon_1-\epsilon_2$ . Therefore, a hypothesis test for the difference $\mu_1-\mu_2$ is obtained by performing a one sample hypothesis test for $\mu_D$ based on the differences $D_i$ .

Example:

The manufacturer of a new warmup bat wants to test its efficacy. To do so, it selects a random sample of 12 baseball players from among a larger number who volunteer to try the bat. For each player, company researchers compute D, the difference between the player's test year average and his previous year's average. Assuming that these differences follow a C+E model, they want to test

H₀:	$\mu_D$	=	0
H_a:	$\mu_D$	>	0

The data (found in SASDATA.BATTING) are:

PLAYER	AVG92	AVG93	DIFFAVG
1	0.254	0.262	0.008
2	0.274	0.290	0.016
3	0.300	0.304	0.004
4	0.246	0.267	0.021
5	0.278	0.291	0.013
6	0.252	0.257	0.005
7	0.235	0.248	0.013
8	0.313	0.324	0.021
9	0.305	0.317	0.012
10	0.255	0.252	-0.003
11	0.244	0.276	0.032
12	0.322	0.332	0.010

An inspection of the differences shows no evidence of nonnormality or outliers, so we proceed with the test. For these data, $\overline{d}=0.0127$ , and s_d=0.0092. Then $\hat{\sigma}(\overline{D})=0.0092/\sqrt{12}=0.0027$ , so the observed value of the standardized test statistic is

$\begin{displaymath} t^*=\frac{0.0127}{0.0027}=4.70,\end{displaymath}$

resulting in a p-value

$\begin{displaymath} P(t_{11}\geq 4.7)=0.0003.\end{displaymath}$
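As a rough check, the paired test can be reproduced from the DIFFAVG column with the Python standard library. The helper `t_sf` is a hypothetical name for a Simpson's rule approximation to the upper tail of the t distribution, which the standard library does not provide. The code carries unrounded intermediate values, so its standardized statistic is slightly larger than the rounded 4.70 in the text.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=4000):
    """Approximate P(T >= t) by Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1.0 - upper

# DIFFAVG column from the table above
diffs = [0.008, 0.016, 0.004, 0.021, 0.013, 0.005,
         0.013, 0.021, 0.012, -0.003, 0.032, 0.010]

n = len(diffs)
d_bar = mean(diffs)               # point estimate of mu_D
s_d = stdev(diffs)                # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)           # estimated standard error of D-bar
t_star = d_bar / se               # null value of mu_D is 0
p_value = t_sf(t_star, n - 1)     # upper tail, since H_a is mu_D > 0
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t* = {t_star:.2f}, p = {p_value:.4f}")
```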

Testing Differences in Population Means of Independent Populations
Let $\overline{Y}_1$ and $\overline{Y}_2$ denote the sample means from populations 1 and 2, and S₁² and S₂² the sample variances. The point estimator of $\mu_1-\mu_2$ is $\overline{Y}_1-\overline{Y}_2$ . We will test

H₀: $\mu_1-\mu_2$ = $\delta_0$

Versus one of

H_{a_-}: $\mu_1-\mu_2$ < $\delta_0$ ,

H_a⁺: $\mu_1-\mu_2$ > $\delta_0$ ,

$H_{a\pm}:$ $\mu_1-\mu_2$ $\neq$ $\delta_0$ .
Equal Variances
If the population variances are equal ( $\sigma_1^2=\sigma_2^2=\sigma^2$ ), then we estimate $\sigma^2$ by the pooled variance estimator
$\begin{displaymath} S^2_p=\frac{(n_1-1)S_1^2+(n_2-1)S_2^2}{n_1+n_2-2}.\end{displaymath}$
The estimated standard error of $\overline{Y}_1-\overline{Y}_2$ is then given by
$\begin{displaymath} \hat{\sigma}_p(\overline{Y}_1-\overline{Y}_2)= \sqrt{S_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}.\end{displaymath}$

Then, if H₀ is true,
$\begin{displaymath} t^{(p)}=\frac{\overline{Y}_1-\overline{Y}_2-\delta_0}{\hat{\sigma}_p(\overline{Y}_1-\overline{Y}_2)}\end{displaymath}$
has a $t_{n_1+n_2-2}$ distribution.
Suppose $t^{(p)*}$ is the observed value of $t^{(p)}$ . Then the p-value of the test of H₀ versus H_{a_-} is
$\begin{displaymath} p_-=P(t_{n_1+n_2-2}\leq t^{(p)*}),\end{displaymath}$
versus H_a⁺ is
$\begin{displaymath} p^+=P(t_{n_1+n_2-2} \geq t^{(p)*}),\end{displaymath}$
and versus $H_{a\pm}$ is
$\begin{displaymath} p\pm=2\min(p_-,p^+).\end{displaymath}$
Unequal Variances
If $\sigma_1^2 \neq \sigma_2^2$ , then the standardized test statistic
$\begin{displaymath} t^{(ap)}=\frac{\overline{Y}_1-\overline{Y}_2-\delta_0} {\hat{\sigma}(\overline{Y}_1-\overline{Y}_2)}\end{displaymath}$
approximately follows a $t_\nu$ distribution model, where $\nu$ is the largest integer less than or equal to
$\begin{displaymath} \frac{\left(\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}\right)^2} {\frac{\left(\frac{S_1^2}{n_1}\right)^2}{n_1-1}+\frac{\left(\frac{S_2^2}{n_2}\right)^2}{n_2-1}},\end{displaymath}$
and
$\begin{displaymath} \hat{\sigma}(\bar{Y}_1-\bar{Y}_2)=\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}.\end{displaymath}$

If $t^{(ap)*}$ denotes the observed value of $t^{(ap)}$ , the p-values for H₀ versus H_{a_-}, H_a⁺ and $H_{a\pm}$ , respectively, are $p_-=P(t_\nu \leq t^{(ap)*})$ , $p^+=P(t_\nu \geq t^{(ap)*})$ and $p\pm=2\min(p_-,p^+)$ .
Example:
A company buys cutting blades used in its manufacturing process from two suppliers. In order to decide if there is a difference in blade life, the lifetimes of 10 blades from manufacturer 1 and 13 blades from manufacturer 2 used in the same application are compared. A summary of the data shows the following (units are hours): (The data are in SASDATA.BLADE2)

Manufacturer	n	$\overline{y}$	s
1	10	118.4	26.9
2	13	134.9	18.4

The experimenters want to test

H₀: $\mu_1-\mu_2$ = 0

H_a: $\mu_1-\mu_2$ $\neq$ 0

The experimenters generated histograms and normal quantile plots of the two data sets and found no evidence of nonnormality or outliers. The estimate of $\mu_1-\mu_2$ is $\overline{y}_1-\overline{y}_2=118.4-134.9=-16.52$ .
- Pooled variance test The pooled variance estimate is
  
  $\begin{displaymath} s^2_p = \frac{(10-1)(26.9)^2+(13-1)(18.4)^2}{10+13-2} = 503.6,\end{displaymath}$
  
  So the standard error estimate of $\overline{Y}_1-\overline{Y}_2$ is
  
  $\begin{displaymath} \hat{\sigma}_p(\overline{Y}_1-\overline{Y}_2) = \sqrt{503.6\left(\frac{1}{10}+\frac{1}{13}\right)} = 9.44.\end{displaymath}$
  
  Therefore, $t^{(p)*}=-16.52/9.44=-1.75$ , with 21 degrees of freedom. So $p_-=P(t_{21}\leq -1.75)=0.0473$ , $p^+=P(t_{21}\geq -1.75)=0.9527$ , and the p-value for this problem is $2\min(0.0473,0.9527)=0.0946$ .
- Separate variance test The standard error estimate of $\overline{Y}_1-\overline{Y}_2$ is
  $\begin{displaymath} \hat{\sigma}(\overline{Y}_1-\overline{Y}_2)=\sqrt{\frac{(26.9)^2}{10}+\frac{(18.4)^2}{13}}=9.92.\end{displaymath}$
  The observed value of the standardized test statistic is $t^{(ap)*}=-16.52/9.92=-1.67$ . The degrees of freedom $\nu$ is computed as the greatest integer less than or equal to
  $\begin{displaymath} \frac{\left(\frac{(26.9)^2}{10}+\frac{(18.4)^2}{13}\right)^2} {\frac{\left(\frac{(26.9)^2}{10}\right)^2}{10-1}+\frac{\left(\frac{(18.4)^2}{13}\right)^2}{13-1}}=15.17,\end{displaymath}$
  so $\nu=15$ .
  Therefore, $p_-=P(t_{15}\leq -1.67)=0.0583$ , $p^+=P(t_{15}\geq -1.67)=0.9417$ , and the p-value for this problem is
  $2\min(0.0583,0.9417)=0.1166$ .
  The results for the two t-tests are not much different.
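Both versions of the blade-life test can be sketched from the summary statistics using only the Python standard library. The helper names (`t_sf`, `two_sided`) are hypothetical, and `t_sf` approximates the t tail area by Simpson's rule since the standard library has no t distribution. The difference in means is taken as -16.52, the value reported in the text, rather than recomputed from the rounded means in the table.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_sf(t, df, steps=4000):
    """Approximate P(T >= t) by Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3      # P(T >= |t|), using symmetry about 0
    return upper if t >= 0 else 1.0 - upper

def two_sided(t, df):
    """p-value 2*min(p_minus, p_plus) for the two-sided alternative."""
    p_plus = t_sf(t, df)
    p_minus = 1.0 - p_plus
    return 2 * min(p_minus, p_plus)

n1, s1 = 10, 26.9
n2, s2 = 13, 18.4
diff = -16.52       # y1_bar - y2_bar, as reported in the notes
delta_0 = 0.0       # null value of mu_1 - mu_2

# Pooled variance test
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = (diff - delta_0) / se_pooled
p_pooled = two_sided(t_pooled, n1 + n2 - 2)

# Separate variance test with approximate degrees of freedom nu
v1, v2 = s1**2 / n1, s2**2 / n2
se_sep = math.sqrt(v1 + v2)
t_sep = (diff - delta_0) / se_sep
nu = math.floor((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
p_sep = two_sided(t_sep, nu)

print(f"pooled:   t = {t_pooled:.2f}, df = {n1 + n2 - 2}, p = {p_pooled:.4f}")
print(f"separate: t = {t_sep:.2f}, df = {nu}, p = {p_sep:.4f}")
```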
Comparing Two Population Proportions
$Y_1 \sim b(n_1,p_1)$ and $Y_2 \sim b(n_2,p_2)$ are observations from two independent populations. The estimator of p₁-p₂ is
$\begin{displaymath} \hat{p}_1-\hat{p}_2=\frac{Y_1}{n_1}-\frac{Y_2}{n_2}.\end{displaymath}$

We wish to test a null hypothesis that the two population proportions differ by a known amount $\delta_0$ ,

H₀: p₁-p₂ = $\delta_0$ ,

against one of three possible alternative hypotheses:

H_a⁺: p₁-p₂ > $\delta_0$

H_{a_-}: p₁-p₂ < $\delta_0$

$H_{a\pm}:$ p₁-p₂ $\neq$ $\delta_0$

Case 1: $\delta_0 = 0$
Suppose H₀ is p₁-p₂=0. Then, let p=p₁=p₂ denote the common value of the two population proportions. If H₀ is true, the variance of $\hat{p}_1$ equals p(1-p)/n₁ and that of $\hat{p}_2$ equals p(1-p)/n₂. This implies the standard error of $\hat{p}_1-\hat{p}_2$ equals
$\begin{displaymath} \sqrt{\frac{p(1-p)}{n_1}+\frac{p(1-p)}{n_2}}.\end{displaymath}$

Since we don't know p, we estimate it using the data from both populations:
$\begin{displaymath} \hat{p}=\frac{Y_1+Y_2}{n_1+n_2}.\end{displaymath}$

The estimated standard error of $\hat{p}_1-\hat{p}_2$ is then

$\begin{displaymath} \hat{\sigma}_0(\hat{p}_1-\hat{p}_2) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p}) \left(\frac{1}{n_1}+\frac{1}{n_2} \right)}.\end{displaymath}$

The standardized test statistic is then
$\begin{displaymath} Z_0=\frac{\hat{p}_1-\hat{p}_2} {\hat{\sigma}_0(\hat{p}_1-\hat{p}_2)},\end{displaymath}$
which, for large samples, has approximately a N(0,1) distribution if H₀ is true.
Case 2: $\delta_0 \neq 0$
If $\delta_0 \neq 0$ , the (by now) standard reasoning gives the standardized test statistic
$\begin{displaymath} Z=\frac{\hat{p}_1-\hat{p}_2-\delta_0}{\hat{\sigma}(\hat{p}_1-\hat{p}_2)},\end{displaymath}$
where
$\begin{displaymath} \hat{\sigma}(\hat{p}_1-\hat{p}_2)=\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+ \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\end{displaymath}$
is the estimated standard error of $\hat{p}_1-\hat{p}_2$ .
Example:
In a recent survey on academic dishonesty 24 of the 200 female college students surveyed and 26 of the 100 male college students surveyed agreed or strongly agreed with the statement ``Under some circumstances academic dishonesty is justified.'' Suppose p_f denotes the proportion of all female and p_m the proportion of all male college students who agree or strongly agree with this statement.
- If we want to test
  
  H₀: p_f-p_m = 0
  
  H_a: p_f-p_m $\neq$ 0
  
  Since Y_f=24, 200-Y_f=176, Y_m=26, and 100-Y_m=74 all exceed 10, we may use the normal approximation.
  The point estimate of p_f-p_m is
  $\begin{displaymath} \hat{p}_f-\hat{p}_m=24/200-26/100=-0.140,\end{displaymath}$
  and the estimate of the common value of p_f and p_m under H₀ is $\hat{p}=(26+24)/(200+100)=0.167$ .
  Thus,
  
  $\begin{displaymath} \hat{\sigma}_0(\hat{p}_f-\hat{p}_m) = \sqrt{(0.167)(0.833)\left(\frac{1}{200}+\frac{1}{100}\right)} = 0.046,\end{displaymath}$
  
  and
  $\begin{displaymath} Z_0=\frac{-0.140}{0.046}=-3.04.\end{displaymath}$
  From this, we obtain $p_-=P(N(0,1)\leq -3.04)=0.0012$ , $p^+=P(N(0,1)\geq -3.04)=0.9988$ , and $p\pm=2\min(0.0012,0.9988)=0.0024$ , this last being the p-value we want.
- If we want to test
  
  H₀: p_f-p_m = -0.10
  
  H_a: p_f-p_m < -0.10
  
  The estimated standard error of $\hat{p}_f-\hat{p}_m$ is
  
  $\begin{displaymath} \hat{\sigma}(\hat{p}_f-\hat{p}_m) = \sqrt{\frac{0.12(1-0.12)}{200}+ \frac{0.26(1-0.26)}{100}} = 0.05,\end{displaymath}$
  
  which gives
  
  $\begin{displaymath} Z = \frac{24/200-26/100-(-0.10)}{0.05} = -0.80,\end{displaymath}$
  
  and a p-value of $P(N(0,1)\leq -0.80)=0.2119$ .
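The two survey tests can be sketched in Python with `statistics.NormalDist` from the standard library. Variable names are illustrative only. The code keeps unrounded intermediate values, so its results differ slightly from the rounded figures in the text.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()          # N(0,1)

y_f, n_f = 24, 200                 # female students agreeing, sample size
y_m, n_m = 26, 100                 # male students agreeing, sample size
p_f, p_m = y_f / n_f, y_m / n_m

# Test 1: H0: p_f - p_m = 0 versus a two-sided alternative (pooled estimate)
p_pool = (y_f + y_m) / (n_f + n_m)
se_0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
z_0 = (p_f - p_m) / se_0
p_minus = std_normal.cdf(z_0)
p_two_sided = 2 * min(p_minus, 1 - p_minus)

# Test 2: H0: p_f - p_m = -0.10 versus H_a: p_f - p_m < -0.10 (unpooled)
delta_0 = -0.10
se = math.sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
z = (p_f - p_m - delta_0) / se
p_one_sided = std_normal.cdf(z)    # lower tail, since H_a is "<"

print(f"test 1: Z0 = {z_0:.2f}, two-sided p = {p_two_sided:.4f}")
print(f"test 2: Z  = {z:.2f}, one-sided p = {p_one_sided:.4f}")
```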
Fixed Significance Level Tests
Steps, illustrated using the grinding example:

1. Specify hypotheses to be tested.

H₀: $\mu$ = 0.75

H_a: $\mu$ > 0.75

(i.e. $\mu_0=0.75$ )
2. Set the significance level $\alpha$ . Usual choices are 0.01 or 0.05. We'll choose the latter.

3. Specify the (standardized) test statistic and its distribution under H₀. For simplicity, assume we know $\sigma=0.0048$ . Then the standardized test statistic is
$\begin{displaymath} Z=\frac{\overline{Y}-\mu_0}{\sigma/\sqrt{n}}= \frac{\overline{Y}-0.75}{0.0048/\sqrt{150}},\end{displaymath}$
and under H₀ it has a N(0,1) distribution.

4. Find the critical region of the test. The critical region of the test is the set of values of the (standardized) test statistic for which H₀ will be rejected in favor of H_a. Here, H_a tells us that the critical region has the form
$\begin{displaymath} [z_\alpha,\infty)=[z_{0.05},\infty)=[1.645, \infty),\end{displaymath}$
meaning H₀ will be rejected if and only if the observed value of Z is greater than or equal to 1.645.

5. Perform the test. The observed value of Z is
$\begin{displaymath} z^*=\frac{0.7518-0.75}{0.0048/\sqrt{150}}=4.5,\end{displaymath}$
which falls in the critical region, so H₀ is rejected in favor of H_a.
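The five steps can be sketched in a few lines of Python using `statistics.NormalDist` from the standard library; the variable names are illustrative. With the unrounded standard error the observed statistic is a little larger than the value shown in the text, which reflects rounding the standard error to 0.0004. The decision is the same either way.

```python
import math
from statistics import NormalDist

alpha = 0.05                          # significance level (step 2)
mu_0, sigma, n = 0.75, 0.0048, 150    # null value, assumed known sigma, sample size
y_bar = 0.7518                        # observed sample mean

# Step 4: the critical region is [z_alpha, infinity) for H_a: mu > mu_0
z_crit = NormalDist().inv_cdf(1 - alpha)

# Step 5: observed standardized test statistic and decision
z_star = (y_bar - mu_0) / (sigma / math.sqrt(n))
reject = z_star >= z_crit

print(f"critical value = {z_crit:.3f}, z* = {z_star:.2f}, reject H0: {reject}")
```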
Power
In a fixed significance level test, power is the probability of rejecting H₀ in favor of H_a. Power will vary for different values of the parameter being tested, so it is written as a function of that parameter.
In the grinding example, the power is

$\begin{displaymath} \Pi(\mu) = P(Z\geq 1.645\vert\mu) = P\left(\overline{Y}\geq 0.75+1.645\frac{0.0048}{\sqrt{150}}\,\Big\vert\,\mu\right) = P\left(Z^\prime\geq 1.645+\frac{0.75-\mu}{0.0048/\sqrt{150}}\right),\end{displaymath}$

where $Z^\prime=\frac{\overline{Y}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$ when the true mean is $\mu$ .
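To see how the power varies with $\mu$ , the power function for the grinding example can be evaluated numerically. This is a minimal sketch using the Python standard library; the function name `power` and its default arguments are assumptions for illustration, and $Z^\prime$ is taken to be the sample mean standardized about the true mean.

```python
import math
from statistics import NormalDist

def power(mu, mu_0=0.75, sigma=0.0048, n=150, alpha=0.05):
    """P(reject H0 | true mean mu) for the one-sided test of H_a: mu > mu_0."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Reject when Z >= z_crit, i.e. when Z' >= z_crit + (mu_0 - mu) / se
    return 1 - NormalDist().cdf(z_crit + (mu_0 - mu) / se)

for mu in (0.7500, 0.7505, 0.7510, 0.7520):
    print(f"mu = {mu:.4f}: power = {power(mu):.3f}")
```

At $\mu=0.75$ the power equals the significance level, and it increases toward 1 as $\mu$ moves above the null value.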
The relation between hypothesis tests and confidence intervals

About this document ...

This document was generated using the LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)

The command line arguments were:
latex2html -split 0 lect6.

The translation was initiated by Joseph D Petruccelli on 10/27/1999


Joseph D Petruccelli
10/27/1999
