• Statistical Inference:

Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 you studied one kind of inference called estimation. In this chapter, we study a second kind of inference called hypothesis testing.

The validity of inference is related to the way the data are obtained, and to the stationarity of the process producing the data.

• The Components of a Statistical Hypothesis Testing Problem
1. The Scientific Hypothesis
2. The Statistical Model
3. The Statistical Hypotheses
4. The Test Statistic
5. The P-Value
• Example:

One stage of a manufacturing process involves a manually-controlled grinding operation. Management suspects that the grinding machine operators tend to grind parts slightly larger rather than slightly smaller than the target diameter, 0.75 inches, while still staying within the specification limits, which are 0.75 ± 0.01 inches. To check this suspicion, they sample 150 within-spec parts. We will use this example to illustrate the components of a statistical hypothesis testing problem.

• 1. The Scientific Hypothesis: The scientific hypothesis is the hypothesized outcome of the experiment or study. In this example, the scientific hypothesis is that there is a tendency to grind the parts larger than the target diameter.
2. The Statistical Model: We will assume these data were generated by the C+E model

$$Y = \mu + \epsilon,$$

where the random error, $\epsilon$, follows a $N(0, \sigma^2)$ distribution model.

3. The Statistical Hypotheses: In terms of the C+E model, management defined "a tendency to grind the parts larger than the target diameter" to be a statement about the population mean diameter, $\mu$, of the ground parts. They then defined the statistical hypotheses to be
 H0: $\mu = 0.75$   Ha: $\mu > 0.75$
Notice that Ha states the scientific hypothesis.

4. The Test Statistic: In all one-parameter hypothesis test settings we will consider, the test statistic will be the estimator of the population parameter about which inference is being made. As you know from chapter 5, the estimator of $\mu$ is the sample mean, $\bar{Y}$, and this is also the test statistic. We denote the observed value of $\bar{Y}$ for these data by $\bar{y}^*$.

5. The P-Value: Think of this as the plausibility value. It measures the probability, given that H0 is true, that a randomly chosen value of the test statistic will give as much or more evidence against H0 and in favor of Ha as does the observed test statistic value.

For the grinding problem, since Ha states that $\mu > 0.75$, large values of $\bar{Y}$ will provide evidence against H0 and in favor of Ha. Therefore any value of $\bar{Y}$ as large or larger than the observed value $\bar{y}^*$ will provide as much or more evidence against H0 and in favor of Ha as does the observed test statistic value. Thus, the p-value is $P_0(\bar{Y} \ge \bar{y}^*)$, where $P_0$ is the probability computed under the assumption that H0 is true: that is, $\mu = 0.75$.

To calculate the p-value, we standardize the test statistic by subtracting its mean (remember we're assuming H0 is true, so we take $\mu = 0.75$) and dividing by its estimated standard error, $\hat{\sigma}(\bar{Y}) = S/\sqrt{n}$:

$$t = \frac{\bar{Y} - 0.75}{S/\sqrt{n}}.$$

If H0 is true, the result will have a $t_{n-1} = t_{149}$ distribution. Putting this all together, the p-value is

$$P_0(\bar{Y} \ge \bar{y}^*) = P\left(t_{149} \ge \frac{\bar{y}^* - 0.75}{s/\sqrt{150}}\right),$$

where $s$ is the observed sample standard deviation.

• Two-Sided Tests

In all examples we'll look at, H0 will be simple (i.e., it will state that the parameter has a single value) as opposed to compound. Alternate hypotheses will be one-sided (that the parameter is larger than the null value, or smaller than the null value) or two-sided (that the parameter does not equal the null value).

In the grinding example, we had

 H0: $\mu = 0.75$ (simple)   Ha: $\mu > 0.75$ (compound, one-sided)
• Suppose in the grinding problem that management wanted to see if the mean diameter was off target. Then appropriate hypotheses would be:
 H0: $\mu = 0.75$ (simple)   Ha: $\mu \ne 0.75$ (compound, two-sided)

In this case, evidence against H0 and in favor of Ha is provided by both large and small values of $\bar{Y}$.

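• To compute the p-value of the two-sided test, we first compute the standardized test statistic t, and its observed value, t*:

$$t = \frac{\bar{Y} - 0.75}{S/\sqrt{n}}, \qquad t^* = \frac{\bar{y}^* - 0.75}{s/\sqrt{n}}.$$

Recall that under H0, $t \sim t_{n-1}$. By the symmetry of the t distribution about 0, we compute the p-value as

$$P_0(|t| \ge |t^*|) = 2P(t_{n-1} \ge |t^*|).$$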
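
A minimal Python sketch of the one-sided and two-sided p-value calculations for a one-sample t test. The summary statistics passed in at the bottom (`ybar=0.751`, `s=0.005`) are hypothetical placeholders, not the values from the grinding data, and the helper names `t_pdf`, `t_upper_tail`, and `one_sample_t` are illustrative. The t tail probability is approximated with Simpson's rule so that the sketch needs only the standard library; a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for T ~ t_df: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3        # P(T > |x|)
    return upper if x >= 0 else 1.0 - upper

def one_sample_t(ybar, s, n, mu0):
    """Return t*, and the p-values versus Ha-, Ha+ and the two-sided Ha."""
    se = s / math.sqrt(n)              # estimated standard error of the mean
    t_star = (ybar - mu0) / se         # observed standardized test statistic
    p_plus = t_upper_tail(t_star, n - 1)
    p_minus = 1.0 - p_plus
    p_two = 2.0 * min(p_minus, p_plus)
    return t_star, p_minus, p_plus, p_two

# Hypothetical summary statistics (NOT the grinding data from the notes).
t_star, p_minus, p_plus, p_two = one_sample_t(ybar=0.751, s=0.005, n=150, mu0=0.75)
print(f"t* = {t_star:.3f}, one-sided p = {p_plus:.4f}, two-sided p = {p_two:.4f}")
```

For Ha: $\mu > 0.75$ the relevant value is `p_plus`; for the two-sided alternative it is `p_two`, which doubles the smaller tail as the symmetry argument suggests.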
• The Philosophy of Hypothesis Testing

Statistical hypothesis testing is modeled on scientific investigation. The two hypotheses represent competing scientific hypotheses.

• The alternate hypothesis is the hypothesis that suggests change, difference or an aspect of a new theory.
• The null hypothesis is the hypothesis that represents the accepted scientific view or that, most often, suggests no difference or effect.

For this reason the null hypothesis is given favored treatment.

• Other Issues

• Statistical significance
• Cautions
  o Statistical vs. practical significance
  o Exploratory vs. confirmatory
  o Lotsa tests means false positives
  o Data suggesting hypotheses
  o Lack of significance $\ne$ failure
• One Sample Hypothesis Tests for the Mean in the C+E Model

Check out the appendix 6.1, p. 346, with me!

• One Sample Hypothesis Tests for a Population Proportion

First, check out the appendix 6.1, p. 347, with me!

Example:

Back at the grinding operation, management has decided on another characterization of the scientific hypothesis that "there is a tendency to grind the parts larger than the target diameter." They decide to make inference about p, the population proportion of in-spec parts with diameters larger than the target value. The hypotheses are

 H0: p = 0.5 Ha: p > 0.5

The datum is Y, the number of the 150 sampled parts with diameters larger than the target value.

Of the 150 parts, y*=93 (a proportion 0.62) have diameters greater than the target value 0.75.

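We will first perform an exact test of these hypotheses. Under H0, $Y \sim b(150, 0.5)$, so the p-value is

$$P_0(Y \ge 93) = \sum_{y=93}^{150} \binom{150}{y} (0.5)^{150}.$$

Now, for illustration, we will use the large-sample test. This is valid since $np_0$ and $n(1-p_0)$ both equal $75 > 10$.

The observed standardized test statistic is

$$z^* = \frac{0.62 - 0.5}{\sqrt{0.5(1 - 0.5)/150}} = 2.94.$$

The approximate p-value is then

$$P(N(0,1) \ge 2.94) = 0.0016.$$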
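
The exact and large-sample calculations for the proportion example can be sketched in a few lines of Python using only the standard library. The variable names are illustrative, and the large-sample version below uses no continuity correction, which is an assumption about how the notes intend the test to be run.

```python
import math

n, y_obs, p0 = 150, 93, 0.5   # sample size, observed count, null proportion

# Exact test: P0(Y >= 93) where Y ~ b(150, 0.5) under H0.
p_exact = sum(math.comb(n, y) * p0**y * (1 - p0)**(n - y)
              for y in range(y_obs, n + 1))

# Large-sample test: standardize the sample proportion under H0.
p_hat = y_obs / n                          # 0.62
se0 = math.sqrt(p0 * (1 - p0) / n)         # standard error under H0
z_star = (p_hat - p0) / se0                # observed standardized statistic
p_approx = 0.5 * math.erfc(z_star / math.sqrt(2))   # P(N(0,1) >= z*)

print(f"exact p-value  = {p_exact:.5f}")
print(f"z* = {z_star:.3f}, approximate p-value = {p_approx:.5f}")
```

Both p-values are small, so the two versions of the test lead to the same conclusion here.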

• The Two Population C+E Model

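We assume that there are $n_1$ measurements from population 1 generated by the C+E model

$$Y_{1,i} = \mu_1 + \epsilon_{1,i}, \quad i = 1, \ldots, n_1,$$

and $n_2$ measurements from population 2 generated by the C+E model

$$Y_{2,i} = \mu_2 + \epsilon_{2,i}, \quad i = 1, \ldots, n_2.$$

We want to compare $\mu_1$ and $\mu_2$.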

• Hypothesis Test for Paired Comparisons

Sometimes each observation from population 1 is paired with another observation from population 2. For example, each student may take a pre- and post-test. In this case $n_1 = n_2$ and by looking at the pairwise differences, $D_i = Y_{1,i} - Y_{2,i}$, we transform the two population problem to a one population problem for the C+E model $D = \mu_D + \epsilon_D$, where $\mu_D = \mu_1 - \mu_2$ and $\epsilon_D = \epsilon_1 - \epsilon_2$. Therefore, a hypothesis test for the difference $\mu_1 - \mu_2$ is obtained by performing a one sample hypothesis test for $\mu_D$ based on the differences $D_i$.

• Example:

In 1993 the National League expanded by adding the Florida and Colorado teams. Many experts predicted that this expansion would dilute the quality of pitching and inflate team batting statistics. Others pointed out that the batting level would also decline, and that the result would be little or no difference. To assess who was right, we have collected the team batting averages for 1992 and 1993 for all 12 teams that were in the league in 1992. We assume that each team's batting average each year follows a C+E model centered about an overall (and unknown) league average.

Since most personnel on a team stay the same from one year to the next, we feel that paired comparisons are appropriate. Thus, for each team, we will compute D, the difference between the 1993 and 1992 team batting averages. We will test the hypotheses

 H0: $\mu_D = 0$   Ha: $\mu_D > 0$

The data (found in SASDATA.NLAVG923) are:

 TEAM   AVG92   AVG93   DIFFAVG
 ATL    0.254   0.262     0.008
 CHI    0.254   0.270     0.016
 CIN    0.260   0.264     0.004
 HOU    0.246   0.267     0.021
 LA     0.248   0.261     0.013
 MON    0.252   0.257     0.005
 NY     0.235   0.248     0.013
 PHI    0.253   0.274     0.021
 PIT    0.255   0.267     0.012
 SD     0.255   0.252    -0.003
 SF     0.244   0.276     0.032
 STL    0.262   0.272     0.010

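An inspection of the differences shows no evidence of nonnormality or outliers, so we proceed with the test. For these data, $\bar{d} = 0.01267$ and $s_d = 0.0092$. Then $\hat{\sigma}(\bar{D}) = s_d/\sqrt{12} = 0.00266$, so the observed value of the standardized test statistic is

$$t^* = \frac{\bar{d} - 0}{\hat{\sigma}(\bar{D})} = \frac{0.01267}{0.00266} \approx 4.76,$$

resulting in a p-value

$$P(t_{11} \ge 4.76) < 0.001.$$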
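
As a check on the paired comparison, here is a short Python sketch that recomputes the differences, their mean and standard deviation, and the standardized test statistic from the table of team averages. The helper names `t_pdf` and `t_upper_tail` are illustrative, and the tail probability is approximated by Simpson's rule so that only the standard library is needed; this is a sketch, not the SAS analysis the notes refer to.

```python
import math
import statistics

# (1992 average, 1993 average) for each team, copied from the table.
averages = {
    "ATL": (0.254, 0.262), "CHI": (0.254, 0.270), "CIN": (0.260, 0.264),
    "HOU": (0.246, 0.267), "LA": (0.248, 0.261), "MON": (0.252, 0.257),
    "NY": (0.235, 0.248), "PHI": (0.253, 0.274), "PIT": (0.255, 0.267),
    "SD": (0.255, 0.252), "SF": (0.244, 0.276), "STL": (0.262, 0.272),
}

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for T ~ t_df: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if x >= 0 else 1.0 - upper

# D = 1993 average minus 1992 average, rounded to the table's precision.
diffs = [round(a93 - a92, 3) for a92, a93 in averages.values()]
n = len(diffs)
d_bar = statistics.mean(diffs)          # sample mean of the differences
s_d = statistics.stdev(diffs)           # sample standard deviation (n - 1)
se = s_d / math.sqrt(n)                 # estimated standard error of D-bar
t_star = (d_bar - 0.0) / se             # H0: mu_D = 0
p_value = t_upper_tail(t_star, n - 1)   # Ha: mu_D > 0, so use the upper tail

print(f"d_bar = {d_bar:.5f}, s_d = {s_d:.4f}, t* = {t_star:.2f}, p = {p_value:.5f}")
```

Because the test is one-sided with Ha: $\mu_D > 0$, only the upper tail of the $t_{11}$ distribution is used.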

• Testing Differences in Population Means of Independent Populations

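Let $\bar{Y}_1$ and $\bar{Y}_2$ denote the sample means from populations 1 and 2, and $S_1^2$ and $S_2^2$ the sample variances. The point estimator of $\mu_1 - \mu_2$ is $\bar{Y}_1 - \bar{Y}_2$. We will test

 H0: $\mu_1 - \mu_2 = \delta_0$   versus one of   Ha-: $\mu_1 - \mu_2 < \delta_0$,   Ha+: $\mu_1 - \mu_2 > \delta_0$,   Ha±: $\mu_1 - \mu_2 \ne \delta_0$,

where $\delta_0$ is a specified constant (often 0).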

Equal Variances

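If the population variances are equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$), then we estimate $\sigma^2$ by the pooled variance estimator

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

The estimated standard error of $\bar{Y}_1 - \bar{Y}_2$ is then given by

$$\hat{\sigma}_p(\bar{Y}_1 - \bar{Y}_2) = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Then, if H0 is true,

$$t^{(p)} = \frac{\bar{Y}_1 - \bar{Y}_2 - \delta_0}{\hat{\sigma}_p(\bar{Y}_1 - \bar{Y}_2)},$$

where $\delta_0$ is the value of $\mu_1 - \mu_2$ specified by H0, has a $t_{n_1+n_2-2}$ distribution. Suppose $t^{(p)*}$ is the observed value of $t^{(p)}$. Then the p-value of the test of H0 versus Ha- is

$p_- = P(t_{n_1+n_2-2} < t^{(p)*})$,

versus Ha+ is

$p_+ = P(t_{n_1+n_2-2} > t^{(p)*})$,

and versus Ha± is

$p_\pm = 2\min(p_-, p_+)$.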

• Unequal Variances

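If $\sigma_1^2 \ne \sigma_2^2$, then the standardized test statistic

$$t^{(ap)} = \frac{\bar{Y}_1 - \bar{Y}_2 - \delta_0}{\hat{\sigma}(\bar{Y}_1 - \bar{Y}_2)},$$

where $\delta_0$ is the value of $\mu_1 - \mu_2$ specified by H0, approximately follows a $t_\nu$ distribution model, where $\nu$ is the largest integer less than or equal to

$$\frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$

and

$$\hat{\sigma}(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$

If $t^{(ap)*}$ denotes the observed value of $t^{(ap)}$, the p-values for H0 versus Ha-, Ha+ and Ha±, respectively, are $p_- = P(t_\nu < t^{(ap)*})$, $p_+ = P(t_\nu > t^{(ap)*})$ and $p_\pm = 2\min(p_-, p_+)$.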

• Example:

A company buys grinding wheels used in its manufacturing process from two suppliers. In order to decide if there is a difference in wheel life, the lifetimes of 10 wheels from manufacturer 1 and 13 wheels from manufacturer 2 used in the same application are compared. A summary of the data shows the following (units are hours): (The data are in SASDATA.GRIND2)

 Manufacturer    n    $\bar{y}$      s
 1              10    118.4       26.9
 2              13    134.9       18.4
Test
 H0: $\mu_1 - \mu_2 = 0$   Ha: $\mu_1 - \mu_2 \ne 0$
• The experimenters generated histograms and normal quantile plots of the two data sets and found no evidence of nonnormality or outliers. The estimate of $\mu_1 - \mu_2$ is $\bar{y}_1 - \bar{y}_2 = -16.52$.

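• Pooled variance test: The pooled variance estimate is

$$s_p^2 = \frac{(10 - 1)(26.9)^2 + (13 - 1)(18.4)^2}{10 + 13 - 2} = 503.58.$$

This gives the standard error estimate of $\bar{Y}_1 - \bar{Y}_2$ as

$$\hat{\sigma}_p(\bar{Y}_1 - \bar{Y}_2) = \sqrt{503.58\left(\frac{1}{10} + \frac{1}{13}\right)} = 9.44.$$

Therefore, $t^{(p)*} = -16.52/9.44 = -1.75$, with 21 degrees of freedom. So $p_- = P(t_{21} < -1.75)$, $p_+ = P(t_{21} > -1.75)$, and the p-value for this problem is $p_\pm = 2\min(p_-, p_+) = 2P(t_{21} < -1.75)$.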
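• Separate variance test: The standard error estimate of $\bar{Y}_1 - \bar{Y}_2$ is

$$\hat{\sigma}(\bar{Y}_1 - \bar{Y}_2) = \sqrt{\frac{(26.9)^2}{10} + \frac{(18.4)^2}{13}} = 9.92.$$

The observed value of the standardized test statistic is $t^{(ap)*} = -16.52/9.92 = -1.67$. The degrees of freedom are computed as the greatest integer less than or equal to

$$\frac{\left(\dfrac{(26.9)^2}{10} + \dfrac{(18.4)^2}{13}\right)^2}{\dfrac{\left((26.9)^2/10\right)^2}{9} + \dfrac{\left((18.4)^2/13\right)^2}{12}} = 15.17,$$

so $\nu = 15$. Therefore, $p_- = P(t_{15} < -1.67)$, $p_+ = P(t_{15} > -1.67)$, and the p-value for this problem is $p_\pm = 2\min(p_-, p_+) = 2P(t_{15} < -1.67)$.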

The results for the two t-tests are not much different.
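
The two versions of the test can be compared side by side with a short Python sketch that works from the summary statistics in the table. The difference in sample means is taken as -16.52, the value used in the calculations above; the helper names are illustrative, and the t tail probabilities are approximated by Simpson's rule so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for T ~ t_df: Simpson's rule on [0, |x|] plus symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if x >= 0 else 1.0 - upper

def two_sided_p(t_star, df):
    """p-value versus the two-sided alternative: 2 * min(p-, p+)."""
    p_plus = t_upper_tail(t_star, df)
    return 2.0 * min(p_plus, 1.0 - p_plus)

n1, s1 = 10, 26.9      # manufacturer 1: sample size, sample std. deviation
n2, s2 = 13, 18.4      # manufacturer 2
diff = -16.52          # ybar1 - ybar2 as used in the notes
                       # (the rounded table means 118.4 and 134.9 give -16.5)

# Pooled variance test (assumes equal population variances).
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2
p_pooled = two_sided_p(t_pooled, df_pooled)

# Separate variance test (approximate t with nu degrees of freedom).
v1, v2 = s1**2 / n1, s2**2 / n2
se_sep = math.sqrt(v1 + v2)
t_sep = diff / se_sep
df_sep = math.floor((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
p_sep = two_sided_p(t_sep, df_sep)

print(f"pooled:   se = {se_pooled:.2f}, t = {t_pooled:.2f}, df = {df_pooled}, p = {p_pooled:.3f}")
print(f"separate: se = {se_sep:.2f}, t = {t_sep:.2f}, df = {df_sep}, p = {p_sep:.3f}")
```

Printing both results makes it easy to see how much the choice between the pooled and separate variance versions matters for a given data set.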

• Comparing Two Population Proportions

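$Y_1 \sim b(n_1, p_1)$ and $Y_2 \sim b(n_2, p_2)$ are observations from two independent populations. The estimator of $p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 = \frac{Y_1}{n_1} - \frac{Y_2}{n_2}.$$

We wish to test a null hypothesis that the two population proportions differ by a known amount $\delta_0$,

 H0: $p_1 - p_2 = \delta_0$,
against one of three possible alternate hypotheses:
 Ha+: $p_1 - p_2 > \delta_0$   Ha-: $p_1 - p_2 < \delta_0$   Ha±: $p_1 - p_2 \ne \delta_0$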

Case 1:

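Suppose H0 is $p_1 - p_2 = 0$. Then, let $p = p_1 = p_2$ denote the common value of the two population proportions. If H0 is true, the variance of $\hat{p}_1$ equals $p(1-p)/n_1$ and that of $\hat{p}_2$ equals $p(1-p)/n_2$. Since the two samples are independent, this implies the standard error of $\hat{p}_1 - \hat{p}_2$ equals

$$\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

Since we don't know p, we estimate it using the data from both populations:

$$\hat{p} = \frac{Y_1 + Y_2}{n_1 + n_2}.$$

The estimated standard error of $\hat{p}_1 - \hat{p}_2$ is then

$$\hat{\sigma}_0(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

The standardized test statistic is then

$$Z_0 = \frac{\hat{p}_1 - \hat{p}_2}{\hat{\sigma}_0(\hat{p}_1 - \hat{p}_2)},$$

which has approximately a N(0,1) distribution if H0 is true and both samples are large.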

Case 2:

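If the hypothesized difference, $\delta_0$, is not 0, the (by now) standard reasoning gives the standardized test statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2 - \delta_0}{\hat{\sigma}(\hat{p}_1 - \hat{p}_2)},$$

where

$$\hat{\sigma}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

is the estimated standard error of $\hat{p}_1 - \hat{p}_2$.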
• Example:

In a recent survey on academic dishonesty, 24 of the 200 female college students surveyed and 26 of the 100 male college students surveyed agreed or strongly agreed with the statement "Under some circumstances academic dishonesty is justified." Suppose pf denotes the proportion of all female and pm the proportion of all male college students who agree or strongly agree with this statement.

• Test
 H0: pf - pm = 0   Ha: pf - pm $\ne$ 0

Since Yf=24, 200-Yf=176, Ym=26, and 100-Ym=74 all exceed 10, we may use the normal approximation.

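The point estimate of pf - pm is

$$\hat{p}_f - \hat{p}_m = \frac{24}{200} - \frac{26}{100} = 0.12 - 0.26 = -0.14,$$

and the estimate of the common value of pf and pm under H0 is $\hat{p} = (24 + 26)/(200 + 100) = 0.167$. Thus, $\hat{\sigma}_0(\hat{p}_f - \hat{p}_m) = \sqrt{0.167(0.833)(1/200 + 1/100)} = 0.0456$ and

$$z_0^* = \frac{-0.14}{0.0456} = -3.07.$$

From this, we obtain $p_- = P(N(0,1) \le -3.07) = 0.0011$, $p_+ = P(N(0,1) \ge -3.07) = 0.9989$, and $p_\pm = 2\min(p_-, p_+) = 0.0021$, this last being the p-value we want.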
• Test
 H0: pf - pm = -0.1 Ha: pf - pm < -0.1

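The estimated standard error of $\hat{p}_f - \hat{p}_m$ is

$$\hat{\sigma}(\hat{p}_f - \hat{p}_m) = \sqrt{\frac{0.12(0.88)}{200} + \frac{0.26(0.74)}{100}} = 0.05,$$

which gives

$$z^* = \frac{-0.14 - (-0.1)}{0.05} = -0.8$$

and a p-value of $P(N(0,1) \le -0.8) = 0.2119$.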
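
Both tests in the academic dishonesty example can be reproduced with a short Python sketch using only the standard library. The function name `normal_cdf` and the variable names are illustrative. The code does not round the standard error to 0.05 before standardizing, so its second statistic and p-value differ slightly from the hand calculation.

```python
import math

def normal_cdf(z):
    """P(N(0,1) <= z), computed from the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

y_f, n_f = 24, 200     # female students agreeing, number surveyed
y_m, n_m = 26, 100     # male students agreeing, number surveyed
pf_hat, pm_hat = y_f / n_f, y_m / n_m
diff = pf_hat - pm_hat                      # point estimate of pf - pm

# Case 1: H0: pf - pm = 0 versus the two-sided alternative.
p_pool = (y_f + y_m) / (n_f + n_m)          # estimate of the common proportion
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
z0 = diff / se0
p_minus = normal_cdf(z0)
p_plus = 1.0 - p_minus
p_two = 2.0 * min(p_minus, p_plus)

# Case 2: H0: pf - pm = -0.1 versus Ha: pf - pm < -0.1.
delta0 = -0.1
se = math.sqrt(pf_hat * (1 - pf_hat) / n_f + pm_hat * (1 - pm_hat) / n_m)
z = (diff - delta0) / se
p_lower = normal_cdf(z)                     # lower tail for Ha-

print(f"Case 1: z0 = {z0:.2f}, two-sided p = {p_two:.4f}")
print(f"Case 2: se = {se:.4f}, z = {z:.2f}, p = {p_lower:.4f}")
```

Note that Case 1 pools the two samples to estimate the standard error because H0 asserts a common proportion, while Case 2 uses the separate sample proportions.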
• Other Topics
• Fixed significance level tests
• Power
• The relation between hypothesis tests and confidence intervals