Introduction

This document contains detailed instructions of how to use SAS to do the kinds of analyses documented in the text Applied Statistics for Engineers and Scientists. Each of the first fifteen sections is keyed to one chapter of the text, and references material in that chapter. For example, Section 1 begins by describing where to find the electric usage data from Chapter 1 of the book and how to use those data and SAS to generate Figures 1.1 and 1.3 in the text. If you have not used SAS at WPI before, you should visit the Introduction to Windows SAS at WPI. Many of the analyses described in the text use a graphically-oriented component of SAS called SAS/INSIGHT. (SAS and SAS/INSIGHT are registered trademarks of SAS Institute, Inc.) An introduction to SAS/INSIGHT is found in An Introduction to SAS/INSIGHT I: Elementary Concepts, and more advanced features are found in An Introduction to SAS/INSIGHT II: Advanced Concepts. We have written a collection of SAS macros to facilitate the use of SAS in statistical applications and in labs. Instructions on how to access these macros are found in An Introduction to SAS/EIS. The uses of these macros are described where appropriate in this document. Throughout this document we will give only one-part names for all SAS data sets. This is the same as assuming the data sets are in your work, or temporary storage, area. At WPI, all data sets are in a SAS data library named SASDATA. To access the data sets from SAS/INSIGHT, go to the SASDATA library in the open data sets dialog window. If inputing the data set name to a macro, prefix the name with SASDATA. For example to use the ELECE data set in a macro, input the name SASDATA.ELECE.

Doing It with SAS: Chapter 1

Data Sets

Important data sets for Chapter 1 are:

ELECE : Electric usage data from Professor P.'s household.
WASHER5: Washer thickness data, Example 1.2.
WASHER7: Washer thickness data, Example 1.3.
EG1_5: Bearing play data, Example 1.5.
EG1_5A: Bearing play data, Example 1.5.

Stationarity of Processes

Figures 1.1 and 1.3 of the Text

Figures 1.1 and 1.3 were produced with SAS/INSIGHT. Figure 1.1 was created by choosing Analyze:Histogram/Bar Chart ( Y ) and then selecting DKWH from the resulting dialog window. Figure 1.3 was produced by first selecting Analyze:Scatterplot( Y X ) and then choosing DKWH as the Y variable and DATE as the X variable in the dialog window. This produced the scatterplot. Producing the corresponding histogram was a little trickier. First we created a rectangle to the right of the scatterplot by clicking there with the left mouse button and dragging. It doesn't matter how large the rectangle is. We next put a vertical bar chart there by choosing Analyze:Histogram/Bar Chart( Y ) and then selecting KWH from the resulting dialog window. To make the bar chart horizontal (this is the neat part), we clicked on the upper left corner and dragged that corner down past the lower right. (It's the click-and-drag version of turning a sleeve inside out.) We then moved the rectangle next to the scatterplot and resized it as desired. To align the KWH axes on both plots, we chose Edit:Windows:Align.

Figures 1.4, 1.5 and 1.6 of the Text

Figure 1.4 was produced by the macro TSPLOT, and Figures 1.5 and 1.6 were produced by the macro TSMAPRED. Input to TSMAPRED will include (in order)

The name of the SAS data set containing the data (ELECE).
The name of a SAS data set to contain the output (original data plus the smoothed series and residuals). Any name of eight characters or less will do.
The name of the variable to be smoothed (here it is DKWH).
The name of the variable to contain the smoothed series. This will be put in the output data set; use whatever name of eight characters or less you like.
The name of the variable to contain the residuals (which are the original data minus the smoothed values). Again, use whatever name of eight characters or less you like.
The name of the time variable (DATE).
The number of terms in the moving average. We used 7 and 28 for the two plots.

The equivalent of Figure 1.4 can be produced in SAS/INSIGHT by choosing Analyze:Line Plot(YX) and selecting KWH as the Y variable and DATE as the X variable.

Causes of Variation

All the plots in this section were created using SAS/INSIGHT .

Example 1.2: A Stationary Process that Wasn't

The data for Figure 1.8 of the text are in the SAS data set WASHER5. To create Figure 1.8 select Scatterplot ( Y X ). In the resulting dialog box, choose THICK as the Y variable, ORDER as the X variable and MACHINE as the Group variable. The result is the three plots you see, but aligned horizontally rather than vertically. In addition, the vertical axes of the plots differ. To get the vertical axes to line up, on the graph window select Edit:Windows:Align. Use clicking (on the bounding box of the plots) and dragging to place the graphs in the vertical configuration shown. Figure 1.9 was done in exactly the same way using the data in WASHER7.

Stratified Plots

To create the stratified plot in Figure 1.10, open the data set WASHER7, choose Analyze:Scatter Line Plot(YX), and select MACHINE as the X variable and THICK as the Y variable.

Identifying Possible Causes of Variation

Creating a New Ishikawa Diagram

While you can draw effective Ishikawa diagrams by hand, presentation-quality diagrams are easily drawn using SAS as follows:

From the menu bar on the PROGRAM EDITOR, LOG or OUTPUT windows, choose Globals:Analyze:Quality Improvement.
From the Statistical Quality Control Menu, select ``ISHIKAWA''.
From the resulting pop up window select ``Create a New Ishikawa Diagram''.
A graphics window containing a template for an Ishikawa diagram will appear. By entering text from the keyboard and using the mouse to position arrows, it is easy to create your own diagram. For example, to begin creating the diagram for High Strength Mesh, click on the graphics window to make it active. Then just type ``High Strength Mesh'' on the keyboard and hit return twice. The words ``High Strength Mesh'' will now appear in the box describing the main arrow. Now enter ``Methods'' in the same way, and the box for the upper left arrow is set. The other boxes and arrows will now disappear and you are free to create the diagram as you want. To learn how to do this, click on ``Help'' from the menu bar of the graphics window, and then on ``Extended help'' from the pop up window.
Read the first help screen, then click on the highlighted word ``Introduction''. To find out how to perform a given task, click on the highlighted word (e.g. adding, moving, etc.) describing that task. If you see anything about using PROC ISHIKAWA, ignore it; SAS has already called that SAS procedure for you-so you don't have to.
This all sounds more complicated than it is. In fact, in a few minutes, you should be adept at drawing the diagram. As always, if you have questions, just ask.

Saving a Diagram

You may want to save a diagram for later use. To do this, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``File''. A ``File Requestor'' window (for selecting where to save the diagram) will appear. You must first select a library in which to save the diagram. If you want to save it temporarily (it will disappear after you exit SAS), select the library ``WORK''. If you want it to be there for future SAS sessions, select the library ``SASUSER''. Next select a name for the data set (your choice, 8 or fewer characters), and click on ``OK''.

Retrieving a Diagram

To retrieve a saved Ishikawa diagram from the Ishikawa window, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Open''. A ``File Requestor'' window will appear. This window is identical to the one you used to save the diagram. When you select the file, a new Ishikawa window with the selected diagram will appear. To retrieve a saved Ishikawa diagram from the Statistical Quality Control window in SAS, click on the ISHIKAWA icon, then from the resulting window select ``Edit an Existing Ishikawa Diagram''. A ``File Requestor'' window will appear. Choose the saved diagram you desire, and a window will appear with the saved diagram in it. Some of the finer detail may be missing, however. To restore it, click anywhere on the diagram with the right mouse button and select ``> Detail''.

Printing a Diagram

You may want to print your Ishikawa diagram. To do this, you must first save the diagram to a graphics catalog. To do this, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``Graph''. An ``Output Manager'' window (for selecting where to save the diagram) will appear. This window will have a default library and file name already showing (probably WORK.GSEG.ISHIKAWA). If you want to go with this name (and we'll assume here that you do), click on the ``OK'' button. The Ishikawa diagram will appear in a regular graphics window. It can be printed from there in the usual way of printing all graphics output.

Doing It with SAS: Chapter 2

Data Sets

TECHSAL: Salaries of technical support workers, Example 2.4.
PLANETS: Planet diameters.
BEARINGS: Weights and diameters of 100 ball bearings.
ULTRACAL: Ultrasound calibration data, Example 2.6.

Histograms and Bar Charts

Frequency histograms and bar charts are obtained in SAS/INSIGHT using the command Analyze:Histogram/Bar Chart ( Y ).

Boxplots

You can easily generate boxplots in SAS/INSIGHT by choosing Analyze:Box Plot/Mosaic Plot ( Y ) . For example, the side-by-side boxplots shown in Figure 2.13 of the text compare the salaries of men and women in the TECHSAL data set. They were produced by selecting SALARY as the Y variable and GENDER as the X variable. You can add information to the boxplots. Choosing :means will add a diamond-shaped figure with the mean indicated by a horizontal line and a span of +- two standard deviations. Choosing :Serifs will add serifs: little cross lines at the ends of the whiskers. Choosing :Values will put the values of the medians, quartiles and ends of whiskers on the graph. If the mean diamonds are chosen, the values of the means will also be displayed. Try these features yourself with the TECHSAL data.

Numerical Summaries

The command Analyze:Distribution ( Y ) will produce numerical summaries such as the mean, median and standard deviation. It will also produce two plots: a boxplot and a density histogram. Density histograms are like frequency histograms, except that the height of each bar equals the density, rather than the frequency, of data in that bar's subinterval. The density in a subinterval is the frequency in the subinterval divided by the product of the number of observations in the data set and the subinterval width. You will learn more about density histograms in Chapter 4.

Resistant Measures

SAS/INSIGHT allows you to select from among a resistant estimator of the standard deviation (Gini's mean difference), and the two resistant estimators of location discussed above: the trimmed mean and the Winsorized mean. For the latter two you can choose the number of observations or the percentage of observations to be trimmed or Winsorized at each end. To compute these estimators, you must first generate the distribution window by choosing Analyze:Distribution ( Y ). From the menu bar on this window, click on Tables, and then select the resistant estimator of your choice.

Doing Lab 2.1 with SAS

The instructions below are numbered to correspond to the step numbers in the Experimental Procedure section of Lab 2.1. There are two versions of the instructions: the first for SAS/INSIGHT users and the second for for input of instructions from the command line.

SAS/INSIGHT

1: Access SAS/INSIGHT. The data are found in CRIME.
2,3: Select Analyze:Distribution ( Y ) from the data window. From the dialog box select AUTO as the Y variable and STATEN (the state name) as the label. A Distribution Window will appear with a density histogram, a boxplot, and tables of summary measures for AUTO.
2: Compute the k-times trimmed mean for k=3 by choosing Tables:Trimmed Mean:(1/2)N:3.
4: To identify the outlier on the boxplot, click on it. It will become highlighted, and its name will appear.
5: To change the Massachusetts auto theft rate of 1140.1 to a value of, say, 2140.1, click on the data window cell containing the value 1140.1, type 2140.1, and hit ``Enter'' (or ``Return'') on the keyboard. When you do this the plots and summary measures in the Distribution Window will be updated to reflect the change in the data.
6: To remove the Massachusetts data value, select Massachusetts then choose Edit:Observations:Exclude in Calculations from the Distribution Window. The summary measures will be updated to reflect the change, and the plots will be modified to show the observation is not included in the calculation (The square denoting the value in the boxplot will change to an x, for example.) To also remove the observation from the plots, choose Edit:Observations: Hide in Graphs. An alternative to all this is to select Massachusetts and then choose Edit:Delete. This removes Massachusetts from the copy of the data set you are working on (Don't worry, it won't remove it from the original data set; you have to save the modified copy to the same data set name to do that.)

From the SAS Command Line

The commands

proc univariate data=crime plot;
 var auto;
run;

will get all the output you need, except for the trimmed mean, which is unavailable from the SAS command line. The histogram produced will be a stem-and-leaf plot, in which data values serve as the histogram bars.

Doing It with SAS: Chapter 3

Data Set

WATCHES: Watch assembly times, Example 3.9.

Randomly Assigning Treatments to Experimental Units

From SAS/INSIGHT

To see how to use SAS/INSIGHT to randomly assign treatments to experimental units, consider again the example of watch assemblers and assembly methods from Example 3.9.

Begin with a data set consisting of the numbers 1 through 15 (one for each assembler).
Next generate a column of 15 numbers randomly selected from the interval (0,1) by choosing Edit: Variables: Other... and then choosing ranuni(a) from the resulting dialog box.
Now sort the column of random numbers by choosing :Sort. When the random numbers are sorted, the assembler numbers become randomly ordered.
Assign the first 5 of the assembler numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3.

From the SAS Command Line

The following commands will produce two columns of numbers in the output window:

data assign;
 do assemblr=1 to 15;
  rannum=ranuni(-1);
  output;
 end;
run;

proc sort data=assign out=assign;
 by rannum;
proc print;
 var assemblr;
run;

Assign the first 5 of the assemblr numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3.

Doing Lab 3.2 with SAS

In SAS/INSIGHT, you can label the observations in the scatterplot of PRESS versus STUDENT by selecting HAND as the label variable in the SAS Scatterplot ( Y X ) dialog window. Then, clicking on each point on the resulting plot will label the point. You can also label the points for right and left with different colors or symbols. To do this, select Edit: Windows: Tools. The SAS:Tools window will appear. To give the two hands different colors, click on the long color button at the bottom of the color pallette. A ``SAS: Color Observations'' window will appear. Click on HAND, and then on OK. To get different plotting symbols for the two hands, do the same steps, beginning with a click on the long button with all the symbols on it.

Doing It with SAS: Chapter 4

Data Sets

TECHSAL: Salaries of technical support workers, Example 4.1. GASKET: Gasket thicknesses, Example 4.23.

Computing Probabilities

You can use the macro NPROBS to compute the probability P(a < Y <= b) where the random variable Y has any of the following distributions studied in this section: binomial, Poisson, normal or Weibull. A data entry window prompts you for the name of the distribution and its parameters. You are also prompted for the values of a and b. To obtain P(Y <= b for the normal distribution, select a=-9999999. To obtain P(Y <= b) for the binomial, Poisson or Weibull distributions, select a=-1 (or any other negative value). To obtain P(Y > a) for the b(n,p) distribution, select b=n. To obtain P(Y > a) for the Poisson, normal or Weibull distributions, select b=99999999.

Fitting a Normal Distribution

Here is a sequence of steps a data analyst might use in analyzing the gasket data in Example 4.23.

First, produce a line plot of thickness versus production order (which, if done at the outset, would have saved the quality personnel in Example 4.23 a great deal of trouble). To do so, enter SAS/INSIGHT and choose Analyze:Line Plot ( Y X ). Select THICK as the Y variable and ORDER as the X variable. The plot should reveal the outliers as the first two values.
To look at the distribution of thickness, choose Analyze:Distribution ( Y ) and select THICK as the variable to be analyzed. Look at the distribution window that appears. What are the salient features of the data as displayed on the boxplot and histogram? How well do you think a normal curve will fit the data?
Fit the normal curve N( $\overline{Y}$ ,S^2) to the data by choosing Curves:Parametric Density from the distribution window. In the resulting dialog box, make sure Normal is selected as Distribution: and Sample Estimates/MLE is selected as Method: before clicking on the ``OK'' button.
To produce a normal quantile plot, choose Graphs:Q-Q Plot. In the resulting dialog box, make sure ``Normal'' is selected as the distribution. The normal quantile plot will appear with the values of the data quantiles on the vertical axis and those of the normal distribution quantiles on the horizontal axis: just the opposite of the graphs in the text. To assess normality, it really doesn't matter which quantities are plotted on which axes. However, in the text we have plotted the data quantiles on the horizontal axis in order to match up these values with the boxplot and histogram. To reverse the axes, move the cursor to the upper left corner of the box surrounding the normal quantile plot, press down on the left mouse button, and drag the left corner diagonally down through and past the lower right corner of the box. (Once again, the click-and-drag version of turning a sleeve inside out.) You can then resize and move the box as you want. By moving some other graphics boxes, you can line up the normal quantile plot below the histogram. Choosing Edit:Windows:Align will align the values of the horizontal variable (THICK, for the gasket data) in all three plots. To add a reference line to the normal quantile plot, choose Curves:QQ Ref Line.
Look at the normal curve fit to the histogram and at the normal quantile plot. How do they look? Those two outliers are really causing problems, aren't they? Remove the most extreme one as follows. Select the extreme outlier by clicking on it in the boxplot. Choose Edit:Observations:Exclude in Calculations. Notice that the normal curve and normal quantile plot are recalculated without the extreme outlier. Do you like this fit any better? Perhaps you should remove the other outlier now. Proceed as in the last paragraph. With the two outliers removed, the normal density fits the histogram well, and the normal quantile plot is nearly a straight line. (Note: to include the outliers in the calculations again, make sure they are selected and choose Edit:Observations:Include in Calculations).

Transformations

A selection of transformations is available in SAS/INSIGHT by choosing Edit:Variables.

Doing Lab 4.1 with SAS

To do lab 4.1, merely run the macro LAB4_1. Both the required density histogram and the plot of the cumulative proportion of values Y=1 versus trial will be automatically produced.

Doing Lab 4.2 with SAS

The macro LAB4_2 will produce the necessary histogram. You will be prompted for your values of N and n: choose n=5. Output from the macro LAB4_2 consists of a density histogram just like you produced for the 10 trials you conducted by hand, only for 10,000 trials. The relative frequency of each of 0---5 successes for the 10000 measurements will appear at the top of the corresponding bar.

Doing Lab 4.3 with SAS

First a word about the macros you will use in the simulations. When running the macro, don't worry if graphs pop up on the screen and disappear. They will reappear on a one-page template containing all four graphs that you called for. CAUTION: If you wish to print the template you must do it \underline BEFORE moving on to the next macro. Submitting a new macro will overwrite the previous template and you'll have to run the first macro again.

The Central Limit Theorem for Rolls of a Die

You are going to call the macro MAKEDATA. This macro will generate random data from the discrete uniform distribution having an equal probability of producing any of the integers 1,2,3,4,5 or 6, just like a fair die. The data will be put in the data set ROLLS. The macro will simulate the trial of rolling a fair die 50 times. It replicates this trial 250 times, producing a total of 250 times 50 or 12,500 simulated die rolls. MAKEDATA simulates the 50 rolls of the fair die, putting the result of the i-th roll in variable Ci. Thus each row in C1- C50 represents one replication of the trial. There are 250 such rows corresponding to the 250 replications of the trial. The macro also computes the means of the first 2, 10, 30 and 50 rolls from each trial, calling them MEAN2, MEAN10, MEAN30 and MEAN50, respectively. Run the macro now. A window will pop up informing you when the data set has been created. Click on the window as directed and hit return. The window will go away. Note that if the window fails to appear, something has gone wrong and you should ask for help. Use SAS or SAS/INSIGHT to look at ROLLS, which is a data set containing a portion of the data. Specifically, ROLLS has in each row the first 5 of the 50 original observations (die rolls) under variable names C1-C5. The mean of C1 and C2 is in MEAN2, the mean of C1-C10 is in MEAN10, the mean of C1-C30 is in MEAN30, and the mean of C1-C50 is in MEAN50.
You may use SAS or SAS/INSIGHT make a frequency and a density histogram of C1. Recall that in SAS/INSIGHT, Analyze:Histogram/Bar Chart ( Y ) will produce a frequency histogram, and Analyze:Distribution ( Y ) will produce a density histogram. To obtain density histograms of C1, MEAN2, MEAN10, and MEAN50 all plotted on the same scale, simply use the macro HISTREP. When you call HISTREP an input window will appear. Click on the green cursor and enter 'rolls' (without the quotes) as the data set name. Hit return and enter 'u', then successively the names C1, MEAN2 MEAN10 and MEAN50. Density histograms of these variables will appear in the SAS GRAPH window. You should print these now.
Make normal quantile plots of C1, MEAN2, MEAN10, and MEAN50. To do this, call the macro `NORMREP' and proceed as you did for HISTREP. Print these graphs now.
In SAS/INSIGHT, Analyze:Distribution ( Y ) will produce the means and standard deviations of C1, MEAN2, MEAN10, and MEAN50.
The macro SMEAN will compute the standardized means of C1, MEAN2, MEAN10, and MEAN50. These standardized means will be found in the data set ROLLS under the variable names SC1, SMEAN2, SMEAN10, and SMEAN50. Use SAS/INSIGHT to check that the means of each of these variables are nearly 0 and the standard deviations are nearly 1. Use the macro HISTREP to generate density histograms and NORMREP to generate normal plots of the standardized means (don't forget to enter an 's' to to denote the fact that the data are standardized).

An Example Where the Central Limit Theorem Fails

The macro MAKECAU will generate 250 data sets each of 50 observations from a Cauchy distribution model. The data will be placed in CAU. C1 again denotes the first column of data, and MEAN2, MEAN10 and MEAN50 have the same meaning here as they did in ROLLS. Now do steps 2. and 3. on these data; don't forget to enter a 'c' to denote the fact that the data are Cauchy.

Doing It with SAS: Chapter 5

Data Sets

SOL: One hundred measurements of the speed of light.
BEARINGS: Weights and diameters of 100 ball bearings.

Estimation Using SAS/INSIGHT

Before any inference procedure for measurement data, you should investigate the data for outliers and non-normality. SAS/INSIGHT is the easiest way to do this. SAS/INSIGHT will compute one sample t confidence intervals (equation (5.8)). To do this, first do a distribution analysis of the variable in question. From the distribution analysis window choose Tables: C.I. for Mean and then select the desired confidence level.

SAS Macros

Estimation

The macro TINT will compute one sample t confidence intervals (equation (5.8)). It will also compute the prediction interval given by equation (5.11).
The macro TWINT will compute two sample approximate t (equation (5.24)) confidence intervals for the difference of means in two independent C+E models.
The macro BIEXACT will compute exact one sample confidence intervals for population proportions.
The macro BINORM will compute one and two sample, large sample confidence intervals for population proportions.

Utilities

The macro INVPROBS will compute quantiles for the normal and t distributions.
The macro NPROBS will compute probabilities P(a < Y <= b) for the normal and t distributions.

Doing Lab 5.1 with SAS

The macro LAB5_1A will generate as many sets of data from the C+E model as you tell it to. A window will ask you for the number of data sets, the name of the SAS data file where you want the data sets written, the number of observations per data set, and the values of $\mu$ and $\sigma^2$ that define the C+E model.

Use the macro LAB5_1B to generate the 500 data sets each of size 20 from the same C+E model that you chose above, and then to draw histograms of the parameter estimates.

Doing Lab 5.2 with SAS

The macro LAB5_2 will create 100 samples from the C+E model and calculate level L confidence intervals for $\mu$ . A window will prompt you to input the number of data sets (choose 100), the number of observations in data set (choose 20), the values of the parameters $\mu$ and $\sigma^2$ (choose what you like) and the confidence level L (choose .95). The window will also prompt you for ``Contamination level'' (choose 0). The input window will display the mean width of the 100 intervals. A graph will display the true value of $\mu$ and the computed confidence intervals. The intervals that contain the true parameter value are displayed in green and the intervals that do not contain the true parameter value are displayed in red.

Run the macro using the same parameters as previously, but first with a 0.1 proportion of contamination and then with a 0.5 proportion of contamination.

Doing It with SAS: Chapter 6

One Sample Tests for the Mean in the C+E Model

A two-sided test can be obtained from SAS/INSIGHT. Choose Analyze: Distribution ( Y ) : Tables: Location Tests. From the resulting pop-up window, choose Student's T Test and for Parameter input the value of $\mu$ _0. Output consists of the value of the test statistic and the two-sided p-value. From this information, the p-value for either one-sided test can be computed. As an example, the t* for the one sample test of

H_0: $\mu$ = 275, H_a: $\mu$ > < 275. (where > < stands for "not equal".)

for the artificial pancreas data (see Section 6.3) is given in SAS/INSIGHT as -2.79 with p-value 0.068. Since t* < 0, we know that the area under the t_3 curve below t* is 0.068/2=0.034. This is the p-value for testing the one-sided alternative H_a: $\mu$ < 275. The p-value for testing the opposite one-sided alternative, H_a: $\mu$ > 275, is the area above t*, which is 1-0.034=0.966.

Comparing Two Means

The macro TWTEST will perform both the pooled and approximate one and two-sided t tests. It accepts as input either (1) data for the two samples as separate columns in a SAS data set, or (2) summary data consisting of the sample mean and standard deviation for each sample.

Tests for Proportions

The test statistics are easy enough to compute using pencil and paper. The macro NPROBS will compute the appropriate tail areas for the binomial (exact test) or normal (large sample approximation) distributions.

Doing Lab 6.1 with SAS

The instructions below are keyed to the instructions in the text.

The Meaning of Statistical Significance and p-values

1: Use the macro LAB6_1 to generate 1 set of 10 observations from a N(25,1) distribution. The macro will also compute the t statistic and the p-value for testing H_0: $\mu$ =25 versus H_a: $\mu$ > < 25 for this data set.
3: Now use the same macro to generate 1000 sets of 10 observations each from a N(25,1) distribution, and to compute the t statistic and p-values for each. The 1000 sets of t statistics and p-values will be saved in the SAS data set TEST25.
4: You can use SAS/INSIGHT with the data set TEST25 to obtain the proportion of the 1000 test statistics that provide as much evidence against the null and in favor of the alternative hypothesis as does t*, and the proportion of p-values as small as or smaller than the p-value associated with t*.

How Nonnormality Affects the Results

1: Generate a histogram of 1000 observations from the exponential distribution using the macro LAB6_1.
3: Use the macro LAB6_1 to generate 1 set of 10 observations from an exponential distribution with mean $\mu$ =25. The macro will also compute the t statistic and the p-value for testing H_0: $\mu$ =25 versus H_a: $\mu$ > < 25.
5: Now use the macro to generate 1000 sets of 10 observations each from an exponential distribution with mean 25, and to compute the t statistic and p-values for each. The 1000 sets of t statistics and p-values are saved in the SAS data set EXPO25. These 1000 data sets represent 1000 experiments identical to the original one.
6: Use SAS/INSIGHT to obtain the proportion of the 1000 test statistics that provide as much evidence against the null and in favor of the alternative hypothesis as does t*. Obtain the proportion of p-values as small as or smaller than the p-value associated with t*.

Doing It with SAS: Chapter 7

Data Sets

TWEAR: Tool wear data.
TWEAR8: Tool wear data for VELOCITY=800.
FUEL: Fuel consumption versus equivalence ratio.
DRAFTLOT: 1970 draft lottery data.
TRAPDATA: Bacterial trap data.
DONNER: Donner party data.
DERBY: Kentucky Derby data.

The Median Trace

The macro MTRACE will compute a median trace. An input window will appear; click on the cursor location. To do a median trace for the draft lottery data, the data set, Y variable, X variable and number of slices you should enter are DRAFTLOT, NUMBER, BDATE and 12 respectively. Next another input window window will appear asking for the upper boundary of the first slice. Tell it 31 for the 31 days in January (don't forget to click on the cursor first). The red window will reappear asking each time for the upper boundary of the next slice. Give it (let's see, thirty days hath September...) the values 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 and 366 successively. You can experiment if you like with different boundaries for the slices and different numbers of slices.

The Tool Wear Data

To generate Figure 7.1, choose Analyze:Scatter Plot (Y X). From the resulting dialog window, select WEAR as the Y and TIME as the X variable. A scatterplot window will appear. Enlarge and renew this window for better viewing. To generate Figure 7.5, use the markers in SAS/INSIGHT (just as you did in Chapter 1) to give a different plot symbol to each value of VELOCITY on the WEAR versus TIME scatterplot. For viewing at the computer you may prefer to use the palettes to give different colors instead of different plotting symbols. Or you can do both. You can obtain the scatterplot in Figure 7.6 from the data set TWEAR8.

Correlation

It's easy to standardize variables in SAS/INSIGHT. To do it, from the data window choose Edit:Variables:Other.... From the resulting dialog window choose the transformation ``(Y-mean(Y))/std(Y)'' and whichever variable you want transformed. Try this now for the two variables WEAR and TIME in the data set with VELOCITY=800. Plot the standardized variables against each other. To find the correlation of the tool wear data for VELOCITY=800, access TWEAR8 and choose Analyze:Multivariate ( Y's ). From the resulting dialog window select TIME and WEAR and ORDER as the Y variables. A window will appear containing a number of descriptive statistics. The Correlation Matrix in that window contains Pearson correlations for all pairs of variables. On the diagonal are the correlations of each variable with itself (What are these? Does this surprise you?). The off-diagonals are the correlations between pairs of different variables. Which other variable is most correlated with WEAR? The correlation matrix is symmetric (i.e. the entries below the upper left to lower right diagonal are mirror image of those above the diagonal). Why do you think this is?

Regression

Least Squares Fit

It is very easy to compute the least squares estimators using SAS/INSIGHT: just choose Analyze:Fit ( Y X ), and select the X and Y variable from the dialog window. When you choose Analyze:Fit ( Y X ), SAS/INSIGHT automatically computes the fitted values and residuals and places them in the data set under the names P_Y and R_Y, respectively, where Y is the name of the Y variable. So, for example in the regression of WEAR on TIME, the fitted values are called P_WEAR and the residuals are called R_WEAR. A plot of residuals versus fitted values is also produced automatically. You can now plot the residuals versus any variables of interest.

Studentized Residuals and Normal Quantile Plots

Generate Studentized residuals by choosing Vars: Studentized Residual. The Studentized residuals will be placed in a variable named with the prefix RT_ followed by something resembling the name of the response variable in the regression. It is a good idea to look at the Studentized residuals. Choosing Analyze: Distribution ( Y ) will do a distribution analysis of the Studentized residuals The SAS macro TQPLOT will produce a plot of Studentized residuals versus t quantiles. It will also write the original data, the Studentized residuals and the t quantiles to a data set of your choice.

Confidence and Prediction Bands and Intervals

The confidence and prediction bands in Figure 7.21 were generated by choosing Curves: Confidence Curves: Mean: and Curves: Confidence Curves: Prediction:, respectively. You are allowed to choose the confidence level of the bands. The SAS macro REGPRED computes level .95 confidence intervals for the mean of the response and level .95 prediction intervals for a new observation at each data value in the input data set and at additional user-specified predictor values. The predicted values are stored under the name PRED. The endpoints of the confidence intervals for the mean are stored under names L95MPRED and U95MPRED and those for prediction intervals for a future observation are stored under the names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS regression output is written to the SAS/OUTPUT window.

Categorical Data

In SAS/INSIGHT you can analyze data for a single categorical variable using bar charts. You can obtain information on the relation between two categorical variables using mosaic plots. For example, Figure 7.23 was produced by choosing Box Plot/Mosaic Plot ( Y ) and then selecting GENDER as the Y variable and FATE as the X variable. The frequencies and percentages were added by choosing :Values. The SAS macro CAT2WAY will create two-way tables. Since it was designed with additional sophisticated analyses in mind, the input to and output from CAT2WAY contains some terms you will not be familiar with. Still, it is very easy to use, as the following example, based on the Donner data, shows. The following will produce one and two-way frequency tables for FATE and GENDER for the Donner data:

Invoke the macro CAT2WAY.
Enter the names of the data set (DONNER), row variable (FATE) and column variable (GENDER) where indicated.
You are next asked if there is a count variable. For the Donner data, there is not, so answer 'N'. Were the data set to have a variable giving cell counts, you would answer 'Y', and then be prompted to give the name of the count variable.
You are next asked if you want to conduct Fisher's exact test. As you don't know what this is, just answer 'n'.
When the computations are finished, you will be prompted to hit return to exit the macro. The table will be output to the SAS Output Window. Each cell of the table will contain the cell count or frequency, overall percent, row percent, column percent, expected frequency and the cell chi-square. The cell chi-square is just the square of the Pearson residual. A number of test statistics are also output, including Pearson's chi-square, which will appear thus:
```
  Statistic                     DF     Value        Prob
  ------------------------------------------------------
  Chi-Square                     1     4.811       0.028
```
In this output, the quantities shown are the degrees of freedom, the observed value of the chi-square test statistic, 4.811, and the p-value, 0.028.

Computing p-Values for the Chi-Square Distribution

The p-value for a chi square test is easily computed using the SAS macro NPROBS, remembering that a chi-square distribution with m degrees of freedom is a gamma distribution with parameters ALPHA=m/2 and BETA=2. For Example 7.11 about the categories of defective computers, we have an observed value 13.36 of the test statistic, and we want to compute its p-value using a chi-square distribution with 4 degrees of freedom as the reference. To do this, invoke the macro NPROBS and select the gamma distribution. Enter 2 (=4/2) for ALPHA and 2 for BETA. Enter 13.36 for A and some very large number (we used 10000) for B.

Proc FREQ

Proc FREQ can conduct Pearson's chi-square test, and other associated quantities. To illustrate its use, we consider data relating consumption of ascorbic acid (vitamin C) to the incidence of colds in a group of French skiers. In a controlled experiment, 279 French skiers were divided into a treatment and a control group. The treatment group received ascorbic acid and the control group a placebo. Whether or not the skier had a cold during the trial period was recorded. To enter the data, submit the following program from the SAS PROGRAM EDITOR window:
```
title 'Analysis of data on French skiers';
options linesize=70;
data skiers;
input treat \ cond \ count @@;
cards;
plac cold 31 plac ncold 109
asco cold 17 asco ncold 122
;
run;
```
The data are now in the SAS data set SKIERS. The following commands, submitted from the SAS PROGRAM EDITOR window, will, among other things,
- Create a table with
  - Cell counts
  - Overall, row and column percentages
  - Cell chi-squares (the square of the Pearson residuals) (cellchi2)
- Calculate the Pearson chi-square statistic and its p-value (chisq)
```
proc freq data=skiers order=data;
 weight count;
 tables treat*cond / chisq cellchi2;
run;
```
The output is the following:
```
                   Analysis of data on French skiers                 
                        TABLE OF TREAT BY COND
              TREAT           COND
              Frequency      |
              Cell Chi-Square|
              Percent        |
              Row Pct        |
              Col Pct        |cold    |ncold   |  Total
              ---------------+--------+--------+
              asco           |     17 |    122 |    139 
                             |  1.999 | 0.4154 |
                             |   6.09 |  43.73 |  49.82
                             |  12.23 |  87.77 |
                             |  35.42 |  52.81 |
              ---------------+--------+--------+
              plac           |     31 |    109 |    140
                             | 1.9847 | 0.4124 |
                             |  11.11 |  39.07 |  50.18
                             |  22.14 |  77.86 |
                             |  64.58 |  47.19 |
              ---------------+--------+--------+
              Total                48      231      279
                                17.20    82.80   100.00




                  Analysis of data on French skiers
                STATISTICS FOR TABLE OF TREAT BY COND
        Statistic                     DF     Value        Prob
        ------------------------------------------------------
        Chi-Square                     1     4.811       0.028
        Likelihood Ratio Chi-Square    1     4.872       0.027
        Continuity Adj. Chi-Square     1     4.141       0.042
        Mantel-Haenszel Chi-Square     1     4.794       0.029
        Fisher's Exact Test (Left)                       0.021
                            (Right)                      0.991
                            (2-Tail)                     0.038
        Phi Coefficient                     -0.131
        Contingency Coefficient              0.130
        Cramer's V                          -0.131

        Sample Size = 279
```
Doing Lab 7.1 with SAS

The instructions below are keyed to the instructions in the text.
- Access the macro LAB7_1. This macro will generate a data bivariate set, and will display a plot of the response versus regressor variable in the SAS graph window.
- A window called 'GUESS' will appear asking you to give an intercept and slope for a line you think best fits the data. Take a few minutes to formulate an educated guess before answering. Following instructions in the window, enter your guesses.
- The program will then plot your line superimposed on a plot of the data. How did you do? From the plot you should see how your fitted line can be improved. When you are done looking at the plot, click on the lower part of the vertical scroll bar on the right side of the graphics window. This causes a plot of the residuals from your guessed line versus X. The program also displays the SSE for the line you fit and gives you a residual plot. Mark the SSE down.
- Using the feedback from the data plots, try to improve your fit. A message will appear in the guess window asking you if you want to try again. Type 'G' to guess again or 'Q' to quit. Make another guess, submitting numbers as you did before. In terms of SSE and of the data plots how did you do? Be sure to look at both plots (you won't be able to continue until you do). Keep track of your best fit and keep trying until you think you've done as well as you can.
- When you want to see how close your guess is to the least-squares line type 'S' in the macro window when you are prompted. This will get you the least squares fits of slope and intercept and the minimum SSE. How does your best fit compare? If you run the least squares slope and intercept through the macro, you can compare the resulting plots with those from your best fit, and you can find the SSE for the least squares fit.
Doing Lab 7.2 with SAS

The instructions below are keyed to the instructions in the text.

2
Access the macro LAB7_2. In response to the prompts, input FUEL (or SASDATA.FUEL) as the data set, FADJ as the response and E_RATIO as the regressor. Hitting return will generate a scatterplot of the response versus the regressor with the least squares line superimposed. Clicking on the scroll bar in the graph window will show a plot of the Studentized residuals versus the regressor. The SAS Output Window contains the regression parameters, SSE, fits, residuals, etc. Note the value of the coefficient of determination (R-square). Describe the pattern of the residual plot.
3-4
In the data entry window, you will be prompted for the value p for the power transformation you want to apply. The macro will regress FADJ^p on E_RATIO and output the same plots and regression output as in 2. Try the two suggested values of p, and then keep trying more values until you find a nearly linear relationship.

Doing It with SAS: Chapter 8

Data Sets
- TREES: Volumes, heights and diameters of 31 black cherry trees, Example 8.1.
The Graphical Exploration of Multi-Variable Data

Scatterplot Arrays and Brushing

To create a scatterplot array for the tree data in SAS/INSIGHT:
- Choose Analyze: Scatter Plot ( Y X ).
- In the window that appears, select D, H and V, then click on the Y button to designate them as Y variables for the plots.
- Select these variables again, then click on the X button to designate them as X variables for the plots.
- Click on the OK button. The scatterplot array will appear.
To create a brush on the scatterplot array, or any other SAS/INSIGHT analysis window for that matter, position the cursor where you want one corner of the brush to appear and click and hold the left mouse button while moving the cursor toward where you want the diagonally opposite vertex of the rectangular brush to be. See Section 8.5 of the text for more on this. The brush may be moved by placing the cursor on one of the sides of the rectangle (away from a vertex), clicking the left mouse button and dragging. The shape of the brush may be changed by positioning the cursor at a vertex, clicking the left mouse button and dragging.

3-D Plots

To create a rotating 3-D plot in SAS/INSIGHT, choose Analyze:Rotating Plot ( Z Y X ). Do this now for the tree data. From the resulting window choose V as the Z variable, H as the Y variable and D as the X variable. A graph window will appear. Choose Edit: Windows: Tools to bring up the SAS Tools window. Click on the hand in the Tools window and move it to the graph window. The hand tool can be used in a variety of ways to rotate the plot:
- By clicking and releasing the left mouse button, you will rotate the plot a small amount.
- By clicking and holding down the left mouse button, you will rotate the plot continuously. The closer the hand tool is to the origin, the slower the rotation.
- You can rotate a particular axis by putting the hand tool on the end of that axis and clicking and holding down the left mouse button while moving the mouse. For example, put the hand tool on the letter ``D'' at the end of the D axis. Then click and hold down the left mouse button and move the mouse. The D axis will follow your movements.
- If you get the plot rotating by moving the mouse with the left mouse button down, and then release the button, the plot will continue to rotate.
Try some of these movements to get different views of the 3-D plot. You can also use the buttons on the left side of the 3-D graph window to control the direction and speed of movement. With a little practice, you will become adept at using these 3-D plots.

Fitting Models (8.22) and (8.23)

The simplest way to fit model (8.22) using SAS/INSIGHT is:
1. Choose Analyze: Fit ( Y X ).
2. In the resulting dialog box, choose V as the Y variable and D and H as the X variables.
3. Select D and H and click on the ``Cross'' button to include the product term, D*H.
4. Click on ``Run'' to obtain the output.
To fit the additive model (8.23), follow steps 1, 2 and 4.

Centering Predictors

To avoid the computational and statistical difficulties associated with multicollinearity, we might want to center both D and H by subtracting the mean of the tree diameters from each tree's diameter and the mean of the tree heights from each tree's height. This has already been done in this data set with the variables CD and CH being the centered variables. The following two steps show how SAS/INSIGHT can be used to center the predictors D and H:
1. First, find the mean of D and H by choosing Analyze : Distribution ( Y ). You will obtain means of 13.25 and 76 for D and H, respectively.
2. Next, choose Edit: Variables: Other. In the resulting dialog box choose the variable you wish to center, and under ``Transformation:'' choose a+b*Y. For a enter the negative of the mean and for b enter ``1''. Thus, for D, a=-13.25. Finally, enter a name for the centered variable. Click on ``OK''.
Studentized Residuals

To generate the Studentized residuals in SAS/INSIGHT, choose Vars: Studentized Residual. The Studentized residuals will be placed in a variable named with the prefix RT_ followed by something resembling the name of the response variable in the regression (exactly what depends on what else you have done previously in the SAS/INSIGHT session). For example, I just computed the Studentized residuals for the fit of model (8.22) and they were placed in the variable RT_VOL_8. If you don't like the name SAS assigns, you can change it by choosing : Define Variables in the data window. Once you have generated the Studentized residuals, you can obtain a normal quantile plot by choosing Analyze: Distribution ( Y ) to do a distribution analysis of the Studentized residuals, and from the Distribution Analysis window choosing Graphs: QQ Plot. Make sure the normal distribution has been chosen in the resulting pop-up window (you may ignore the selections under ``Parameters:''.) To put the 45 degree reference line (the correct reference line when using Studentized residuals) on the normal quantile plot, choose Curves: QQ Ref Line, and then from the resulting pop-up window select Specification and specify 0 for the intercept and 1 for the slope. A more appropriate plot than a normal quantile plot of Studentized residuals is a plot versus t quantiles. The SAS macro TQPLOT will construct this plot for you. After asking for the name of the data set and response variable, the macro will ask if you want a regression fit, as opposed to a GLM (General Linear Model) fit. Answer ``y'' (without the quotes). You must then input the names of the regressor variables, separated by spaces. For the TREES data, you might specify the regressors CD CH CD*CH. In addition to producing the quantile plot, the macro computes and outputs the original data, the Studentized residuals, regular residuals, fitted values and t quantiles to a SAS data set of your choice. From there you can plot and analyze them further.

Confidence and Prediction Intervals

The SAS macro REGPRED computes level 0.95 confidence intervals for the mean of the response and level 0.95 prediction intervals for a new observation at each data value in the input data set and at additional user-specified predictor values. The predicted values are stored under the name PRED. The endpoints of the confidence intervals for the mean are stored under names L95MPRED and U95MPRED and those for prediction intervals for a future observation are stored under the names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS regression output is written to the SAS/OUTPUT window. As an example, suppose we want to use model (8.22) and the tree data to obtain intervals for the mean volume and to predict the volume of a new tree having diameter 10 inches and height 70 feet. When REGPRED asks 'ENTER THE NAME(S) OF THE PREDICTOR(S)', the response is D H, and when REGPRED asks 'ENTER THE NAME(S) OF THE REGRESSOR(S)', the response is D H D*H. When REGPRED asks 'WOULD YOU LIKE TO SPECIFY ADDITIONAL VALUES OF THE PREDICTORS AT WHICH TO COMPUTE PREDICTION INTERVALS?', answer y, and when prompted put in the values 10 70.

Backward Elimination

SAS/INSIGHT offers a particularly easy way to remove one variable at a time from a fitted regression model. As an example, suppose that you have fit the model for the tree data with regressors CD, CH and CD*CH, and that you want to remove CD*CH. To do so, return to the gray Fit(YX) window you used to fit the present model, click on CD*CH in the window containing the regressor names, and then on the ``Remove'' button in the lower right corner. CD*CH will be removed as a regressor. Now click on ``Run'' and the new model will be fit.

Doing Lab 8.1 with SAS

The instructions below are keyed to instructions in the text.

Experimental Procedure

Data Generation To generate the data sets as in 1-3, invoke the SAS macro LAB8_1. You will be prompted for the name of the SAS data set to contain the data, the number of observations, the parameters of the model, and the desired correlation between the predictor variables. All quantities except the first and last remain the same for all three data sets.
Analysis

1
The Pearson correlation between the regressors may be calculated in SAS/INSIGHT using Analyze: Multivariate (Y's).
2-3
All required output is obtained in SAS/INSIGHT using Analyze: Fit ( Y X ).

Doing Lab 8.2 with SAS

Experimental Procedure

The instructions below are keyed to instructions in the text.
Data Generation To generate the data set, invoke the SAS macro LAB8_1. You will be prompted for the name of the SAS data set to contain the data, the number of observations, the parameters of the model, and the desired correlation between the regressors. This last is of interest for Lab8-1 only, so here just set the correlation to 0.5 and name the data set SET50.
Analysis
Look at the Data.

1-2
Use SAS/INSIGHT to plot X1 versus X2 and to regress Y on X1 and X2.
3
Generate Studentized residuals. (Recall that you do this by clicking on Vars: Studentized Residual. The Studentized residuals will be put into the data set under the name RT_ response: In the SET50 data set the name would be RT_Y).
4
Generate a t quantile plot of the Studentized residuals by running the SAS macro TQPLOT. Now look at the plot. Are any major problems evident?

Create an Outlier and See What Happens. To change a data value in SAS/INSIGHT, click on the cell in the data window containing the value, type in the new value and hit the return key. The new value will now replace the old one. In addition, all plots and summary measures in SAS/INSIGHT that are associated with this value will automatically be updated for the new value. In particular, the regression fit, the plot of the Studentized residuals versus the fitted values and the associated measures, such as R^2, will all be updated. The t quantile plot will not be updated, however, so you will have to recreate this plot by first making a copy of the revised data set and then calling the macro TQPLOT.
Doing It with SAS: Chapter 9

Data Sets
- PROSTATE: Efficacy of different treatments on benign prostate hyperplasia, Example 9.1.
- WATCHES: Watch assembly times, Example 9.3\\
- EG1_5A: Data for Lab 9_1.
- LAB9_1: Data for Lab 9_1.
Mean Diamonds

Mean diamonds may be produced as follows:
- First produce side-by-side boxplots of the response for data from the different populations. In Example 9.1, there would be three boxplots: one each for the drug, microwave and treatment groups.
- Create mean diamonds on the boxplots by selecting : Means from the window displaying the boxplot.
- You can leave the display as it is, or you can remove the boxplots. To do so, choose : Observations.
Model Fitting in SAS/INSIGHT

SASDATA.PROSTATE contains a response variable (DELTAFLO) and a classification variable (TREATMNT). The classification variable is nominal, taking the values drug, microwav and surgery. SASDATA.WATCHES contains a response variable (TIME) and two classification variables (WORKER and METHOD). In SASDATA.WATCHES, the classification variables WORKER and METHOD are also nominal variables, even though they take on the values 1, 2, 3, 4, 5, and 1, 2, 3 respectively. In order to use SAS/INSIGHT to fit the models studied in this chapter, the classification variables must be nominal. If you are using a data set in which the classification variables are interval, you may change them to nominal by selecting : Define Variables..., and resetting the measurement level to nominal in the resulting dialog box. This may also be done by clicking on the word ``Int'' above the variable name in the data window. To fit the model, select Analyze: Fit ( Y X ) and choose the response as the Y variable and the classification variable(s) as the X variable(s). Output will include an ANOVA table. Residuals and fitted values will be computed and placed in the data window.

Model Checking

Studentized residuals can be computed from the fit window by choosing Vars: Studentized Residual. Residual plots may be obtained in the usual way in SAS/INSIGHT by plotting residuals or Studentized residuals against any variable of interest. You can also produce a normal quantile plot of the Studentized residuals. Do this by performing a distribution analysis of the Studentized residuals (choose Analyze: Distribution ( Y ) and from the Distribution Analysis window choose Graphs: QQ Plot). Make sure the normal distribution has been chosen in the resulting pop-up window (you may ignore the selections under ``Parameters:''.) To put the 45 degree reference line (the correct reference line when using Studentized residuals) on the normal quantile plot, choose Curves: QQ Ref Line, and then from the resulting pop-up window select Specification and specify 0 for the intercept and 1 for the slope. A more appropriate plot than a normal quantile plot of Studentized residuals is a plot versus t quantiles. The SAS macro TQPLOT will construct this plot for you. After asking for the name of the data set and response variable, the macro will ask if you want a regression fit, as opposed to a GLM (General Linear Model) fit. Answer ``n'' (without the quotes). You must the input the name of the classification variable (called class variable in the input window), and the name of the effect, which for the one-way model is the same as the class variable. For the prostate data both the class and effect entries will be TREATMNT. For the RCB model, there are two class variables, corresponding to the blocks and treatments. These are the also the effects. So, for the watches data, input the string WORKER METHOD as both class and effects variables. In addition to producing the quantile plot, TQPLOT computes and outputs the original data, the Studentized residuals, regular residuals, fitted values and t quantiles to a SAS data set of your choice. The macro RCBD will produce interaction plots and perform Tukey's test for the RCB model to check the assumption of additivity.

Individual and Multiple Comparisons

Individual and Bonferroni and Tukey multiple comparisons can be obtained from the SAS macros ONEWAY (for the non-blocked one-way model) and RCBD. The output will appear in the SAS OUTPUT window. The Tukey multiple comparison output will look like the output in Table 9.4 of the text. The output for Bonferroni multiple comparisons and for individual comparisons will resemble the output in Table 9.4, but will be labeled ``Bonferroni (Dunn) T tests ...'', and ``T tests (LSD) ...'', respectively, rather than ``T tests (TUKEY) ...''.

Doing Lab 9.2 with SAS

The following sections correspond to items 1-3 of the lab description in the text.
1. The SAS macro LAB9_2A will generate data sets from the one-way model with five populations having means 5, 2, 2, 2, and 2 and common variance 1. The data sets all have equal sample sizes of five from each population. A window will ask you for the number of data sets you want generated and the name of the SAS data file where you want the data sets written. Use this macro now to generate three data sets each with five observations per population. The response variables will have names Y1, Y2 and Y3. The variable denoting the population will have name POP.
2. Use the SAS macro ONEWAY to compute individual (LSD) and multiple (TUKEY) comparisons for all three data sets. (Note: the treatments requested by the macro are the populations you generated; the variable name for them is POP.) Take the confidence level to be 0.95. For the individual comparisons count the number of the three data sets in which there is at least one mistaken conclusion (i.e. an interval which does not contain the true mean difference). Record the result. Now do the same for the Tukey multiple comparisons.
3. The SAS macro LAB9_2B does exactly what you did in generating the three sets of data from the one-way model, computing individual and Tukey multiple comparison confidence intervals and checking to see for each type of comparison how many of the data sets have at least one mistaken conclusion. The only difference is that the macro will do all this for any number of data sets (not just three), and will do it all much faster than you can. You need only input the number of data sets you want generated. The output is the number of those data sets which contain at least one mistaken conclusion. Run this macro now for 1000, 10000 and 100000 data sets. What results do you observe?

Data Sets

Stationarity of Processes

Figures 1.1 and 1.3 of the Text

Figures 1.4, 1.5 and 1.6 of the Text

Causes of Variation

Example 1.2: A Stationary Process that Wasn't

Stratified Plots

Identifying Possible Causes of Variation

Creating a New Ishikawa Diagram

Saving a Diagram

Retrieving a Diagram

Printing a Diagram

Data Sets

Histograms and Bar Charts

Boxplots

Numerical Summaries

Resistant Measures

Doing Lab 2.1 with SAS

SAS/INSIGHT

From the SAS Command Line

Data Set

Randomly Assigning Treatments to Experimental Units

From SAS/INSIGHT

From the SAS Command Line

Doing Lab 3.2 with SAS

Data Sets

Computing Probabilities

Fitting a Normal Distribution

Transformations

Doing Lab 4.1 with SAS

Doing Lab 4.2 with SAS

Doing Lab 4.3 with SAS

The Central Limit Theorem for Rolls of a Die

An Example Where the Central Limit Theorem Fails

Data Sets

Estimation Using SAS/INSIGHT

SAS Macros

Estimation

Utilities

Doing Lab 5.1 with SAS

Doing Lab 5.2 with SAS

One Sample Tests for the Mean in the C+E Model

Comparing Two Means

Tests for Proportions

Doing Lab 6.1 with SAS

The Meaning of Statistical Significance and p-values

How Nonnormality Affects the Results

Data Sets

The Median Trace

The Tool Wear Data

Correlation

Regression

Least Squares Fit

Studentized Residuals and Normal Quantile Plots

Confidence and Prediction Bands and Intervals

Categorical Data

Computing p-Values for the Chi-Square Distribution

Proc FREQ

Doing Lab 7.1 with SAS

Doing Lab 7.2 with SAS

Data Sets

The Graphical Exploration of Multi-Variable Data

Scatterplot Arrays and Brushing

3-D Plots

Fitting Models (8.22) and (8.23)

Centering Predictors

Studentized Residuals

Confidence and Prediction Intervals

Backward Elimination

Doing Lab 8.1 with SAS

Experimental Procedure

Doing Lab 8.2 with SAS

Experimental Procedure

Data Sets

Mean Diamonds

Model Fitting in SAS/INSIGHT

Model Checking

Individual and Multiple Comparisons

Doing Lab 9.2 with SAS

Data Sets