Important data sets for Chapter 1 are:
Figures 1.1 and 1.3 were produced with SAS/INSIGHT. Figure 1.1 was created by choosing Analyze:Histogram/Bar Chart ( Y ) and then selecting DKWH from the resulting dialog window. Figure 1.3 was produced by first selecting Analyze:Scatterplot( Y X ) and then choosing DKWH as the Y variable and DATE as the X variable in the dialog window. This produced the scatterplot. Producing the corresponding histogram was a little trickier. First we created a rectangle to the right of the scatterplot by clicking there with the left mouse button and dragging. It doesn't matter how large the rectangle is. We next put a vertical bar chart there by choosing Analyze:Histogram/Bar Chart( Y ) and then selecting KWH from the resulting dialog window. To make the bar chart horizontal (this is the neat part), we clicked on the upper left corner and dragged that corner down past the lower right. (It's the click-and-drag version of turning a sleeve inside out.) We then moved the rectangle next to the scatterplot and resized it as desired. To align the KWH axes on both plots, we chose Edit:Windows:Align.
Figure 1.4 was produced by the macro TSPLOT, and Figures 1.5 and 1.6 were produced by the macro TSMAPRED. Input to TSMAPRED will include (in order)
All the plots in this section were created using SAS/INSIGHT.
The data for Figure 1.8 of the text are in the SAS data set WASHER5. To create Figure 1.8 select Scatterplot ( Y X ). In the resulting dialog box, choose THICK as the Y variable, ORDER as the X variable and MACHINE as the Group variable. The result is the three plots you see, but aligned horizontally rather than vertically. In addition, the vertical axes of the plots differ. To get the vertical axes to line up, on the graph window select Edit:Windows:Align. Use clicking (on the bounding box of the plots) and dragging to place the graphs in the vertical configuration shown. Figure 1.9 was done in exactly the same way using the data in WASHER7.
While you can draw effective Ishikawa diagrams by hand, presentation-quality diagrams are easily drawn using SAS as follows:
You may want to save a diagram for later use. To do this, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``File''. A ``File Requestor'' window (for selecting where to save the diagram) will appear. You must first select a library in which to save the diagram. If you want to save it temporarily (it will disappear after you exit SAS), select the library ``WORK''. If you want it to be there for future SAS sessions, select the library ``SASUSER''. Next select a name for the data set (your choice, 8 or fewer characters), and click on ``OK''.
To retrieve a saved Ishikawa diagram from the Ishikawa window, click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Open''. A ``File Requestor'' window will appear. This window is identical to the one you used to save the diagram. When you select the file, a new Ishikawa window with the selected diagram will appear. To retrieve a saved Ishikawa diagram from the Statistical Quality Control window in SAS, click on the ISHIKAWA icon, then from the resulting window select ``Edit an Existing Ishikawa Diagram''. A ``File Requestor'' window will appear. Choose the saved diagram you desire, and a window will appear with the saved diagram in it. Some of the finer detail may be missing, however. To restore it, click anywhere on the diagram with the right mouse button and select ``> Detail''.
You may want to print your Ishikawa diagram. To do so, you must first save the diagram to a graphics catalog: click on ``File'' on the action bar at the top left of the Ishikawa window, then on ``Save as'' and ``Graph''. An ``Output Manager'' window (for selecting where to save the diagram) will appear. This window will have a default library and file name already showing (probably WORK.GSEG.ISHIKAWA). If you want to go with this name (and we'll assume here that you do), click on the ``OK'' button. The Ishikawa diagram will appear in a regular graphics window. It can be printed from there in the same way as any other graphics output.
Frequency histograms and bar charts are obtained in SAS/INSIGHT using the command Analyze:Histogram/Bar Chart ( Y ).
You can easily generate boxplots in SAS/INSIGHT by choosing Analyze:Box Plot/Mosaic Plot ( Y ). For example, the side-by-side boxplots shown in Figure 2.13 of the text compare the salaries of men and women in the TECHSAL data set. They were produced by selecting SALARY as the Y variable and GENDER as the X variable. You can add information to the boxplots. Choosing :Means will add a diamond-shaped figure with the mean indicated by a horizontal line and a span of plus or minus two standard deviations. Choosing :Serifs will add serifs: little cross lines at the ends of the whiskers. Choosing :Values will put the values of the medians, quartiles and ends of whiskers on the graph. If the mean diamonds are chosen, the values of the means will also be displayed. Try these features yourself with the TECHSAL data.
The command Analyze:Distribution ( Y ) will produce numerical summaries such as the mean, median and standard deviation. It will also produce two plots: a boxplot and a density histogram. Density histograms are like frequency histograms, except that the height of each bar equals the density, rather than the frequency, of data in that bar's subinterval. The density in a subinterval is the frequency in the subinterval divided by the product of the number of observations in the data set and the subinterval width. You will learn more about density histograms in Chapter 4.
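As a small worked illustration of the density definition just given (the numbers are made up for the example, not taken from any data set in the text): suppose a data set has n = 50 observations, the subintervals have width w = 2, and one subinterval contains 10 observations.

```latex
\[
\text{density}
  = \frac{\text{frequency}}{n \times w}
  = \frac{10}{50 \times 2}
  = 0.1 .
\]
```

A consequence of this scaling is that the bar areas (density times width) are the proportions of the data in each subinterval, so the areas of all the bars add up to 1.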
SAS/INSIGHT allows you to select from among a resistant estimator of the standard deviation (Gini's mean difference), and the two resistant estimators of location discussed above: the trimmed mean and the Winsorized mean. For the latter two you can choose the number of observations or the percentage of observations to be trimmed or Winsorized at each end. To compute these estimators, you must first generate the distribution window by choosing Analyze:Distribution ( Y ). From the menu bar on this window, click on Tables, and then select the resistant estimator of your choice.
The instructions below are numbered to correspond to the step numbers in the Experimental Procedure section of Lab 2.1. There are two versions of the instructions: the first for SAS/INSIGHT users and the second for input of instructions from the command line.
The commands
proc univariate data=crime plot;
  var auto;
run;

will get all the output you need, except for the trimmed mean, which is unavailable from the SAS command line. The histogram produced will be a stem-and-leaf plot, in which data values serve as the histogram bars.
To see how to use SAS/INSIGHT to randomly assign treatments to experimental units, consider again the example of watch assemblers and assembly methods from Example 3.9.
The following commands will produce two columns of numbers in the output window:
data assign;
  do assemblr=1 to 15;
    rannum=ranuni(-1);   /* uniform random number for each assembler */
    output;
  end;
run;
proc sort data=assign out=assign;
  by rannum;             /* sorting by the random numbers shuffles the assemblers */
proc print;
  var assemblr;
run;

Assign the first 5 of the assemblr numbers to assembly method 1, the next 5 to assembly method 2 and the last 5 to assembly method 3.
In SAS/INSIGHT, you can label the observations in the scatterplot of PRESS versus STUDENT by selecting HAND as the label variable in the SAS Scatterplot ( Y X ) dialog window. Then, clicking on each point on the resulting plot will label the point. You can also label the points for right and left with different colors or symbols. To do this, select Edit: Windows: Tools. The SAS:Tools window will appear. To give the two hands different colors, click on the long color button at the bottom of the color palette. A ``SAS: Color Observations'' window will appear. Click on HAND, and then on OK. To get different plotting symbols for the two hands, do the same steps, beginning with a click on the long button with all the symbols on it.
You can use the macro NPROBS to compute the probability P(a < Y <= b) where the random variable Y has any of the following distributions studied in this section: binomial, Poisson, normal or Weibull. A data entry window prompts you for the name of the distribution and its parameters. You are also prompted for the values of a and b. To obtain P(Y <= b) for the normal distribution, select a=-9999999. To obtain P(Y <= b) for the binomial, Poisson or Weibull distributions, select a=-1 (or any other negative value). To obtain P(Y > a) for the b(n,p) distribution, select b=n. To obtain P(Y > a) for the Poisson, normal or Weibull distributions, select b=99999999.
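The source of NPROBS is not shown here, so the following is only a sketch of the same kind of calculation done by hand in a DATA step: P(a < Y <= b) is the cumulative probability at b minus the cumulative probability at a. The parameter values (n=10, p=0.3, Poisson mean 4, normal mean 3 and standard deviation 2, a=2, b=5) are illustrative assumptions, and the Weibull case is left out because its parameterization is not given in this section.

```sas
/* Sketch only: P(a < Y <= b) = F(b) - F(a) using Base SAS functions. */
/* All numeric values are hypothetical, chosen for illustration.      */
data _null_;
  a = 2;
  b = 5;
  /* binomial b(10, 0.3): PROBBNML(p, n, m) returns P(Y <= m) */
  p_binom = probbnml(0.3, 10, b) - probbnml(0.3, 10, a);
  /* Poisson with mean 4: POISSON(mean, m) returns P(Y <= m) */
  p_pois = poisson(4, b) - poisson(4, a);
  /* normal with mean 3, sd 2: PROBNORM is the standard normal cdf */
  p_norm = probnorm((b - 3)/2) - probnorm((a - 3)/2);
  put p_binom= p_pois= p_norm=;   /* results are written to the log */
run;
```

This also shows why a very negative a or a very large b works in NPROBS: the corresponding cumulative probability is then 0 or 1 to within rounding.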
Here is a sequence of steps a data analyst might use in analyzing the gasket data in Example 4.23.
A selection of transformations is available in SAS/INSIGHT by choosing Edit:Variables.
To do Lab 4.1, merely run the macro LAB4_1. Both the required density histogram and the plot of the cumulative proportion of values Y=1 versus trial will be automatically produced.
The macro LAB4_2 will produce the necessary histogram. You will be prompted for your values of N and n: choose n=5. Output from the macro LAB4_2 consists of a density histogram just like the one you produced for the 10 trials you conducted by hand, only for 10,000 trials. The relative frequency of each of 0 to 5 successes for the 10,000 measurements will appear at the top of the corresponding bar.
First a word about the macros you will use in the simulations. When running the macro, don't worry if graphs pop up on the screen and disappear. They will reappear on a one-page template containing all four graphs that you called for. CAUTION: If you wish to print the template you must do it BEFORE moving on to the next macro. Submitting a new macro will overwrite the previous template and you'll have to run the first macro again.
The macro MAKECAU will generate 250 data sets each of 50 observations from a Cauchy distribution model. The data will be placed in CAU. C1 again denotes the first column of data, and MEAN2, MEAN10 and MEAN50 have the same meaning here as they did in ROLLS. Now do steps 2. and 3. on these data; don't forget to enter a 'c' to denote the fact that the data are Cauchy.
Before any inference procedure for measurement data, you should investigate the data for outliers and non-normality. SAS/INSIGHT is the easiest way to do this. SAS/INSIGHT will compute one sample t confidence intervals (equation (5.8)). To do this, first do a distribution analysis of the variable in question. From the distribution analysis window choose Tables: C.I. for Mean and then select the desired confidence level.
A two-sided test can be obtained from SAS/INSIGHT. After opening a distribution analysis (Analyze: Distribution ( Y ) ), select Tables: Tests for Location. In the resulting pop-up window, input the value of \mu_0. Output consists of the value of the test statistic and the two-sided p-value for three different tests: we are interested in the Student's t test (the other two are covered in Chapter 11). From this information, the p-value for either one-sided test can be computed. As an example, the t* for the one sample test of
H_0: \mu = 275, H_a: \mu \neq 275
for the artificial pancreas data (see Section 6.3) is given in SAS/INSIGHT as -2.79 with p-value 0.068. Since t* < 0, we know that the area under the t_3 curve below t* is 0.068/2=0.034. This is the p-value for testing the one-sided alternative H_a: \mu < 275. The p-value for testing the opposite one-sided alternative, H_a: \mu > 275, is the area above t*, which is 1-0.034=0.966.
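If you prefer to check the tail areas directly rather than halving the two-sided p-value, a DATA step along the following lines is one way to do it. This is a sketch, not part of the text's macros: it uses the Base SAS function PROBT (the cdf of the t distribution) with the test statistic and degrees of freedom quoted above, and the variable names are made up for the example.

```sas
/* Sketch: one- and two-sided p-values from t* = -2.79 on 3 df. */
data _null_;
  tstar = -2.79;
  df = 3;
  p_lower = probt(tstar, df);            /* area below t*: H_a: mu < 275 */
  p_upper = 1 - p_lower;                 /* area above t*: H_a: mu > 275 */
  p_two   = 2 * probt(-abs(tstar), df);  /* two-sided p-value            */
  put p_lower= p_upper= p_two=;          /* results are written to the log */
run;
```

The values written to the log should agree, up to rounding, with the 0.034, 0.966 and 0.068 discussed above.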
The macro TWTEST will perform both the pooled and approximate one and two-sided t tests. It accepts as input either (1) data for the two samples as separate columns in a SAS data set, or (2) summary data consisting of the sample mean and standard deviation for each sample.
The test statistics are easy enough to compute using pencil and paper. The macro NPROBS will compute the appropriate tail areas for the binomial (exact test) or normal (large sample approximation) distributions.
The instructions below are keyed to the instructions in the text.
The macro MTRACE will compute a median trace. An input window will appear; click on the cursor location. To do a median trace for the draft lottery data, the data set, Y variable, X variable and number of slices you should enter are DRAFTLOT, NUMBER, BDATE and 12 respectively. Next another input window will appear asking for the upper boundary of the first slice. Tell it 31 for the 31 days in January (don't forget to click on the cursor first). The red window will reappear asking each time for the upper boundary of the next slice. Give it (let's see, thirty days hath September...) the values 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 and 366 successively. You can experiment if you like with different boundaries for the slices and different numbers of slices.
To generate Figure 7.1, choose Analyze:Scatter Plot (Y X). From the resulting dialog window, select WEAR as the Y and TIME as the X variable. A scatterplot window will appear. Enlarge and renew this window for better viewing. To generate Figure 7.5, use the markers in SAS/INSIGHT (just as you did in Chapter 1) to give a different plot symbol to each value of VELOCITY on the WEAR versus TIME scatterplot. For viewing at the computer you may prefer to use the palettes to give different colors instead of different plotting symbols. Or you can do both. You can obtain the scatterplot in Figure 7.6 from the data set TWEAR8.
It's easy to standardize variables in SAS/INSIGHT. To do it, from the data window choose Edit:Variables:Other.... From the resulting dialog window choose the transformation ``(Y-mean(Y))/std(Y)'' and whichever variable you want transformed. Try this now for the two variables WEAR and TIME in the data set with VELOCITY=800. Plot the standardized variables against each other. To find the correlation of the tool wear data for VELOCITY=800, access TWEAR8 and choose Analyze:Multivariate ( Y's ). From the resulting dialog window select TIME, WEAR and ORDER as the Y variables. A window will appear containing a number of descriptive statistics. The Correlation Matrix in that window contains Pearson correlations for all pairs of variables. On the diagonal are the correlations of each variable with itself (What are these? Does this surprise you?). The off-diagonals are the correlations between pairs of different variables. Which other variable is most correlated with WEAR? The correlation matrix is symmetric (i.e. the entries below the upper left to lower right diagonal are mirror images of those above the diagonal). Why do you think this is?
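For readers working from the command line, the same two tasks could be sketched in Base SAS as follows. This assumes the VELOCITY=800 data are available as the data set TWEAR8 in the WORK library; the output data set name TWEAR8Z is made up for the example.

```sas
/* Sketch: standardize WEAR and TIME, plot them, then get Pearson correlations. */
proc standard data=twear8 mean=0 std=1 out=twear8z;
  var wear time;          /* replaced by (Y-mean(Y))/std(Y) in TWEAR8Z */
run;

proc plot data=twear8z;
  plot wear*time;         /* standardized WEAR versus standardized TIME */
run;

proc corr data=twear8;
  var time wear order;    /* correlation matrix for all pairs */
run;
```

Standardizing shifts and rescales each variable but does not change the Pearson correlation, so PROC CORR gives the same matrix whether it is run on TWEAR8 or on TWEAR8Z.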
The macro CORR will compute the Pearson correlation and a confidence interval for the population correlation.
It is very easy to compute the least squares estimators using SAS/INSIGHT: just choose Analyze:Fit ( Y X ), and select the X and Y variable from the dialog window. When you choose Analyze:Fit ( Y X ), SAS/INSIGHT automatically computes the fitted values and residuals and places them in the data set under the names P_Y and R_Y, respectively, where Y is the name of the Y variable. So, for example in the regression of WEAR on TIME, the fitted values are called P_WEAR and the residuals are called R_WEAR. A plot of residuals versus fitted values is also produced automatically. You can now plot the residuals versus any variables of interest.
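A command-line counterpart of this fit, offered as a sketch rather than as what SAS/INSIGHT does internally, uses PROC REG with an OUTPUT statement. It assumes the tool wear data are in the data set TWEAR8; the output data set name WEARFIT is hypothetical, and the variable names P_WEAR and R_WEAR are chosen only to mimic the SAS/INSIGHT naming described above.

```sas
/* Sketch: least squares regression of WEAR on TIME, saving fitted values and residuals. */
proc reg data=twear8;
  model wear = time;
  output out=wearfit p=p_wear r=r_wear;  /* p= fitted values, r= residuals */
run;
quit;

/* Residuals versus fitted values, and versus the predictor. */
proc plot data=wearfit;
  plot r_wear*p_wear r_wear*time;
run;
```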
Generate Studentized residuals by choosing Vars: Studentized Residual. The Studentized residuals will be placed in a variable named with the prefix RT_ followed by something resembling the name of the response variable in the regression. It is a good idea to look at the Studentized residuals. Choosing Analyze: Distribution ( Y ) will do a distribution analysis of the Studentized residuals. The SAS macro TQPLOT will produce a plot of Studentized residuals versus t quantiles. It will also write the original data, the Studentized residuals and the t quantiles to a data set of your choice.
The confidence and prediction bands in Figure 7.21 were generated by choosing Curves: Confidence Curves: Mean: and Curves: Confidence Curves: Prediction:, respectively. You are allowed to choose the confidence level of the bands. The SAS macro REGPRED computes level .95 confidence intervals for the mean of the response and level .95 prediction intervals for a new observation at each data value in the input data set and at additional user-specified predictor values. The predicted values are stored under the name PRED. The endpoints of the confidence intervals for the mean are stored under names L95MPRED and U95MPRED and those for prediction intervals for a future observation are stored under the names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS regression output is written to the SAS/OUTPUT window.
In SAS/INSIGHT you can analyze data for a single categorical variable using bar charts. You can obtain information on the relation between two categorical variables using mosaic plots. For example, Figure 7.23 was produced by choosing Box Plot/Mosaic Plot ( Y ) and then selecting GENDER as the Y variable and FATE as the X variable. The frequencies and percentages were added by choosing :Values. The SAS macro CAT2WAY will create two-way tables. Since it was designed with additional sophisticated analyses in mind, the input to and output from CAT2WAY contains some terms you will not be familiar with. Still, it is very easy to use, as the following example, based on the Donner data, shows. The following will produce one and two-way frequency tables for FATE and GENDER for the Donner data:
Statistic                     DF     Value        Prob
------------------------------------------------------
Chi-Square                     1     4.811       0.028

In this output, the quantities shown are the degrees of freedom, the observed value of the chi-square test statistic, 4.811, and the p-value, 0.028.
The p-value for a chi square test is easily computed using the SAS macro NPROBS, remembering that a chi-square distribution with m degrees of freedom is a gamma distribution with parameters ALPHA=m/2 and BETA=2. For Example 7.11 about the categories of defective computers, we have an observed value 13.36 of the test statistic, and we want to compute its p-value using a chi-square distribution with 4 degrees of freedom as the reference. To do this, invoke the macro NPROBS and select the gamma distribution. Enter 2 (=4/2) for ALPHA and 2 for BETA. Enter 13.36 for A and some very large number (we used 10000) for B.
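For this particular case the answer can also be checked by hand, because a gamma distribution with ALPHA=2 and BETA=2 (that is, a chi-square distribution with 4 degrees of freedom) has a closed-form upper tail area. The following worked calculation is added as a check and is not part of the NPROBS output:

```latex
\[
P(\chi^2_4 > x) = e^{-x/2}\left(1 + \frac{x}{2}\right),
\qquad
P(\chi^2_4 > 13.36) = e^{-6.68}\,(1 + 6.68) \approx 0.0096 .
\]
```

The value NPROBS reports for the interval from 13.36 to 10000 should agree with this to the number of digits shown, since the area beyond 10000 is negligible.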
PROC FREQ can conduct Pearson's chi-square test and compute other associated quantities. To illustrate its use, we consider data relating consumption of ascorbic acid (vitamin C) to the incidence of colds in a group of French skiers. In a controlled experiment, 279 French skiers were divided into a treatment and a control group. The treatment group received ascorbic acid and the control group a placebo. Whether or not the skier had a cold during the trial period was recorded. To enter the data, submit the following program from the SAS PROGRAM EDITOR window:
title 'Analysis of data on French skiers';
options linesize=70;
data skiers;
  input treat $ cond $ count @@;
  cards;
plac cold 31 plac ncold 109
asco cold 17 asco ncold 122
;
run;

The data are now in the SAS data set SKIERS. The following commands, submitted from the SAS PROGRAM EDITOR window, will, among other things, conduct Pearson's chi-square test:

proc freq data=skiers order=data;
  weight count;
  tables treat*cond / chisq cellchi2;
run;

The output is the following:
Analysis of data on French skiers

TABLE OF TREAT BY COND

TREAT     COND

Frequency      |
Cell Chi-Square|
Percent        |
Row Pct        |
Col Pct        |cold    |ncold   |  Total
---------------+--------+--------+
asco           |     17 |    122 |    139
               |  1.999 | 0.4154 |
               |   6.09 |  43.73 |  49.82
               |  12.23 |  87.77 |
               |  35.42 |  52.81 |
---------------+--------+--------+
plac           |     31 |    109 |    140
               | 1.9847 | 0.4124 |
               |  11.11 |  39.07 |  50.18
               |  22.14 |  77.86 |
               |  64.58 |  47.19 |
---------------+--------+--------+
Total                48      231      279
                  17.20    82.80   100.00

Analysis of data on French skiers

STATISTICS FOR TABLE OF TREAT BY COND

Statistic                     DF     Value        Prob
------------------------------------------------------
Chi-Square                     1     4.811       0.028
Likelihood Ratio Chi-Square    1     4.872       0.027
Continuity Adj. Chi-Square     1     4.141       0.042
Mantel-Haenszel Chi-Square     1     4.794       0.029
Fisher's Exact Test (Left)                       0.021
                    (Right)                      0.991
                    (2-Tail)                     0.038
Phi Coefficient                     -0.131
Contingency Coefficient              0.130
Cramer's V                          -0.131

Sample Size = 279
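To see where the numbers in this output come from, here is a worked check using only the counts shown in the table. The symbols O and E for observed and expected counts are notation introduced for this check. Each cell's expected count is its row total times its column total divided by the sample size, and the chi-square statistic is the sum of the four cell chi-squares:

```latex
\[
E_{\text{asco},\,\text{cold}} = \frac{139 \times 48}{279} \approx 23.91,
\qquad
\frac{(O - E)^2}{E} = \frac{(17 - 23.91)^2}{23.91} \approx 2.00 ,
\]
\[
X^2 = 1.999 + 0.4154 + 1.9847 + 0.4124 \approx 4.811 .
\]
```

The first line reproduces, up to rounding, the Cell Chi-Square entry 1.999 for the treated skiers who caught colds; the second reproduces the Chi-Square value in the statistics table.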
The instructions below are keyed to the instructions in the text.
To create a scatterplot array for the tree data in SAS/INSIGHT:
To create a rotating 3-D plot in SAS/INSIGHT, choose Analyze:Rotating Plot ( Z Y X ). Do this now for the tree data. From the resulting window choose V as the Z variable, H as the Y variable and D as the X variable. A graph window will appear. Choose Edit: Windows: Tools to bring up the SAS Tools window. Click on the hand in the Tools window and move it to the graph window. The hand tool can be used in a variety of ways to rotate the plot:
The simplest way to fit model (8.22) using SAS/INSIGHT is:
To avoid the computational and statistical difficulties associated with multicollinearity, we might want to center both D and H by subtracting the mean of the tree diameters from each tree's diameter and the mean of the tree heights from each tree's height. This has already been done in this data set with the variables CD and CH being the centered variables. The following two steps show how SAS/INSIGHT can be used to center the predictors D and H:
To generate the Studentized residuals in SAS/INSIGHT, choose Vars: Studentized Residual. The Studentized residuals will be placed in a variable named with the prefix RT_ followed by something resembling the name of the response variable in the regression (exactly what depends on what else you have done previously in the SAS/INSIGHT session). For example, I just computed the Studentized residuals for the fit of model (8.22) and they were placed in the variable RT_VOL_8. If you don't like the name SAS assigns, you can change it by choosing : Define Variables in the data window. Once you have generated the Studentized residuals, you can obtain a normal quantile plot by choosing Analyze: Distribution ( Y ) to do a distribution analysis of the Studentized residuals, and from the Distribution Analysis window choosing Graphs: QQ Plot. Make sure the normal distribution has been chosen in the resulting pop-up window (you may ignore the selections under ``Parameters:''.) To put the 45 degree reference line (the correct reference line when using Studentized residuals) on the normal quantile plot, choose Curves: QQ Ref Line, and then from the resulting pop-up window select Specification and specify 0 for the intercept and 1 for the slope. A more appropriate plot than a normal quantile plot of Studentized residuals is a plot versus t quantiles. The SAS macro TQPLOT will construct this plot for you. After asking for the name of the data set and response variable, the macro will ask if you want a regression fit, as opposed to a GLM (General Linear Model) fit. Answer ``y'' (without the quotes). You must then input the names of the regressor variables, separated by spaces. For the TREES data, you might specify the regressors CD CH CD*CH. In addition to producing the quantile plot, the macro computes and outputs the original data, the Studentized residuals, regular residuals, fitted values and t quantiles to a SAS data set of your choice. 
From there you can plot and analyze them further.
The SAS macro REGPRED computes level 0.95 confidence intervals for the mean of the response and level 0.95 prediction intervals for a new observation at each data value in the input data set and at additional user-specified predictor values. The predicted values are stored under the name PRED. The endpoints of the confidence intervals for the mean are stored under names L95MPRED and U95MPRED and those for prediction intervals for a future observation are stored under the names L95PRED and U95PRED in the SAS data set REGPRED. Standard SAS regression output is written to the SAS/OUTPUT window. As an example, suppose we want to use model (8.22) and the tree data to obtain intervals for the mean volume and to predict the volume of a new tree having diameter 10 inches and height 70 feet. When REGPRED asks 'ENTER THE NAME(S) OF THE PREDICTOR(S)', the response is D H, and when REGPRED asks 'ENTER THE NAME(S) OF THE REGRESSOR(S)', the response is D H D*H. When REGPRED asks 'WOULD YOU LIKE TO SPECIFY ADDITIONAL VALUES OF THE PREDICTORS AT WHICH TO COMPUTE PREDICTION INTERVALS?', answer y, and when prompted put in the values 10 70.
SAS/INSIGHT offers a particularly easy way to remove one variable at a time from a fitted regression model. As an example, suppose that you have fit the model for the tree data with regressors CD, CH and CD*CH, and that you want to remove CD*CH. To do so, return to the gray Fit(YX) window you used to fit the present model, click on CD*CH in the window containing the regressor names, and then on the ``Remove'' button in the lower right corner. CD*CH will be removed as a regressor. Now click on ``Run'' and the new model will be fit.
The instructions below are keyed to instructions in the text.
Data Generation
To generate the data sets as in 1-3, invoke the SAS macro LAB8_1. You will be prompted for the name of the SAS data set to contain the data, the number of observations, the parameters of the model, and the desired correlation between the predictor variables. All quantities except the first and last remain the same for all three data sets.
Analysis
The instructions below are keyed to instructions in the text.
Data Generation
To generate the data set, invoke the SAS macro LAB8_1. You will be prompted for the name of the SAS data set to contain the data, the number of observations, the parameters of the model, and the desired correlation between the regressors. This last is of interest for Lab 8.1 only, so here just set the correlation to 0.5 and name the data set SET50.
Analysis
Look at the Data.
Create an Outlier and See What Happens.
To change a data value in SAS/INSIGHT, click on the cell in the data window containing the value, type in the new value and hit the return key. The new value will now replace the old one. In addition, all plots and summary measures in SAS/INSIGHT that are associated with this value will automatically be updated for the new value. In particular, the regression fit, the plot of the Studentized residuals versus the fitted values and the associated measures, such as R^2, will all be updated. The t quantile plot will not be updated, however, so you will have to recreate this plot by first making a copy of the revised data set and then calling the macro TQPLOT.
Mean diamonds may be produced as follows:
SASDATA.PROSTATE contains a response variable (DELTAFLO) and a classification variable (TREATMNT). The classification variable is nominal, taking the values drug, microwav and surgery. SASDATA.WATCHES contains a response variable (TIME) and two classification variables (WORKER and METHOD). In SASDATA.WATCHES, the classification variables WORKER and METHOD are also nominal variables, even though they take on the values 1, 2, 3, 4, 5, and 1, 2, 3 respectively. In order to use SAS/INSIGHT to fit the models studied in this chapter, the classification variables must be nominal. If you are using a data set in which the classification variables are interval, you may change them to nominal by selecting : Define Variables..., and resetting the measurement level to nominal in the resulting dialog box. This may also be done by clicking on the word ``Int'' above the variable name in the data window. To fit the model, select Analyze: Fit ( Y X ) and choose the response as the Y variable and the classification variable(s) as the X variable(s). Output will include an ANOVA table. Residuals and fitted values will be computed and placed in the data window.
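From the command line, the same models could be fit with PROC GLM, where the CLASS statement plays the role of declaring the classification variables nominal. The following is a sketch that assumes the libref SASDATA has already been assigned; the output data set name PROSFIT and the variable names P_FLO and R_FLO are made up for the example.

```sas
/* Sketch: one-way model for the prostate data. */
proc glm data=sasdata.prostate;
  class treatmnt;                       /* treat TREATMNT as nominal */
  model deltaflo = treatmnt;            /* output includes an ANOVA table */
  output out=prosfit p=p_flo r=r_flo;   /* fitted values and residuals */
run;
quit;

/* Sketch: randomized complete block model for the watches data. */
proc glm data=sasdata.watches;
  class worker method;                  /* numeric codes, but nominal */
  model time = worker method;
run;
quit;
```

Because WORKER and METHOD appear in the CLASS statement, their numeric codes are treated as labels rather than as quantities, which is the same effect as setting the measurement level to nominal in SAS/INSIGHT.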
Studentized residuals can be computed from the fit window by choosing Vars: Studentized Residual. Residual plots may be obtained in the usual way in SAS/INSIGHT by plotting residuals or Studentized residuals against any variable of interest. You can also produce a normal quantile plot of the Studentized residuals. Do this by performing a distribution analysis of the Studentized residuals (choose Analyze: Distribution ( Y ) and from the Distribution Analysis window choose Graphs: QQ Plot). Make sure the normal distribution has been chosen in the resulting pop-up window (you may ignore the selections under ``Parameters:''). To put the 45 degree reference line (the correct reference line when using Studentized residuals) on the normal quantile plot, choose Curves: QQ Ref Line, and then from the resulting pop-up window select Specification and specify 0 for the intercept and 1 for the slope. A more appropriate plot than a normal quantile plot of Studentized residuals is a plot versus t quantiles. The SAS macro TQPLOT will construct this plot for you. After asking for the name of the data set and response variable, the macro will ask if you want a regression fit, as opposed to a GLM (General Linear Model) fit. Answer ``n'' (without the quotes). You must then input the name of the classification variable (called class variable in the input window), and the name of the effect, which for the one-way model is the same as the class variable. For the prostate data both the class and effect entries will be TREATMNT. For the RCB model, there are two class variables, corresponding to the blocks and treatments. These are also the effects. So, for the watches data, input the string WORKER METHOD as both class and effects variables. In addition to producing the quantile plot, TQPLOT computes and outputs the original data, the Studentized residuals, regular residuals, fitted values and t quantiles to a SAS data set of your choice.
The macro RCBD will produce interaction plots and perform Tukey's test for the RCB model to check the assumption of additivity.
Individual and Bonferroni and Tukey multiple comparisons can be obtained from the SAS macros ONEWAY (for the non-blocked one-way model) and RCBD. The output will appear in the SAS OUTPUT window. The Tukey multiple comparison output will look like the output in Table 9.4 of the text. The output for Bonferroni multiple comparisons and for individual comparisons will resemble the output in Table 9.4, but will be labeled ``Bonferroni (Dunn) T tests ...'', and ``T tests (LSD) ...'', respectively, rather than ``T tests (TUKEY) ...''.
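The source of the ONEWAY and RCBD macros is not shown here, but output with these labels is what the MEANS statement of PROC GLM produces, so a direct command-line sketch for the prostate data might look like the following. It assumes the libref SASDATA has been assigned.

```sas
/* Sketch: individual (LSD), Bonferroni and Tukey comparisons of treatment means. */
proc glm data=sasdata.prostate;
  class treatmnt;
  model deltaflo = treatmnt;
  means treatmnt / t bon tukey;   /* t = individual (LSD) comparisons */
run;
quit;
```

For the RCB model the same MEANS statement can be used after fitting the model with both WORKER and METHOD as class variables, requesting comparisons for METHOD only.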
The following sections correspond to items 1-3 of the lab description in the text.
To fit the additive model (10.14), select Analyze: Fit ( Y X ) and choose the response as the Y variable and the variables giving factor levels as the X variables. As in the one-way case, the variables chosen as X variables must be nominal. Output will include an ANOVA table. Residuals and predicted values will be computed and placed in the data window. To fit the general model (10.16), proceed as with the additive model, but after selecting the X variables, use the mouse to highlight them in the X variable window and click on the ``Cross'' button just to the left. This creates an interaction term for the analysis.
Studentized residuals can be computed from the fit window by choosing Vars: Studentized Residual. Residual plots may be obtained in the usual way in SAS/INSIGHT by plotting residuals or Studentized residuals against any variable of interest. We recommend producing a normal quantile plot of the Studentized residuals by performing a distribution analysis of the Studentized residuals (choose Analyze: Distribution ( Y )) and from the distribution window choosing Graphs: QQ Plot. In the resulting dialog box, make sure Normal is selected as Distribution:. To add a reference line to the normal quantile plot, choose Curves:QQ Ref Line. From the dialog box, choose Specification, and then set the intercept to 0 and the slope to 1. A more appropriate plot than a normal quantile plot of Studentized residuals is a plot versus t quantiles. The SAS macro TQPLOT will construct this plot for you. After asking for the name of the data set and response variable, the macro will ask if you want a regression fit, as opposed to a GLM (General Linear Model) fit. Answer ``n'' (without the quotes). You must then input the names of the classification variables (called class variables in the input window), and the names of the effects. For the pulse oximetry data, the class variables are INTENSIY and SHIVTYPE and the effects will be INTENSIY SHIVTYPE INTENSIY*SHIVTYPE. In addition to producing the quantile plot, the macro computes and outputs the original data and the Studentized residuals to a SAS data set of your choice. The macro TWOWAY will produce interaction plots and compute individual, Bonferroni and Tukey pairwise comparisons of factor level means for both factors for the additive and general models.
The instructions below are keyed to instructions in the text.
data tutto;
   set socks bill hillary chelsea;
run;
Both tests are easily conducted in SAS/INSIGHT. To do so, from the Data Window choose Analyze: Distribution( Y ), then from the resulting Distribution Analysis Window choose Tables: Location Tests.... From Base SAS, PROC UNIVARIATE will also give these tests.
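A minimal Base SAS sketch of the PROC UNIVARIATE route follows. The data set name MYDATA and the variable name Y are assumptions; MU0= sets the hypothesized location and defaults to 0.

```sas
/* Sketch only: MYDATA and Y are assumed names. */
proc univariate data=mydata mu0=0;
   var y;
run;
```

The relevant part of the printed output is the table headed "Tests for Location", which reports the sign and signed rank statistics alongside Student's t.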
PROC NPAR1WAY will compute the Wilcoxon rank sum test. The macro ONERAND will approximate the p-value using a randomization test, provided the ranks of the data are used instead of the raw data.
Suppose you have X-Y data under the variable names x and y in the SAS data set DATASET. You can use SAS/INSIGHT to create the ranks of X and Y and place them in the variables RX and RY. To do this:
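In Base SAS, the same ranks can be produced with PROC RANK. This is a minimal sketch using the names above; the output data set name RANKED is an assumption.

```sas
/* RANKED is an assumed output data set name. */
proc rank data=dataset out=ranked;
   var x y;        /* variables to be ranked          */
   ranks rx ry;    /* names for the new rank variables */
run;

proc print data=ranked;
run;
```

By default PROC RANK assigns tied observations the mean of their ranks.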
From the SAS command line, PROC NPAR1WAY will give the large sample approximate Kruskal-Wallis test. The following commands will give the desired results for the prostate data found in Example 11.5:
proc npar1way data=prostate wilcoxon;
   class treatmnt;
   var deltaflo;
run;
From the SAS command line, PROC FREQ will give the large sample approximate Friedman test. The following commands will give the desired results for the watch data found in Example 11.6:
proc rank data=watches out=rwatches;
   var time;
   by worker;
   ranks rtime;
run;
proc freq data=rwatches;
   tables worker*method*rtime / noprint cmh;
run;
The resulting output is:
SUMMARY STATISTICS FOR METHOD BY RTIME
CONTROLLING FOR WORKER

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF     Value      Prob
--------------------------------------------------------------
    1        Nonzero Correlation        1     6.400     0.011
    2        Row Mean Scores Differ     2     7.600     0.022
    3        General Association        4    10.400     0.034

Total Sample Size = 15
Friedman's test is given by statistic 2, has 2 degrees of freedom, value 7.6, and p-value 0.022.
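The reported p-value can be checked directly from the chi-square approximation. The sketch below is only a check on the printed output, using the value 7.6 and the 2 degrees of freedom shown for statistic 2; the variable names are arbitrary.

```sas
/* Upper-tail chi-square probability for Friedman's statistic. */
data _null_;
   stat = 7.6;                    /* "Row Mean Scores Differ" value */
   df   = 2;
   p    = 1 - probchi(stat, df);  /* area above stat                */
   put p= 6.4;
run;
```

With 2 degrees of freedom the upper-tail area equals exp(-7.6/2), about 0.0224, which agrees with the 0.022 in the output.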
The macro TWORAND will approximate the p-value closely using a randomization test. To get a good approximation of the p-value, choose a large number of randomizations when prompted: 100,000 should be do-able on most computers.
The SAS macro CAT2WAY will create two-way tables, and a number of statistics, including Fisher's exact test. Since it was designed with additional sophisticated analyses in mind, the input to and output from CAT2WAY contains some terms you will not be familiar with. To begin with, you must input the data. We will use the computer job data from Example 11.7 to illustrate. The easiest form for the data, which we will assume are contained in the SAS data set COMPJOB, is to have one variable for the row categories, another variable for the column categories, and a third variable for the counts in the cells. We will assume these variables are named GENDER, RACE and COUNT. The following will produce the two-way frequency table and Fisher's exact test (along with a number of other tests) for the computer job data:
TABLE OF GENDER BY RACE

GENDER          RACE

Frequency      |
Expected       |
Cell Chi-Square|
Percent        |
Row Pct        |
Col Pct        |black   |white   |  Total
---------------+--------+--------+
female         |      4 |      2 |      6
               |    2.8 |    3.2 |
               | 0.5143 |   0.45 |
               |  26.67 |  13.33 |  40.00
               |  66.67 |  33.33 |
               |  57.14 |  25.00 |
---------------+--------+--------+
male           |      3 |      6 |      9
               |    4.2 |    4.8 |
               | 0.3429 |    0.3 |
               |  20.00 |  40.00 |  60.00
               |  33.33 |  66.67 |
               |  42.86 |  75.00 |
---------------+--------+--------+
Total                 7        8       15
                  46.67    53.33   100.00

STATISTICS FOR TABLE OF GENDER BY RACE

Statistic                     DF     Value      Prob
------------------------------------------------------
Chi-Square                     1     1.607     0.205
Likelihood Ratio Chi-Square    1     1.632     0.201
Continuity Adj. Chi-Square     1     0.547     0.460
Mantel-Haenszel Chi-Square     1     1.500     0.221
Fisher's Exact Test (Left)                     0.965
                    (Right)                    0.231
                    (2-Tail)                   0.315
Phi Coefficient                      0.327
Contingency Coefficient              0.311
Cramer's V                           0.327

Sample Size = 15
WARNING: 100% of the cells have expected counts less
         than 5. Chi-Square may not be a valid test.
As you can see, the one-tailed Fisher's test p-value is 0.231, just as computed in the text.
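The one-tailed values can also be reproduced by summing hypergeometric probabilities in a DATA step. This is a sketch of the arithmetic behind the output, not of what CAT2WAY does internally; it uses only the margins of the table above (6 females, 7 in the black column, 15 in all), and the variable names are arbitrary.

```sas
data _null_;
   /* P(female-black count = k) = C(7,k)*C(8,6-k)/C(15,6) */
   right = 0;
   do k = 4 to 6;      /* counts at least as large as the observed 4 */
      right + comb(7, k) * comb(8, 6 - k) / comb(15, 6);
   end;
   left = 0;
   do k = 0 to 4;      /* counts no larger than the observed 4 */
      left + comb(7, k) * comb(8, 6 - k) / comb(15, 6);
   end;
   put right= 6.3 left= 6.3;
run;
```

The right-tail sum is 1155/5005, about 0.231, and the left-tail sum is about 0.965, matching the (Right) and (Left) lines of the output.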
The following commands will enter the data for Example 11.7:
data compjob;
   input gender $ race $ count @@;
   cards;
male white 6 male black 3 female white 2 female black 4
;
run;
PROC FREQ can compute the p-value for Fisher's exact test, Pearson's chi-square test statistic and its p-value, and other associated quantities. The following commands will produce the table and tests for Example 11.7:
proc freq data=compjob order=data;
   weight count;
   tables gender*race / chisq cellchi2 exact;
run;
The macro ONERAND will approximate the p-value closely using a randomization test. To get a good approximation of the p-value, choose a large number of randomizations when prompted: 100,000 should be do-able on most computers.
The macro GKWRAND will conduct a randomization test version of the generalized Kruskal-Wallis test. To get a good approximation of the p-value, choose a large number of randomizations when prompted: 100,000 should be do-able on most computers. By using the ranks of the data as the response variable, you will obtain a randomization test version of the Kruskal-Wallis test.
The macro GFRAND will conduct a randomization test version of the generalized Friedman test. To get a good approximation of the p-value, choose a large number of randomizations when prompted: 100,000 should be do-able on most computers. By using the ranks of the data as the response variable, you will obtain a randomization test version of Friedman's test.
Before any bootstrap inference procedure for measurement data, you should investigate the data for outliers. SAS/INSIGHT is the easiest way to do this.
The macro CEBOOT will compute one-sample (equation (11.16)) and two-sample bootstrap (equation (11.20)) confidence intervals for the C+E model, based on the sample mean as estimator. This macro will prompt you for the needed input information. Graphical output consists of a plot of the normal theory t sampling distribution superimposed on the bootstrapped sampling distribution for the mean or difference of means, whichever is appropriate. The bootstrapped parameter values are output to a SAS file of your choice. Normal theory and bootstrap level L confidence intervals for the mean or difference of means (whichever is appropriate) are generated for user-selected L. CEBOOT will also compute the bootstrap prediction interval given by equation (11.17).
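To make the resampling idea concrete, here is a minimal Base SAS sketch of a one-sample percentile bootstrap interval for the mean. It is not the CEBOOT macro and is not claimed to reproduce equation (11.16); the data set MYDATA, the variable Y, the seed, the 1,000 resamples and the 95% level are all assumptions.

```sas
/* Draw 1000 resamples with replacement, each the size of MYDATA. */
proc surveyselect data=mydata out=boot seed=12345
                  method=urs samprate=1 outhits reps=1000 noprint;
run;

/* Mean of each resample; REPLICATE is created by SURVEYSELECT. */
proc means data=boot noprint;
   by replicate;
   var y;
   output out=bootmeans mean=ybar;
run;

/* 2.5th and 97.5th percentiles of the bootstrapped means. */
proc univariate data=bootmeans noprint;
   var ybar;
   output out=bootci pctlpts=2.5 97.5 pctlpre=ci_;
run;

proc print data=bootci;
run;
```

The data set BOOTMEANS plays the role of the file of bootstrapped parameter values described above, and can be opened in SAS/INSIGHT to inspect the bootstrapped sampling distribution.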
The macro BIBOOTP will generate two-sample bootstrap confidence intervals for population proportions (equation (11.22)). This macro will prompt you for the needed input information. Graphical output consists of a plot of the normal theory N(0,1) sampling distribution superimposed on the bootstrapped sampling distribution for the difference in proportions. The bootstrapped parameter values are output to a SAS file of your choice. Normal theory and bootstrap level L confidence intervals for the difference in proportions are generated for user-selected L. BIBOOTP will also calculate bootstrap confidence intervals for the proportion p from a single b(n,p) population, though with the availability of exact intervals (from the SAS macro BIEXACT, for example), there is little need for a bootstrap interval.
The macro NPTOL will compute the sample size necessary for the distribution-free tolerance interval discussed in Section 11.13.
The SAS macro EFFECTS computes the effect estimates for an unreplicated 2^k design, and produces a plot showing the effects and the values of MOE and SMOE. Two SAS files are created. The first, whose name you specify at the prompt ``DATA FILE TO STORE OUTPUT'', contains the response, factors and interaction terms. The interaction terms are labeled I12, I13, I123, etc. The second file, called DRANK, contains the effect name (EFFECT), effect estimate (ESTIMATE), normal quantile (QUANTILE), and effect label (LABEL).
To obtain a normal quantile plot of the effects, you should open DRANK with SAS/INSIGHT and plot QUANTILE versus ESTIMATE, including LABEL as a label variable. To do this, choose Analyze:Scatter Plot ( Y X ) from the menu bar on the data window. A dialog window will appear. In this window, select QUANTILE as the Y variable, ESTIMATE as the X variable, and LABEL as the label variable. Click on ``OK'' to do the plot. When the plot appears and you resize it, you can click on any of the estimated effects appearing on it to see the name of the effect being estimated.
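If SAS/INSIGHT is not available, a labeled version of the same plot can be sketched with PROC SGPLOT. This assumes a SAS release that includes PROC SGPLOT; the variable names are those that EFFECTS writes to DRANK.

```sas
/* DRANK is the data set written by EFFECTS. */
proc sgplot data=drank;
   scatter x=estimate y=quantile / datalabel=label;
run;
```

Here every point is labeled at once, rather than on a click as in SAS/INSIGHT.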
To obtain the residuals and fitted values, take the following steps:
The macro CEFFECTS is the analogue of the macro EFFECTS for 2^k experiments with replicated center points. CEFFECTS works very much like EFFECTS: it computes all interaction variables and outputs them along with the responses and factors to a SAS file of your choice, and it computes the quantities effect name (EFFECT), effect estimate (ESTIMATE), normal quantile (QUANTILE), and effect label (LABEL) and puts them in the SAS data file DRANK. It also computes a test for curvature, which EFFECTS does not.
This is obtained as for the unreplicated design.
This is done essentially as for the unreplicated design, except that you must exclude the center points from the fit. To do this, select the center points in the data window, and then choose Edit: Windows: Exclude in Calculations. After this, proceed as for the unreplicated design.
The interaction plot shown in Figure 12.3 was produced by the SAS macro IPLOT. The data are found in the SAS data set SF. To generate Figure 12.3, you should answer the prompts for input as follows:
The transformations discussed in Section 12.12 are easily available in SAS/INSIGHT from the data window by choosing Edit:Variables from the menu bar.
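The same kind of transformed variable can be created in a DATA step. The sketch below is illustrative only: the data set MYDATA, the response Y, and the choice of log, square root and reciprocal are assumptions, and are not necessarily the transformations of Section 12.12.

```sas
/* Sketch only: MYDATA, Y and the three transformations are assumed. */
data mydata2;
   set mydata;
   logy  = log(y);    /* natural log; requires y > 0  */
   sqrty = sqrt(y);   /* square root; requires y >= 0 */
   recy  = 1 / y;     /* reciprocal; requires y ne 0  */
run;
```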
Some nice features have been implemented into the macros EFFECTS, CEFFECTS and IPLOT, but these require some restrictions on what can be done automatically in them. Three that you should be aware of are:
Suppose we want to obtain a 2^(5-2)_V design (if possible). Call up the macro DESIGN2. A window will appear which will prompt you for the number of factors (tell it 5), the desired names of the factors (tell it A, B, C, D and E), the size of the fraction (tell it 4), the number of blocks (tell it 1), the maximum size interaction to display in the alias structure (tell it 5), and the name of a SAS data set to contain the design points. SAS will give you a design of maximum possible resolution. Now look at the SAS OUTPUT window. An orthogonal array will be displayed, consisting of the main effects (labeled A-E), and a column of ones for blocks. Ignore the latter for now. This array can be used to run the experiment, as the order of its runs has been randomized. Now scroll upward in the window. The aliasing structure will be displayed. (Note that SAS uses ``0'' instead of ``I'' to denote the identity.) The orthogonal array has also been output to the SAS data set you specified. When you run the experiment, you can use SAS/INSIGHT to enter the responses in this data set, and save the results for further analyses.
To incorporate blocks into the 2^(k-p) design, run the macro DESIGN2 as above and simply input the number of blocks you want at the appropriate prompt. Try this now for a 2^(5-2)_III design with two blocks. The variable ``BLOCK'' in the orthogonal array in the output tells to which block each treatment combination is assigned. The aliasing structure in the output shows which effects the blocks (denoted ``[B]'') are confounded with. Here they are AC, BD, ABE, and CDE. In terms of the orthogonal array, those terms with a ``+'' in the product of the A and C columns are assigned to one block, the terms with a ``-'' are assigned to the other block. This is the design for the EVA ring data shown in Table~13.9 of the text, if we take A to be Mold Temperature, B to be Screw Speed, C to be Hold Pressure, D to be Probe Temperature and E to be Hold Time.
You may use the macros EFFECTS and CEFFECTS to obtain estimates in 2^(k-p) designs. However, you must input only k-p of the k main effects. You can then determine the estimate of confounded effects by using the aliasing structure of the design. For example, suppose you want to run a 2^(6-2) design with factors A, B, C, D, E and F. You use the macro DESIGN2 to generate the design shown in the following table:
  A    B    C    D    E    F
________________________________
 -1    1    1    1    1   -1
 -1   -1    1    1   -1   -1
  1   -1    1   -1    1   -1
  1    1    1    1    1    1
  1   -1   -1   -1   -1    1
 -1   -1   -1   -1   -1   -1
  1    1    1   -1   -1   -1
 -1   -1   -1    1    1    1
  1   -1    1    1   -1    1
  1   -1   -1    1    1   -1
  1    1   -1   -1    1    1
 -1    1   -1   -1    1   -1
  1    1   -1    1   -1   -1
 -1   -1    1   -1    1    1
 -1    1    1   -1   -1    1
 -1    1   -1    1   -1    1
________________________________
You then run EFFECTS, inputting the number of factors as 4 and naming these as A, B, C and D. Here is how the output from EFFECTS giving the computed effects would appear.
OBS   EFFECT   LABEL      ESTIMATE   MOE        SMOE
  1   A        a            2.50     0.077864   0.15807
  2   B        b           -0.50     0.077864   0.15807
  3   C        c           -2.75     0.077864   0.15807
  4   D        d           -0.75     0.077864   0.15807
  5   I12      a*b          1.00     0.077864   0.15807
  6   I123     a*b*c        0.25     0.077864   0.15807
  7   I1234    a*b*c*d      0.50     0.077864   0.15807
  8   I124     a*b*d       -0.75     0.077864   0.15807
  9   I13      a*c         -0.25     0.077864   0.15807
 10   I134     a*c*d       -1.00     0.077864   0.15807
 11   I14      a*d          0.75     0.077864   0.15807
 12   I23      b*c          0.75     0.077864   0.15807
 13   I234     b*c*d        1.00     0.077864   0.15807
 14   I24      b*d         -1.25     0.077864   0.15807
 15   I34      c*d          0.50     0.077864   0.15807
As can be seen, they are named as main effects or interactions of A, B, C and D. In order to determine effects involving E and F you will have to consult the aliasing structure, which is displayed in the following table:
Aliasing Structure
0 = A*B*E*F = A*C*D*F = B*C*D*E
A = B*E*F = C*D*F = A*B*C*D*E
B = A*E*F = C*D*E = A*B*C*D*F
C = A*D*F = B*D*E = A*B*C*E*F
D = A*C*F = B*C*E = A*B*D*E*F
E = A*B*F = B*C*D = A*C*D*E*F
F = A*B*E = A*C*D = B*C*D*E*F
A*B = E*F = A*C*D*E = B*C*D*F
A*C = D*F = A*B*D*E = B*C*E*F
A*D = C*F = A*B*C*E = B*D*E*F
A*E = B*F = A*B*C*D = C*D*E*F
A*F = B*E = C*D = A*B*C*D*E*F
B*C = D*E = A*B*D*F = A*C*E*F
B*D = C*E = A*B*C*F = A*D*E*F
A*B*C = A*D*E = B*D*F = C*E*F
A*B*D = A*C*E = B*C*F = D*E*F
From the aliasing structure, we can see, for example, that the effect for E is the same as the BCD interaction which will appear on the EFFECTS output. Similarly, the effect for F will be found as the ACD interaction, and so on for any other effect of interest.
Note: It is possible to choose a set of k-p main effects which have some interactions that are aliased with main effects resulting in EFFECTS or CEFFECTS producing estimates of 0. If this happens, choose another k-p main effects. Experience shows that sticking to the first k-p main effects as inputs to EFFECTS or CEFFECTS avoids this problem.
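The aliases E = B*C*D and F = A*C*D also show how the E and F columns of the design relate to the four factors given to EFFECTS. The DATA step below is a minimal sketch that rebuilds the 16 runs from a full factorial in A, B, C and D; it is not the DESIGN2 macro, the data set name FRAC62 is an assumption, and the runs come out in standard rather than randomized order.

```sas
/* Full 2^4 factorial in A-D, with E and F generated from the aliases. */
data frac62;
   do a = -1, 1;
      do b = -1, 1;
         do c = -1, 1;
            do d = -1, 1;
               e = b*c*d;   /* E = B*C*D in the aliasing structure */
               f = a*c*d;   /* F = A*C*D in the aliasing structure */
               output;
            end;
         end;
      end;
   end;
run;

proc print data=frac62 noobs;
run;
```

Each printed row should match one row of the design table above, which is why the I234 and I134 estimates from EFFECTS can be read as the E and F effects.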
The macro CCDGEN will give you a range of Central Composite Designs to choose from for any desired number of factors. Input consists of the number of factors. Output, which is written to the SAS Output Window, consists of the types of designs available and instructions on how to generate them. As an example, the following table displays output from CCDGEN when the number of factors is input as 3:
Number of
Runs in
the Factorial   Number of          Axial     Total Number
Portion         Center Points      Extreme   of runs
-----------     ----------------   -------   --------------------
1.  8           9                  1.6818    23
2.  8           6 = ( 2*2) + 2     1.6330    20 = ( 2* 6) +8

%adxccd() parameters to construct:
-----------------------------------------
1. %adxccd(*data set name*,3,8,9,1.6818)
2. %adxccd(*data set name*,3,8,2/2,1.6330,3)

For blocked designs, equations give
            Number of     Number in each     Number in
Total = (   factorial  *  factorial      ) + axial
            blocks        block              block
This output shows two basic CCDs. The first is the standard design with 8 corner points, 9 center points and 6 star (here called axial) points. The ``axial extreme'' is the coded value of a (see Section 14.6) at which the star point is located. Note that in that section it was stated that a equal to the square root of 3 (=1.732) would give a rotatable design. Here, the design is optimized using other considerations than just rotatability, but the result is still nearly rotatable. The second design involves blocking and will not be considered further here. The commands below the heading ``%adxccd() parameters to construct:'' tell how to generate the design and have it output to the Output Window and stored in a SAS data set. So, if you want to store the output in the SAS data set ``dataset'' (remembering that this name should begin with ``sasuser.'' to be permanent), submit the command
%adxccd(dataset,3,8,9,1.6818);
from the SAS Editor Window.
Once you have the data for a CCD in a SAS data file, you may fit a response surface model using the macro RSCOMP. Input to RSCOMP is self-explanatory. Output is written to the input window and consists of the fitted model, significant effects (at the .05 level), stationary point (in coded units), eigenvalues, eigenvectors, and the estimated response at the stationary point.
The macro QUADGEN will prompt you to input values for x1 and x2, and will output the value of the response, y. Use it to attempt OFAT optimization. Later, you can use the macro SURFPLOT to produce a contour plot and a 3-D plot of the response surface. Use these plots to see how well the OFAT optimization did.
The macro NACF will compute the mean of each subgroup and display a normal quantile plot and autocorrelation plot for these means.
To create X-bar and S charts using SAS, follow these steps (we use the dressing stone data as an example):
To compare the quantities (USL-[X-bar-bar])/(s-bar) and (LSL-[X-bar-bar])/(s-bar) with the N(0,1) density, you must compute the area under the N(0,1) density above the former and below the latter. To do this, use the macro NPROBS, which will give the area under the N(0,1) density below any input value. To obtain the estimated capability indices Cp-hat, Cpk-hat and Cpm-hat using SAS, proceed as follows (we will use the ALUM data set as an example):