Multivariable Visualization
Tools:
- Scatterplot Array
- Rotating 3-D Plots
Let's try these out. Each of the data sets sasdata.eg10_2a, sasdata.eg10_2b, sasdata.eg10_2c and sasdata.eg10_2d contains data generated by one of the four models listed below. Using only the display of the data set itself and a scatterplot array, you are to tell which data set was generated by which model.
The models are:
$\begin{displaymath} \mbox{1. }Y = -1+7x_1+6x_2-3x_1^2+2x_2^2+7x_1x_2+\epsilon,\end{displaymath}$

$\begin{displaymath} \mbox{2. }Y = \; \; \; 5+7x_1+6x_2-3x_1^2+2x_2^2+\epsilon,\end{displaymath}$

$\begin{displaymath} \mbox{3. }Y = \; \; \; 5+7x_1+6x_2-3x_1^2+2x_2^2+7x_1x_2+\epsilon,\end{displaymath}$

$\begin{displaymath} \mbox{4. }Y = -1+7x_1+6x_2-3x_1^2+2x_2^2+\epsilon,\end{displaymath}$
where $\epsilon$ is a random error. Be sure to write down your answers.
Now use the rotating 3-D plot to view the data. Does this change your guesses?
The MLR Model
$\begin{displaymath} Y = \beta_0+\beta_1 X_1(Z_1,Z_2, \ldots, Z_p)+\beta_2 X_2(Z_1,Z_2, \ldots, Z_p)+\ldots+ \beta_q X_q(Z_1,Z_2, \ldots, Z_p)+\epsilon,\end{displaymath}$
where the Zs are the predictor variables and $\epsilon$ is a random error. Examples are
$\begin{displaymath} Y = \beta_0+\beta_1Z_1+\beta_2Z_1^2+\epsilon,\end{displaymath}$

$\begin{displaymath} Y = \beta_0+\beta_1 Z_1+\beta_2 Z_2+\beta_3Z_1^2+\beta_4Z_1Z_2+\beta_5Z_2^2+ \epsilon,\end{displaymath}$

$\begin{displaymath} Y = \beta_0+\beta_1\log(Z_2)+\beta_2\sqrt{Z_1Z_2}+\epsilon.\end{displaymath}$
We will write these models generically as
$\begin{displaymath} Y=\beta_0+\beta_1 X_1+\beta_2 X_2+\ldots + \beta_q X_q+\epsilon.\end{displaymath}$
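As a rough illustration of how the regressors $X_1,\ldots,X_q$ are built from the predictors $Z_1,\ldots,Z_p$, the sketch below codes the quadratic and the log/square-root examples. The function names and the coefficient values are hypothetical, chosen only for this illustration.

```python
import math

def quadratic_terms(z1, z2):
    """Regressors X_1..X_5 of the full quadratic model in two predictors."""
    return [z1, z2, z1 ** 2, z1 * z2, z2 ** 2]

def log_sqrt_terms(z1, z2):
    """Regressors of the log / square-root example (needs z2 > 0 and z1*z2 >= 0)."""
    return [math.log(z2), math.sqrt(z1 * z2)]

def mean_response(beta, terms):
    """beta_0 + beta_1 X_1 + ... + beta_q X_q, the model without the error term."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], terms))

# Hypothetical coefficients, chosen only for illustration.
beta = [1.0, 2.0, -1.0, 0.5, 3.0, 0.25]
x = quadratic_terms(2.0, 4.0)
mu = mean_response(beta, x)
print(x, mu)
```

Whatever functions of the Zs are used, the model stays linear in the $\beta$s, which is why the same generic form covers all three examples.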
Fitting the MLR Model
As we did for the SLR model, we use least squares to fit the MLR model. This means finding estimators of the model parameters $\beta_0, \beta_1, \ldots, \beta_q$ and $\sigma^2$. The LSEs of the $\beta$s are those values of $b_0,b_1,\ldots,b_q$, denoted $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_q$, which minimize
$\begin{displaymath} \mbox{SSE}(b_0,b_1,\ldots,b_q)=\end{displaymath}$

$\begin{displaymath} \sum_{i=1}^n[Y_i-(b_0+b_1 X_{i1}+b_2 X_{i2}+\cdots +b_q X_{iq})]^2.\end{displaymath}$
The fitted values are
$\begin{displaymath} \hat{Y}_i=\hat{\beta}_0+\hat{\beta}_1 X_{i1}+ \hat{\beta}_2 X_{i2}+ \cdots +\hat{\beta}_q X_{iq},\end{displaymath}$
and the residuals are
$\begin{displaymath} e_i= Y_i-\hat{Y}_i.\end{displaymath}$
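One way to see the least squares fit in action is to solve the normal equations directly. The sketch below is a minimal pure-Python version, assuming the regressors are not exactly collinear (otherwise the elimination step divides by zero). The data are made up for illustration and are not one of the sasdata sets; `solve` and `fit_mlr` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_mlr(rows, y):
    """Least squares estimates for Y = b0 + b1 X1 + ... + bq Xq.

    rows[i] holds (X_i1, ..., X_iq); a leading 1 is added for the intercept.
    Returns (beta_hat, fitted, residuals).
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta_hat = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta_hat, d)) for d in design]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta_hat, fitted, residuals

# Made-up data for illustration only.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1.1, 2.9, 4.2, 5.8, 8.1, 9.0]
beta_hat, fitted, residuals = fit_mlr(rows, y)
print(beta_hat)
```

Because the model includes an intercept, the residuals from this fit sum to zero (up to rounding), which is a quick sanity check on any least squares routine.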

Let's see what happens when we fit models to sasdata.eg10_2a and sasdata.eg10_2c.
Assessing Model Fit
Residuals and studentized residuals are the primary tools to analyze model fit. We look for outliers and other deviations from model assumptions. Let's look at the residuals from some fits to sasdata.eg10_2c.
Interpretation of the Fitted Model
The intercept has the interpretation "expected response when the $X_i$ all equal 0". The coefficient $\hat{\beta}_i$ is interpreted as the change in expected response per unit change in $X_i$ when the other Xs are held fixed (if that is possible).
Otherwise, we can interpret the model using multivariate calculus: the change in expected response per unit change in $Z_i$ (with the other predictors held fixed) is
$\begin{displaymath} \frac{\partial}{\partial Z_i}(\hat{\beta}_0+\hat{\beta}_1X_1+\ldots+\hat{\beta}_qX_q).\end{displaymath}$
So, for example, if the fitted model is
$\begin{displaymath} \hat{\beta}_0+\hat{\beta}_1Z_1+\hat{\beta}_2Z_2+\hat{\beta}_3Z_1Z_2,\end{displaymath}$
then the change in expected response per unit change in $Z_1$, with $Z_2$ held fixed, is
$\begin{displaymath} \frac{\partial}{\partial Z_1}(\hat{\beta}_0+\hat{\beta}_1Z_1+\hat{\beta}_2Z_2+\hat{\beta}_3Z_1Z_2) = \hat{\beta}_1+\hat{\beta}_3Z_2.\end{displaymath}$
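To make the interaction example concrete, the sketch below evaluates the slope $\hat{\beta}_1+\hat{\beta}_3Z_2$ at a few values of $Z_2$ and checks it against a unit step in $Z_1$. The coefficient values are hypothetical, picked only for illustration.

```python
def fitted_mean(beta_hat, z1, z2):
    """b0 + b1*Z1 + b2*Z2 + b3*Z1*Z2 for the interaction model."""
    b0, b1, b2, b3 = beta_hat
    return b0 + b1 * z1 + b2 * z2 + b3 * z1 * z2

def slope_in_z1(beta_hat, z2):
    """Partial derivative of the fitted mean with respect to Z1: b1 + b3*Z2."""
    _, b1, _, b3 = beta_hat
    return b1 + b3 * z2

# Hypothetical estimates, for illustration only.
beta_hat = (2.0, 1.5, -0.5, 0.25)
for z2 in (0.0, 2.0, 4.0):
    # The model is linear in Z1 for fixed Z2, so a unit step reproduces the slope.
    step = fitted_mean(beta_hat, 1.0, z2) - fitted_mean(beta_hat, 0.0, z2)
    print(z2, slope_in_z1(beta_hat, z2), step)
```

The point of the exercise is that the effect of $Z_1$ is not a single number here: it depends on where $Z_2$ is held.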
Theory-Based Modeling
Two ways of building models:
- Empirical modeling
- Theoretical modeling
Comparison of Fitted Models
- Residual analysis
- Principle of parsimony (simplicity of description)
- Coefficient of multiple determination, and its adjusted cousin.
ANOVA
Idea:
- Total variation in the response (about its mean) is measured by
  $\begin{displaymath} \mbox{SSTO}=\sum_{i=1}^n(Y_i-\overline{Y})^2.\end{displaymath}$
  This is the variation or uncertainty of prediction if no predictor variables are used.
- SSTO can be broken down into two pieces: SSR, the regression sum of squares, and SSE, the error sum of squares, so that SSTO=SSR+SSE.
- $\mbox{SSE}=\sum_{i=1}^n e_i^2$ is the sum of the squared residuals. It measures the variation of the response unaccounted for by the fitted model or the uncertainty of predicting the response using the fitted model.
- $\mbox{SSR}=\mbox{SSTO}-\mbox{SSE}$ is the variability explained by the fitted model or the reduction in uncertainty of prediction due to using the fitted model.
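The decomposition SSTO = SSR + SSE can be checked numerically. The sketch below does so for the simplest case, a single predictor ($q=1$), using the closed-form least squares line; the data are made up and the function names are hypothetical.

```python
def slr_fit(x, y):
    """Least squares line for one predictor (the q = 1 case of the MLR model)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def sums_of_squares(y, fitted):
    """Return (SSTO, SSE, SSR) with SSR computed as SSTO - SSE."""
    ybar = sum(y) / len(y)
    ssto = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssto, sse, ssto - sse

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = slr_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
ssto, sse, ssr = sums_of_squares(y, fitted)

# For a least squares fit with an intercept, SSR also equals sum (fitted - ybar)^2.
ybar = sum(y) / len(y)
ssr_direct = sum((fi - ybar) ** 2 for fi in fitted)
print(ssto, sse, ssr, ssr_direct)
```

The two ways of computing SSR agree only for a least squares fit that includes an intercept; for arbitrary fitted values the identity generally fails.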
Degrees of Freedom
The degrees of freedom for a SS is the number of independent pieces of data making up the SS. For SSTO, SSE and SSR the degrees of freedom are $n-1$, $n-q-1$ and $q$, respectively. These add just as the SSs do: $(n-q-1)+q=n-1$. A SS divided by its degrees of freedom is called a Mean Square.
The ANOVA Table
This is a table which summarizes the SSs, degrees of freedom and mean squares.

Analysis of Variance

| Source | DF | SS | MS | F Stat | Prob > F |
|---|---|---|---|---|---|
| Model | q | SSR | MSR | F=MSR/MSE | p-value |
| Error | n-q-1 | SSE | MSE | | |
| C Total | n-1 | SSTO | | | |
Inference for the MLR Model: The F Test
- The Hypotheses:
  
  H₀: $\beta_1=\beta_2= \cdots =\beta_q=0$
  
  H_a: Not H₀
- The Test Statistic: F=MSR/MSE
- The P-Value: $P(F_{q,n-q-1}>F^*)$, where $F_{q,n-q-1}$ is a random variable from an $F_{q,n-q-1}$ distribution and $F^*$ is the observed value of the test statistic.
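The entries of the ANOVA table and the F statistic follow mechanically from SSR, SSE, $n$ and $q$. The sketch below, with a hypothetical function name and made-up sums of squares, fills in everything except the p-value. The p-value needs the F distribution, which the Python standard library does not provide; if SciPy is available, `scipy.stats.f.sf(F, q, n - q - 1)` is one way to get it.

```python
def anova_table(ssr, sse, n, q):
    """Degrees of freedom, mean squares and F statistic for a fit with q regressors."""
    if n - q - 1 <= 0:
        raise ValueError("need n > q + 1 so the error degrees of freedom are positive")
    msr = ssr / q
    mse = sse / (n - q - 1)
    return {
        "Model": {"DF": q, "SS": ssr, "MS": msr, "F": msr / mse},
        "Error": {"DF": n - q - 1, "SS": sse, "MS": mse},
        "C Total": {"DF": n - 1, "SS": ssr + sse},
    }

# Hypothetical sums of squares, for illustration only.
table = anova_table(ssr=120.0, sse=30.0, n=18, q=2)
for source, row in table.items():
    print(source, row)
```

A large F statistic means the regressors together explain much more variation per degree of freedom than is left in the residuals, which is evidence against H₀.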
T Tests for Individual Predictors
- The Hypotheses:
  
  H₀: $\beta_i=0$
  
  H_a: $\beta_i\neq 0$
- The Test Statistic: $t=\frac{\hat{\beta}_i} {\hat{\sigma}(\hat{\beta}_i)}$
- The P-Value: $P(|t_{n-q-1}|>|t^*|)$, where $t_{n-q-1}$ is a random variable from a $t_{n-q-1}$ distribution and $t^*$ is the observed value of the test statistic.
Summary of Intervals for MLR Model
- Confidence Interval for Model Coefficients: A level L confidence interval for $\beta_i$ is
  $\begin{displaymath} (\hat{\beta}_i-\hat{\sigma}(\hat{\beta}_i)t_{n-q-1,(1+L)/2}, \hat{\beta}_i+\hat{\sigma}(\hat{\beta}_i) t_{n-q-1,(1+L)/2}).\end{displaymath}$
- Confidence Interval for Mean Response: A level L confidence interval for the mean response at predictor values $X_{10},X_{20},\ldots, X_{q0}$ is
  $\begin{displaymath} (\hat{Y}_0-\hat{\sigma}(\hat{Y}_0)t_{n-q-1,(1+L)/2} ,\hat{Y}_0+ \hat{\sigma}(\hat{Y}_0) t_{n-q-1,(1+L)/2}),\end{displaymath}$
  where
  $\begin{displaymath} \hat{Y}_0=\hat{\beta}_0+\hat{\beta}_1X_{10}+\cdots+\hat{\beta}_qX_{q0},\end{displaymath}$
  and $\hat{\sigma}(\hat{Y}_0)$ is the estimated standard error of the fitted mean response $\hat{Y}_0$.
- Prediction Interval for a Future Observation:
  A level L prediction interval for a new response at predictor values $X_{10},X_{20},\ldots, X_{q0}$ is
  $\begin{displaymath} (\hat{Y}_{new}-\hat{\sigma}(Y_{new}-\hat{Y}_{new})t_{n-q-1,(1+L)/2} ,\end{displaymath}$
  
  $\begin{displaymath} \hat{Y}_{new}+\hat{\sigma}(Y_{new}-\hat{Y}_{new}) t_{n-q-1,(1+L)/2}),\end{displaymath}$
  where
  $\begin{displaymath} \hat{Y}_{new}=\hat{\beta}_0+\hat{\beta}_1X_{10}+ \cdots+\hat{\beta}_qX_{q0},\end{displaymath}$
  and
  $\begin{displaymath} \hat{\sigma}(Y_{new}-\hat{Y}_{new})=\sqrt{\mbox{MSE}+ \hat{\sigma}^2(\hat{Y}_0)}.\end{displaymath}$
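All three intervals share the shape "estimate plus or minus standard error times a t quantile". The sketch below compares a confidence interval for the mean response with a prediction interval at the same predictor values. All numbers are hypothetical, and the t quantile is entered by hand from a table, since the standard library has no t distribution.

```python
import math

def interval(center, std_err, t_crit):
    """(center - std_err * t_crit, center + std_err * t_crit)."""
    half = std_err * t_crit
    return center - half, center + half

def prediction_std_err(mse, se_mean):
    """sqrt(MSE + se(Y0_hat)^2), the standard error of Y_new - Y_new_hat."""
    return math.sqrt(mse + se_mean ** 2)

# Hypothetical inputs, for illustration only: n = 13, q = 2, so n - q - 1 = 10.
t_crit = 2.228                      # t_{10, 0.975}: L = 0.95, so (1 + L)/2 = 0.975
y0_hat, se_mean, mse = 12.0, 0.6, 0.64

ci_mean = interval(y0_hat, se_mean, t_crit)
pi_new = interval(y0_hat, prediction_std_err(mse, se_mean), t_crit)
print(ci_mean, pi_new)
```

The prediction interval is always the wider of the two, because it adds the variability of a single new observation (estimated by MSE) to the uncertainty in the fitted mean.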
Multicollinearity
Multicollinearity is correlation among the predictors.
- Consequences
  - Large sampling variability for $\hat{\beta}_i$
  - Questionable interpretation of $\hat{\beta}_i$ as change in expected response per unit change in $X_i$.
- Detection: $R_i^2$, the coefficient of multiple determination obtained from regressing $X_i$ on the other Xs, is a measure of how highly correlated $X_i$ is with the other Xs. This leads to two related measures of multicollinearity.
  - Tolerance: $\mbox{TOL}_i=1-R_i^2$. Small $\mbox{TOL}_i$ indicates $X_i$ is highly correlated with other Xs. We should begin getting concerned if $\mbox{TOL}_i<0.1$.
  - VIF: VIF stands for variance inflation factor. $\mbox{VIF}_i=1/\mbox{TOL}_i$. Large $\mbox{VIF}_i$ indicates $X_i$ is highly correlated with other Xs. We should begin getting concerned if $\mbox{VIF}_i>10$.
- Remedial Measures
  - Center the $X_i$ (or sometimes the $Z_i$)
  - Drop offending $X_i$
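The effect of centering can be seen in a small example. The sketch below is restricted to the special case of two regressors, where $R_i^2$ is just the squared correlation between them. It computes tolerance and VIF for $Z$ and $Z^2$ before and after centering $Z$. The data and function names are made up for illustration, and exactly collinear regressors would make the tolerance zero and the VIF undefined.

```python
def r_squared_two_regressors(x1, x2):
    """R_i^2 when there are only two regressors: the squared correlation of X1 and X2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    return s12 ** 2 / (s11 * s22)

def tolerance_and_vif(r2):
    """Return (TOL, VIF) = (1 - R^2, 1 / TOL)."""
    tol = 1.0 - r2
    return tol, 1.0 / tol

# Z and Z^2 over a range far from zero are nearly collinear.
z = [10.0, 11.0, 12.0, 13.0, 14.0]
raw = tolerance_and_vif(r_squared_two_regressors(z, [v ** 2 for v in z]))

# Centering Z before squaring removes the correlation for this symmetric design.
zbar = sum(z) / len(z)
zc = [v - zbar for v in z]
centered = tolerance_and_vif(r_squared_two_regressors(zc, [v ** 2 for v in zc]))
print(raw, centered)
```

In this example the raw VIF is far above the threshold of 10, while after centering it drops to 1, its smallest possible value. With more than two regressors, each $R_i^2$ would have to come from a full regression of $X_i$ on the others.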
Empirical Model Building
Selection of variables in empirical model building is an important task. We consider only one of many possible methods: backward elimination, which consists of starting with all possible $X_i$ in the model and eliminating the non-significant ones one at a time, until we are satisfied with the remaining model.
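The loop structure of backward elimination can be sketched as follows. This is only an outline under stated assumptions: `p_values_for` is a hypothetical callback standing in for "refit the model and return the t-test p-values", and the cutoff `alpha=0.05` is an assumed stopping rule, whereas the description above leaves the stopping point to judgment.

```python
def backward_elimination(candidates, p_values_for, alpha=0.05):
    """Drop the least significant regressor, one at a time, until all have p <= alpha.

    p_values_for(names) is a hypothetical callback: it refits the model with the
    regressors in `names` and returns {name: p-value of its t test}.
    """
    kept = list(candidates)
    while kept:
        p = p_values_for(kept)
        worst = max(kept, key=lambda name: p[name])
        if p[worst] <= alpha:
            break                      # every remaining regressor is significant
        kept.remove(worst)             # eliminate one regressor, then refit
    return kept

# Stand-in for real refits: fixed p-values that ignore which regressors remain.
fake_p = {"Z1": 0.001, "Z2": 0.20, "Z1^2": 0.03, "Z1*Z2": 0.64}
selected = backward_elimination(fake_p, lambda names: {n: fake_p[n] for n in names})
print(selected)
```

In a real analysis the p-values change after every refit, which is why only one regressor is removed per pass.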