where the Zs are the predictor variables and ε is a random error. Examples are
We will write these models generically as
The surface defined by the deterministic part of the multiple linear regression model,
is called the response surface of the model.
When considered a function of the regressors, the response surface is defined by the functional relationship
If it is possible for the Xi to simultaneously take the value 0, then β0 is the value of the response surface when all Xi equal 0. Otherwise, β0 has no separate interpretation of its own.
As a function of the predictors, the response surface is defined by the functional relationship
the change in expected response per unit change in zi is
the change in expected response per unit change in z1 is
and the change in expected response per unit change in z2 is
The modeling process involves the following steps:
Multivariable visualization begins with a number of standard statistical tools, such as histograms, to look at each variable individually, or scatterplots, to look at pairs of variables. But the true power of multivariable visualization can be found only in a set of sophisticated statistical tools which make use of multiple dynamically linked displays. (You won't find these in Microsoft Excel!) Two such tools are
Now use the rotating 3-D plot to view the data. Does this change your guesses?
As we did for the SLR model, we use least squares to fit the MLR model. This means finding estimators of the model parameters β0, β1, ..., βq and σ². The LSEs of the βs are those values of b0, b1, ..., bq, denoted β̂0, β̂1, ..., β̂q, which minimize
The fitted values are
and the residuals are
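To make the fitting step concrete, here is a minimal sketch in Python with NumPy. The data are made up for illustration (they are not the sasdata.cars93a data), and the names D, beta_hat, fitted and residuals are our own choices. It computes the least squares estimates, the fitted values and the residuals for a model with q = 2 regressors.

```python
import numpy as np

# Synthetic data (NOT the cars93a data): n = 6 observations, q = 2 regressors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([6.1, 6.9, 13.2, 13.8, 20.1, 21.0])

# Design matrix: a column of ones for the intercept, then the regressors.
D = np.column_stack([np.ones(len(y)), X])

# The least squares estimates minimize sum((y - D @ b)**2) over b.
beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)

fitted = D @ beta_hat        # fitted values
residuals = y - fitted       # residuals: observed minus fitted

print("estimates:", beta_hat)
print("sum of squared residuals:", np.sum(residuals**2))
```

One check worth knowing: because the model has an intercept, the residuals from a least squares fit sum to zero (up to rounding), and they are uncorrelated with each regressor.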
Let's see what happens when we identify and fit a model to data in sasdata.cars93a.
Residuals and studentized residuals are the primary tools to analyze model fit. We look for outliers and other deviations from model assumptions.
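As a rough illustration of how studentized residuals flag an outlier, here is a sketch in Python with NumPy. It assumes the internally studentized form, in which each residual is divided by its estimated standard error sqrt(MSE * (1 - h_i)), with h_i the leverage of observation i; your software may report a different variant. The function name and the data are hypothetical, with one observation deliberately shifted.

```python
import numpy as np

def studentized_residuals(D, y):
    """Internally studentized residuals for a least squares fit of y on D.

    D is the design matrix, including the column of ones.
    """
    n, p = D.shape                                  # p = q + 1 parameters
    beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
    e = y - D @ beta_hat                            # ordinary residuals
    # Leverages are the diagonal of the hat matrix H = D (D'D)^{-1} D'.
    H = D @ np.linalg.solve(D.T @ D, D.T)
    h = np.diag(H)
    mse = e @ e / (n - p)                           # error mean square
    return e / np.sqrt(mse * (1.0 - h))

# Made-up data: an exact linear relationship, then one response shifted by 10.
x1 = np.arange(10, dtype=float)
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 1.0, 2.0, 0.0, 3.0, 2.0])
y = 1.0 + 2.0 * x1 + x2
y[5] += 10.0                                        # the planted outlier

D = np.column_stack([np.ones(len(y)), x1, x2])
r = studentized_residuals(D, y)
print(np.round(r, 2))
print("largest |studentized residual| at observation", int(np.argmax(np.abs(r))))
```

The planted observation has the largest studentized residual in absolute value, which is the kind of point we would investigate further.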
Let's look at the residuals from the fit to the data in sasdata.cars93a.
The fitted model is
If we feel that this model fits the data well, then for purposes of interpretation, we regard the fitted model as the actual response surface, and we interpret it exactly as we would interpret the response surface.
Let's interpret the fitted model for the fit to the data in sasdata.cars93a.
Two ways of building models:
Let's fit a second model to the data in sasdata.cars93a, and compare its fit to the first model we considered.
This is the variation or uncertainty of prediction if no predictor variables are used.
The degrees of freedom for a SS is the number of independent pieces of data making up the SS. For SSTO, SSE and SSR the degrees of freedom are n-1, n-q-1 and q. These add just as the SSs do. A SS divided by its degrees of freedom is called a Mean Square.
This is a table which summarizes the SSs, degrees of freedom and mean squares.
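The arithmetic behind the table can be sketched in a few lines of Python with NumPy. The data below are invented (they are not the sasdata.cars93a data) and the function name anova_table is our own. The sketch computes SSR, SSE and SSTO with degrees of freedom q, n-q-1 and n-1, and prints each mean square as SS divided by its degrees of freedom.

```python
import numpy as np

def anova_table(D, y):
    """Return {source: (sum of squares, degrees of freedom)} for an MLR fit.

    D is the design matrix, including the column of ones.
    """
    n, p = D.shape
    q = p - 1                                       # number of regressors
    beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
    fitted = D @ beta_hat
    ssr = np.sum((fitted - y.mean())**2)            # explained by the regression
    sse = np.sum((y - fitted)**2)                   # left in the residuals
    ssto = np.sum((y - y.mean())**2)                # total, no predictors used
    return {"Regression": (ssr, q),
            "Error": (sse, n - q - 1),
            "Total": (ssto, n - 1)}

# Synthetic data: n = 6 observations, q = 2 regressors.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([6.1, 6.9, 13.2, 13.8, 20.1, 21.0])
D = np.column_stack([np.ones(len(y)), X])

table = anova_table(D, y)
print(f"{'Source':<12}{'DF':>4}{'SS':>12}{'MS':>12}")
for source, (ss, df) in table.items():
    print(f"{source:<12}{df:>4}{ss:>12.4f}{ss / df:>12.4f}")
```

In the printed table, the Regression and Error rows add to the Total row in both the SS and DF columns.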
Here's the ANOVA table for the original fit to the sasdata.cars93a data.
Here are the tests for the original fit to the sasdata.cars93a data.
and is the estimated standard error of the response.
A level L prediction interval for a new response at predictor values has endpoints
Here are some intervals for the original fit to the sasdata.cars93a data.
Multicollinearity is correlation among the predictors.
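A quick first look at multicollinearity is the correlation matrix of the predictors. The sketch below uses Python with NumPy on invented data in which the second predictor is nearly twice the first; the variable names are our own. Pairwise correlations are only a starting point, since a predictor can be nearly a combination of several others without any single pairwise correlation being large.

```python
import numpy as np

# Made-up predictors: x2 is almost exactly 2 * x1, x3 is unrelated by design.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2.0 * x1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

X = np.column_stack([x1, x2, x3])

# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 3))
```

The entry for x1 and x2 is very close to 1, which signals that these two predictors carry almost the same information.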
Here's an example of a model for the sasdata.cars93a data which has lots of multicollinearity:
Selection of variables in empirical model building is an important task. We consider only one of many possible methods: backward elimination, which consists of starting with all possible Xi in the model and eliminating the non-significant ones one at a time, until we are satisfied with the remaining model.
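Backward elimination can be sketched as a simple loop. The version below, in Python with NumPy, is only one possible reading of the procedure: as a stand-in for a formal significance test it drops the regressor with the smallest |t| statistic whenever that value is below a cutoff of 2 (an assumed rule of thumb, roughly a 5% level test for moderate sample sizes). The function names, the cutoff and the simulated data are all our own assumptions.

```python
import numpy as np

def t_statistics(D, y):
    """t statistics (estimate / standard error) for each column of D."""
    n, p = D.shape
    beta_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
    e = y - D @ beta_hat
    mse = e @ e / (n - p)
    cov = mse * np.linalg.inv(D.T @ D)      # estimated covariance of beta_hat
    return beta_hat / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, names, t_cut=2.0):
    """Start with all regressors; drop the least significant one at a time."""
    keep = list(range(X.shape[1]))
    while keep:
        D = np.column_stack([np.ones(len(y)), X[:, keep]])
        t = np.abs(t_statistics(D, y)[1:])  # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_cut:               # everything left looks significant
            break
        del keep[worst]                     # remove it and refit
    return [names[j] for j in keep]

# Simulated data: the response depends on x1 and x2 but not on x3.
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 3))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

kept = backward_eliminate(X, y, ["x1", "x2", "x3"])
print("regressors kept:", kept)
```

In practice the stopping rule involves judgment as well as tests: we stop when we are satisfied with the remaining model, not merely when a cutoff is met.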
Here's an example of empirical model building for the sasdata.cars93a data.