Regression analysis is used to model the relationship between a response variable and one or more predictor variables. STATGRAPHICS Centurion provides a large number of procedures for fitting different types of regression models:
1. Simple Regression - fits linear and nonlinear models with one predictor. Includes both least squares and resistant methods.
2. Box-Cox Transformations - fits a linear model with one predictor, where the Y variable is transformed to achieve approximate normality.
3. Polynomial Regression - fits a polynomial model with one predictor.
4. Calibration Models - fits a linear model with one predictor and then solves for X given Y.
5. Multiple Regression - fits linear models with two or more predictors. Includes an option for forward or backward stepwise regression and a Box-Cox or Cochrane-Orcutt transformation.
6. Comparison of Regression Lines - fits regression lines for one predictor at each level of a second predictor. Tests for significant differences between the intercepts and slopes.
7. Regression Model Selection - fits all possible regression models for multiple predictor variables and ranks the models by the adjusted R-squared or Mallows' Cp statistic.
8. Ridge Regression - fits a multiple regression model using a method designed to handle correlated predictor variables.
9. Nonlinear Regression - fits a user-specified model involving one or more predictors.
10. Partial Least Squares - fits a multiple regression model using a method that allows more predictors than observations.
11. General Linear Models - fits linear models involving both quantitative and categorical predictors.
12. Life Data Regression - fits regression models for response variables that represent failure times. Allows for censoring and non-normal error distributions.
13. Regression Analysis for Proportions - fits logistic and probit models for binary response data.
14. Regression Analysis for Counts - fits Poisson and negative binomial regression models.
Simple Regression
The simplest
regression models involve a single response variable Y
and a single predictor variable X. STATGRAPHICS will fit
a variety of functional forms, listing the models in
decreasing order of R-squared. If outliers are
suspected, resistant methods can be used to fit the
models instead of least squares.
Comparison of Alternative Models

Model                       | R-Squared
Squared-Y reciprocal-X      | 87.75%
Reciprocal-X                | 87.11%
Square root-Y reciprocal-X  | 86.71%
S-curve model               | 86.27%
Double reciprocal           | 85.25%
Reciprocal-Y logarithmic-X  | 84.99%
Multiplicative              | 84.98%
Logarithmic-X               | 84.77%
Squared-Y logarithmic-X     | 84.36%
Reciprocal-Y square root-X  | 81.69%
Logarithmic-Y square root-X | 81.21%
Square root-X               | 80.54%
Squared-Y square root-X     | 79.68%
Reciprocal-Y                | 76.73%
Exponential                 | 75.87%
Square root-Y               | 75.37%
Logistic                    | 75.08%
Log probit                  | 75.03%
Linear                      | 74.83%
Squared-Y                   | 73.63%
Reciprocal-Y squared-X      | 64.37%
Logarithmic-Y squared-X     | 63.05%
Square root-Y squared-X     | 62.34%
Squared-X                   | 61.60%
Double squared              | 60.04%
Box-Cox Transformations
When the response
variable does not follow a normal distribution, it is
sometimes possible to use the methods of Box and Cox to
find a transformation that improves the fit. Their
transformations are based on powers of Y. STATGRAPHICS
will automatically determine the optimal power and fit
an appropriate model.
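The same idea can be sketched outside STATGRAPHICS with scipy, which estimates the optimal power by maximum likelihood; a straight line is then fit to the transformed response. The data below are invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical positively skewed response and a single predictor.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 50)
y = np.exp(0.3 * x + rng.normal(0, 0.2, size=50))

# Estimate the optimal power (lambda) by maximum likelihood,
# then fit a straight line to the transformed response.
y_trans, lam = stats.boxcox(y)
slope, intercept = np.polyfit(x, y_trans, 1)
print(f"optimal lambda = {lam:.3f}")
print(f"fitted model: (y^lambda - 1)/lambda = {intercept:.3f} + {slope:.3f} x")
```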
Polynomial Regression
Another approach to
fitting a nonlinear equation is to consider polynomial
functions of X. For interpolative purposes, polynomials
have the attractive property of being able to
approximate many kinds of functions.
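A minimal sketch of a polynomial fit, using numpy on made-up data (the cubic degree is an arbitrary choice for the example):

```python
import numpy as np

# Hypothetical data; a cubic polynomial is fit as an illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 0.4, 0.6, 2.0, 4.9, 9.4, 16.2])

coeffs = np.polyfit(x, y, deg=3)   # highest-order coefficient first
poly = np.poly1d(coeffs)
print(poly)                        # the fitted cubic
print("interpolated value at x=2.5:", poly(2.5))
```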
Calibration Models
In a typical
calibration problem, a number of known samples are
measured and an equation is fit relating the
measurements to the reference values. The fitted
equation is then used to predict the value of an unknown
sample by generating an inverse prediction (predicting X
from Y) after measuring the sample.
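A minimal sketch of inverse prediction, assuming a straight-line calibration fit by least squares; the reference values and readings are invented:

```python
import numpy as np

# Known reference values (x) and the corresponding instrument readings (y).
x_ref = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
y_meas = np.array([0.12, 2.61, 5.05, 7.48, 10.03])

slope, intercept = np.polyfit(x_ref, y_meas, 1)

# Inverse prediction: measure an unknown sample, then solve y = a + b*x for x.
y_unknown = 6.2
x_estimate = (y_unknown - intercept) / slope
print(f"estimated reference value: {x_estimate:.2f}")
```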
Multiple Regression
The Multiple
Regression procedure fits a model relating a
response variable Y to multiple predictor variables X1,
X2, ... . The user may include all predictor variables
in the fit or ask the program to use a stepwise
regression to select a subset containing only
significant predictors. At the same time, the Box-Cox
method can be used to deal with non-normality and the
Cochrane-Orcutt procedure to deal with autocorrelated
residuals.
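For readers who want to experiment outside STATGRAPHICS, an equivalent ordinary least squares fit can be sketched with statsmodels; the data frame and column names below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame with a response and two predictors.
rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.normal(size=40), "x2": rng.normal(size=40)})
df["y"] = 1.0 + 2.0 * df.x1 - 0.5 * df.x2 + rng.normal(0, 0.3, size=40)

# Ordinary least squares fit of y on both predictors.
model = smf.ols("y ~ x1 + x2", data=df).fit()
print(model.summary())
```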
Comparison of Regression Lines
In some situations,
it is necessary to compare several regression lines.
STATGRAPHICS will fit parallel or non-parallel linear
regressions for each level of a "BY" variable and
perform statistical tests to determine whether the
intercepts and/or slopes of the lines are significantly
different.
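One common way to perform such tests in code is to fit a single model with intercept and slope interaction terms; the sketch below uses statsmodels on invented two-group data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one predictor x measured in two groups.
rng = np.random.default_rng(3)
n = 30
df = pd.DataFrame({
    "x": np.tile(np.linspace(0, 10, n), 2),
    "group": np.repeat(["A", "B"], n),
})
df["y"] = np.where(df.group == "A", 1 + 0.5 * df.x, 2 + 0.8 * df.x)
df["y"] += rng.normal(0, 0.4, size=2 * n)

# C(group) shifts the intercept; x:C(group) shifts the slope.
fit = smf.ols("y ~ x * C(group)", data=df).fit()
print(fit.summary())   # t-tests on C(group) and x:C(group) compare the lines
```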
Regression Model Selection
If the number of
predictors is not excessive, it is possible to fit
regression models involving all combinations of 1
predictor, 2 predictors, 3 predictors, etc., and sort the
models according to a goodness-of-fit statistic. In
STATGRAPHICS, the Regression Model Selection
procedure implements such a scheme, selecting the models
which give the best values of the adjusted R-Squared or
of Mallows' Cp statistic.
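A brute-force sketch of the same scheme in Python, fitting every subset of three hypothetical predictors and ranking the models by adjusted R-squared:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from itertools import combinations

# Hypothetical data with three candidate predictors.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x1", "x2", "x3"])
df["y"] = 3 * df.x1 + df.x2 + rng.normal(0, 0.5, size=50)

predictors = ["x1", "x2", "x3"]
results = []
for k in range(1, len(predictors) + 1):
    for subset in combinations(predictors, k):
        X = sm.add_constant(df[list(subset)])
        fit = sm.OLS(df["y"], X).fit()
        results.append((subset, fit.rsquared_adj))

# Rank the models by adjusted R-squared, best first.
for subset, adj_r2 in sorted(results, key=lambda p: -p[1]):
    print(f"{'+'.join(subset):12s} adj R^2 = {adj_r2:.4f}")
```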
Ridge Regression
When the predictor
variables are highly correlated amongst themselves, the
coefficients of the resulting least squares fit may be
very imprecise. By allowing a small amount of bias in
the estimates, more reasonable coefficients may often be
obtained. Ridge regression is one method to address
these issues. Often, small amounts of bias lead to
dramatic reductions in the variance of the estimated
model coefficients.
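The sketch below illustrates the effect with scikit-learn's Ridge estimator on two nearly collinear, invented predictors; the penalty weight alpha is an arbitrary choice for the example:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical design with two nearly collinear predictors.
rng = np.random.default_rng(5)
x1 = rng.normal(size=60)
x2 = x1 + rng.normal(0, 0.01, size=60)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 2 * x1 + 3 * x2 + rng.normal(0, 0.5, size=60)

# A small ridge penalty shrinks the coefficients toward stable values.
fit = Ridge(alpha=1.0).fit(X, y)
print("coefficients:", fit.coef_, "intercept:", fit.intercept_)
```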
Nonlinear Regression
Most least squares
regression programs are designed to fit models that are
linear in the coefficients. When the analyst wishes to
fit an intrinsically nonlinear model, a numerical
procedure must be used. The STATGRAPHICS Nonlinear Least
Squares procedure uses an algorithm due to Marquardt to
fit any function entered by the user.
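scipy's curve_fit uses the Levenberg-Marquardt algorithm by default, which makes for a compact illustration; the exponential-decay model and data below are invented:

```python
import numpy as np
from scipy.optimize import curve_fit

# A user-specified, intrinsically nonlinear model.
def model(x, a, b, c):
    return a * np.exp(-b * x) + c

# Hypothetical data roughly following the model.
rng = np.random.default_rng(6)
x = np.linspace(0, 4, 40)
y = model(x, 2.5, 1.3, 0.5) + rng.normal(0, 0.05, size=40)

# Levenberg-Marquardt (the default method) from a starting guess p0.
params, cov = curve_fit(model, x, y, p0=[1.0, 1.0, 0.0])
print("estimates:", params)
```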
Partial Least Squares
Partial Least
Squares is designed to construct a statistical model
relating multiple independent variables X to multiple
dependent variables Y. The procedure is most helpful
when there are many predictors and the primary goal of
the analysis is prediction of the response variables.
Unlike other regression procedures, it can derive
estimates even when the predictor variables outnumber
the observations. PLS is widely
used by chemical engineers and chemometricians for
spectrometric calibration.
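A sketch with scikit-learn's PLSRegression on a made-up problem where the predictors outnumber the observations; the number of latent components is an arbitrary choice:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Hypothetical spectrometric setup: 20 samples, 100 wavelengths (p > n).
rng = np.random.default_rng(7)
X = rng.normal(size=(20, 100))
y = X[:, :5].sum(axis=1) + rng.normal(0, 0.1, size=20)

# PLS extracts a few latent components, so p > n is not a problem.
pls = PLSRegression(n_components=3).fit(X, y)
print("predicted responses:", pls.predict(X[:2]).ravel())
```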
General Linear Models
The GLM procedure is
useful when the predictors include both quantitative and
categorical factors. When fitting a regression model, it
provides the ability to create surface and contour plots
easily.
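A comparable fit mixing predictor types can be sketched with statsmodels, where C() marks a categorical factor; the variable names and data below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one quantitative predictor and one categorical factor.
rng = np.random.default_rng(8)
df = pd.DataFrame({
    "dose": rng.uniform(0, 10, size=60),
    "machine": rng.choice(["A", "B", "C"], size=60),
})
effects = {"A": 0.0, "B": 1.5, "C": -1.0}
df["y"] = 2 + 0.4 * df.dose + df.machine.map(effects) + rng.normal(0, 0.3, 60)

# The categorical factor enters through indicator (dummy) variables.
fit = smf.ols("y ~ dose + C(machine)", data=df).fit()
print(fit.params)
```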
Life Data Regression
To describe the impact of
external variables on failure times, regression models may
be fit. Unfortunately, standard least squares techniques do
not work well for two reasons: the data are often censored,
and the failure time distribution is rarely Gaussian. For
this reason, STATGRAPHICS provides a special procedure that
will fit life data regression models with censoring,
assuming either an exponential, extreme value, logistic,
loglogistic, lognormal, normal or Weibull distribution.
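As an illustration of the same kind of model outside STATGRAPHICS, the sketch below fits a Weibull accelerated-failure-time regression with right-censoring, assuming the third-party lifelines package is available; the failure times and covariate are simulated:

```python
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter

# Hypothetical failure-time data: a stress covariate and a censoring flag.
rng = np.random.default_rng(9)
n = 80
stress = rng.uniform(1, 5, size=n)
t = rng.weibull(1.5, size=n) * np.exp(2 - 0.4 * stress)
observed = rng.random(size=n) < 0.8          # 1 = failure seen, 0 = censored
df = pd.DataFrame({"time": t, "event": observed.astype(int),
                   "stress": stress})

# Accelerated-failure-time regression with a Weibull error distribution;
# censored observations contribute through the survival function.
aft = WeibullAFTFitter()
aft.fit(df, duration_col="time", event_col="event")
aft.print_summary()
```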
Regression Analysis for Proportions
When the response
variable is a proportion or a binary value (0 or 1),
standard regression techniques must be modified.
STATGRAPHICS provides two important procedures for this
situation: Logistic Regression and Probit
Analysis. Both methods yield a prediction equation
that is constrained to lie between 0 and 1.
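Both kinds of fit can be sketched with statsmodels on invented dose-response data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical binary-response data: dose vs. whether an effect occurred.
rng = np.random.default_rng(10)
dose = rng.uniform(0, 10, size=200)
p = 1 / (1 + np.exp(-(dose - 5)))            # true underlying probability
df = pd.DataFrame({"dose": dose, "resp": (rng.random(200) < p).astype(int)})

# Logistic and probit fits; both prediction curves stay between 0 and 1.
logit_fit = smf.logit("resp ~ dose", data=df).fit()
probit_fit = smf.probit("resp ~ dose", data=df).fit()
print(logit_fit.params)
print(probit_fit.params)
```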
Regression Analysis for Counts
For response
variables that are counts, STATGRAPHICS provides two
procedures: a Poisson Regression and a
Negative Binomial Regression. Each fits a loglinear
model involving both quantitative and categorical
predictors.
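Both fits can be sketched with statsmodels on invented defect-count data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical count data: defects per unit vs. line speed and shift.
rng = np.random.default_rng(11)
n = 150
df = pd.DataFrame({"speed": rng.uniform(1, 10, size=n),
                   "shift": rng.choice(["day", "night"], size=n)})
mu = np.exp(0.2 * df["speed"] + 0.5 * (df["shift"] == "night"))
df["defects"] = rng.poisson(mu)

# Loglinear Poisson fit with a quantitative and a categorical predictor.
pois = smf.poisson("defects ~ speed + C(shift)", data=df).fit()
print(pois.params)

# The negative binomial fit relaxes the Poisson mean-equals-variance assumption.
nb = smf.negativebinomial("defects ~ speed + C(shift)", data=df).fit()
print(nb.params)
```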