
Chapter 13:

 Multiple Regression Correlation

I.    Contrasting Bivariate Correlation and Multiple Regression Correlation

 

·  Though a simple bivariate correlation deals with the association between two variables, multiple regression correlation includes multiple predictor variables.

·  In addition to the assumptions of simple bivariate correlation, multiple regression correlation also assumes:

1.   Dependent variable scores are normally distributed all along the line of best fit (also known as a line of regression). Thus, the assumption is made that residuals are normally distributed.

2.   There is a linear relationship between observed and predicted values of the dependent variable. This assumption also means that the residuals have a mean of zero.

3.   The variability of the residuals is the same through the entire range of values for the independent variables.

·  Advantages of multiple regression correlation over simple bivariate correlation:

1.      Curvilinear effects can be tested by adding terms that are raised to different powers of the original variables.

2.      Interaction effects can be tested.

3.      Researchers may learn how much variation in the dependent variable is explained by one set of variables as opposed to another.

·  An advantage that multiple regression correlation shares with simple linear correlation is that the relative importance of each variable can be identified.  

II.  Components of Multiple Correlations

Multiple correlation (sometimes called multiple regression correlation or multiple linear correlation): an extension of linear correlation that permits researchers to show a correlation between the optimal combination of two or more independent (or predictor) variables and a single dependent (or criterion) variable.

Residuals: the differences between observed and predicted values of the dependent variable (given the multiple regression correlation model).

bivariate regression model: an equation that identifies the relationship between an independent and a dependent variable.

multiple regression model: illustrates the relationship between the dependent variable and a linear combination of independent variables and an error term.

multiple regression equation: a “mathematical equation relating the expected value or mean value of the dependent variable to the values of the independent variables” (D. R. Anderson, Sweeney, & Williams, 2003, p. 683).

 

  A. Predictor Variables

--Predictor variables may produce different sorts of effects, similar to those found in analysis of variance.

predictor variables: the independent variables used by the researcher.

1.      Additive effects

--With additive effects, “the effects of the independent variables on a dependent variable can simply be added together to find their total effect”(Vogt, 2005, p. 4).

--Additive effects are identified in the multiple correlation equation $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$ by such terms as $\beta_1 X_1$ and $\beta_2 X_2$.

Additive effects:  the influences of predictor variables separately.

 

2.  Interaction effects

--Because multiple linear regression does not deal with interaction terms directly, researchers may create additional variables that are interactions among predictor variables. Then, the created variables may be entered into the multiple correlation to test if the observed R shows an increase.

--Sometimes these interactions are considered moderator effects because “the interacting third variable which changes the relation between two original variables is a moderator variable which moderates the original relationship” (Garson, 2003, ¶ 10).

Interaction effects are the influences of predictor variables taken in combinations.
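The procedure described above (create a product variable, enter it, and see whether the multiple correlation increases) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are made up, the helper names `solve` and `r_squared` are hypothetical, and the fit uses ordinary least squares through the normal equations rather than any particular statistics package.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    XtX = [[sum(X[i][a] * X[i][c] for i in range(n)) for c in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = solve(XtX, Xty)
    pred = [sum(c * v for c, v in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up data on a small grid; y is built with a true x1 * x2 interaction.
x1 = [a for a in range(4) for _ in range(3)]
x2 = [b for _ in range(4) for b in range(3)]
y = [1 + 2 * a + 3 * b + 1.5 * a * b for a, b in zip(x1, x2)]

x1x2 = [a * b for a, b in zip(x1, x2)]          # the created interaction variable
r2_additive = r_squared([x1, x2], y)            # additive terms only
r2_interaction = r_squared([x1, x2, x1x2], y)   # additive terms plus interaction
print(round(r2_additive, 4), round(r2_interaction, 4))
```

Because the made-up dependent variable contains an exact interaction, the second model fits perfectly while the additive model does not. With real data the increase would be smaller and would still need a significance test.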

 

3.      The need to avoid multicollinearity

--The failure to maintain uncorrelated predictor variables is a problem because:

·  Interpretation of the actual impact of predictor variables may become difficult.

·  Sampling stability is jeopardized.

·  Computations are troubled (mathematically, predictor variables that are not perfectly correlated with one another are necessary to obtain a unique solution for the regression equation).

Multicollinearity (also called collinearity): the problem that exists when “two or more independent variables are highly correlated; this makes it difficult if not impossible to determine their separate effects on the dependent variable” (Vogt, 2005, p. 198).

  B. Elements of the Model

1.   The intercept

--the value of the dependent variable Y when all values of the independent variables are equal to 0. It represents the point where the line of best fit crosses the y-axis.

--This value sometimes is expressed with the symbols c (for constant) or a.

--The intercept is identified as β0 in the multiple regression model and the multiple regression equation.

--The intercept is identified as b0 in the estimated multiple regression equation.

Intercept: “in multiple regression, the y intercept is the mean value of the dependent variable for a case with a value of zero on all the independent variables” (Vogt, 2005, p. 155).

 

2.   Regression weights

--the estimates of the contributions made by each of the predictor variables 

Unstandardized regression weight: symbolized by b, “an estimate of the change in y corresponding to a 1-unit change in xi when all other independent variables are held constant” (D. R. Anderson et al., 2003, p. 652).

--Adding all the level-importance values to the intercept value produces the dependent variable mean.

level importance: a measure computed by multiplying the regression coefficient by the mean of the predictor variable
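The claim that the level-importance values plus the intercept reproduce the dependent variable mean can be checked with a short sketch. The scores below are hypothetical, and the slopes come from the standard least-squares solution for two predictors written in deviation form; the helper names `mean` and `cross` are illustrative only.

```python
def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Hypothetical scores for two predictors and one dependent variable.
x1 = [2, 4, 5, 7, 8, 10]
x2 = [1, 3, 2, 5, 4, 6]
y = [3, 6, 6, 10, 10, 14]

# Least-squares slopes for two predictors (normal equations in deviation form).
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
den = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / den
b2 = (s2y * s11 - s1y * s12) / den
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)    # intercept

# Level importance: regression coefficient times the predictor mean.
level_importance = [b1 * mean(x1), b2 * mean(x2)]
total = b0 + sum(level_importance)
print(total, mean(y))                           # the two values agree
```

The agreement follows from how the least-squares intercept is defined, so it holds for any number of predictors, not just the two used here.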

--Beta weights are regression coefficients that have been standardized so as to represent data from a distribution with a mean of zero and a standard deviation of 1.

--When all the variables are from the same measurement scale (for instance, if each of the variables is a 5-point scale), interpreting the unstandardized regression coefficient is the most direct approach. As general advice, interpreting unstandardized regression coefficients is preferred “when: 1. you want to make a definitive prediction (e.g., the dollars of a person’s salary or someone’s GRE score) or 2. when you want to compare two groups (e.g., predicting the salaries for men and women in two separate regression equations)” (Losh, 2003, ¶¶ 58–60).

--When variables are measured on different ranges of scores, it makes sense to standardize scores to create comparable measures.

beta weights (sometimes called standardized regression coefficients or standardized partial regression coefficients): values that indicate “the difference in a dependent variable associated with an increase (or decrease) of one standard deviation in an independent variable—when controlling for the effects of other independent variables” (Vogt, 2005, pp. 23–24).

3.   The Correlation Coefficient

--Unlike the bivariate correlation coefficient, which can range from –1.0 to +1.0, multiple correlation coefficients range from 0 to 1.0.

multiple correlation coefficient R: “the measure of association between a dependent variable and an optimal combination of two or more” independent variables (J. Cohen & Cohen, 1983, p. 86).

--R² (similar to r²) is sometimes called the multiple coefficient of determination.

--The size of R could be reduced by: using variables that show little variability; using measures with low reliability; examining dependent and predictor variables that are part of a nonlinear relationship (because “multiple linear correlation” assumes that relationships are best approximated by a straight line, a curvilinear relationship could produce much smaller correlation coefficients than might be expected).

--“R and R² typically overestimate their corresponding population values, especially with small samples” (Mertler & Vannatta, 2002, p. 177). The reason is that adding another predictor variable naturally increases the size of R as an artifact, even though the increase is not meaningful. Indeed, the expected R² is m / (n – 1), where m is the number of predictor variables and n is the number of events, even if the actual multiple correlation coefficient is 0 (Morrison, 1976).

Multiple coefficient of determination: a report of the proportion of variance in the dependent variable that is explained by knowledge of the optimal combination of two or more predictor variables.

--One response to inflated R values is to correct the multiple correlation coefficient for shrinkage.

Yet, the correction for shrinkage

·  “does not indicate how well the derived equation will predict other samples from the same population” (J. P. Stevens, 2002, p. 114). The correction for shrinkage provides “gross overestimates of cross-validity and should not be used as such” (Schmitt & Ployhart, 1999).

·  makes very little difference with large sample sizes; with small samples, the correction has been known to reduce the R² to a negative coefficient (a theoretically meaningless number).

--Another option to reduce inflated Rs is to avoid using automated “stepwise” and “forward selection” methods to select a final set of predictor variables. Such stepwise and forward selection approaches tend to inflate the overall multiple correlation coefficient because they select predictors based on the size of correlations with the dependent variable rather than selecting variables based on some theoretic rationale.

III. How to Do a Multiple Regression Correlation Study

  A. Select Predictor Variables

--desirable to use a small list of predictors because:

1.   it is generally accepted in empirical research that—all other things being equal—an explanation that involves few variables to predict effects is superior to one that involves many variables. As the number of independent variables increases, the required sample size also grows.

2.   multiple correlation models with many variables can prove difficult to interpret.

--To select independent variables, researchers should be guided by several criteria.

1.   variables should have theoretic significance;

2.   predictor variables should be derived from past research with the dependent variable;

3.   variables that are highly correlated with each other should be avoided

This last criterion involves the multicollinearity problem. When multicollinearity is high, the regression weights for whichever variables are entered first will account for the greatest proportion of variance. Thus, it becomes difficult to interpret the actual comparative importance of the predictor variables.

shrinkage: the “tendency for the strength of prediction in a regression or correlation study to decrease in subsequent studies” (Vogt, 2005, p. 294). The correction for shrinkage attempts to eliminate influences of “error fitting” by taking into account sample size and the number of predictor variables. By doing so, the shrinkage formula attempts to identify the amount of variation in the dependent variable that “would be accounted for if we had derived the prediction equation in the population from which the sample was drawn” (J. P. Stevens, 2002, pp. 113–114).

1.      Continuous predictors

(traditional multiple regression correlation focuses on these sorts of variables.)

continuous predictors: variables measured on the interval (or quasi-interval) or ratio levels of measurement.

2.   Dummy or indicator predictors

 

dummy or indicator predictors: variables that are coded to represent categorical or qualitative elements

3.   Lagged predictors

(Though the dependent variable actually lags after the predictor variables, the independent variables often are labeled “lagged variables.”)

--Researchers have to be careful when using lagged variables.

·  There should be some rationale for the number of “lags” that are involved. Adding large numbers of lags requires increased sample sizes, which may not be practical.

·  Lagged variables tend to be highly correlated with each other. Thus, the interpretation of regression and beta weights must be done very carefully. To decide on the number of lags, Ott and Hildebrand (1983) suggest a modified process of “trial and error” (starting with a 1-month lag and computing the multiple R and then adding a 2-month lag; if the multiple R does not substantially increase and if the standard error of estimate does not decrease, the use of further lagged variables may be omitted).

  B. Gather an Adequate Sample

--When the number of events is small, the observed R² tends to increase. As the number of predictor variables approaches the number of events in the sample, the R² also nears 1.0 even if the associations actually are nonexistent. Hence, researchers using multiple regression correlation with small samples may mislead themselves into thinking that they have identified substantial effects when, in fact, their large R coefficients are largely artifacts.
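A small simulation can make this artifact visible. The sketch below is illustrative only: the dependent variable and all predictors are random noise, the sample size of 20 and the seed are arbitrary choices, and the hypothetical helper `r2_path` obtains R² after each added predictor by orthogonalizing the centered predictors (Gram-Schmidt), which yields the same R² as refitting least squares with an intercept at every step.

```python
import random


def r2_path(columns, y):
    """R^2 after adding each predictor in turn (intercept included)."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [a - m for a in v]

    resid = center(y)
    ss_total = sum(a * a for a in resid)
    basis, path = [], []
    for col in columns:
        q = center(col)
        for b in basis:                     # remove overlap with earlier predictors
            proj = sum(a * c for a, c in zip(q, b))
            q = [a - proj * c for a, c in zip(q, b)]
        norm = sum(a * a for a in q) ** 0.5
        q = [a / norm for a in q]
        basis.append(q)
        proj = sum(a * c for a, c in zip(resid, q))
        resid = [a - proj * c for a, c in zip(resid, q)]
        path.append(1 - sum(a * a for a in resid) / ss_total)
    return path


random.seed(1)                              # arbitrary seed for repeatability
n = 20
y = [random.gauss(0, 1) for _ in range(n)]  # pure noise, no real associations
noise = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n - 1)]
path = r2_path(noise, y)
for m in (1, 5, 10, 15, 19):
    print(m, round(path[m - 1], 3))         # R^2 climbs toward 1.0 as m nears n
```

Even though no predictor is related to the dependent variable, R² never decreases as predictors are added and is essentially 1.0 once the number of predictors reaches n – 1.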

--One guideline advises researchers to have 104 events plus the number of independent variables if they wish to test regression coefficients (Tabachnick & Fidell, 2001, p. 117).

--Another popular rule of thumb is that a sample must include at least 15 events per predictor variable (J. P. Stevens, 2002, p. 143).

--The size of the population multiple correlation coefficient (ρ²) makes a difference. One study (C. Park & Dudycha, 1974) found that the sample size required to keep R² from deviating from R² corrected for shrinkage increases as the population ρ² to be detected decreases. For instance, with four predictor variables, when the population ρ² is .50, the required sample is 66 (which equals 16.5 events per predictor variable). Similarly, when the population ρ² is .25, the required sample is 93 (which equals 23.25 events per predictor variable).

  C. Compute Multiple Regression Coefficients

1.   Computing R

--for two predictor variables, R² may be computed as:

$$R^2_{Y.12} = \frac{r^2_{Y1} + r^2_{Y2} - 2\,r_{Y1}\,r_{Y2}\,r_{12}}{1 - r^2_{12}}$$

where

$R^2_{Y.12}$ is the proportion of variance in the dependent variable Y that is shared with the optimal combination of the independent variables 1 and 2,

$r^2_{Y1}$ is the squared correlation (coefficient of determination) of the dependent variable Y with independent variable 1,

$r^2_{Y2}$ is the squared correlation (coefficient of determination) of the dependent variable Y with independent variable 2, and

$r^2_{12}$ is the squared correlation (coefficient of determination) of the two independent variables with each other.

--To test if the observed multiple correlation coefficient is different from zero, the researcher computes a test of statistical significance using the F distribution (with degrees of freedom equal to the number of predictor variables for the numerator term, and the number of events in the study minus the number of predictor variables minus 1 for the denominator term). The following formula is used:

$$F = \frac{R^2 / m}{(1 - R^2) / (n - m - 1)}$$

where

$R^2$ is the squared multiple correlation coefficient,

m is the number of predictor variables, and

n is the number of events in the study.

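--The correlation coefficient also is corrected for shrinkage, which adjusts for the sample size and the number of predictors in the regression equation. The most popular formula is suggested by Wherry (1931; see Herzberg, 1969), commonly written as:

$$R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - m - 1}$$

where n is the number of events in the study and m is the number of predictor variables.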
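The three computations in this subsection can be tried out with a few lines of arithmetic. This is a sketch under stated assumptions: the correlations and sample size are hypothetical, the function names are illustrative, the F ratio uses n – m – 1 denominator degrees of freedom, and the shrinkage correction is the form commonly attributed to Wherry, 1 – (1 – R²)(n – 1)/(n – m – 1).

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of Y with predictors 1 and 2."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def f_statistic(r2, n, m):
    """F ratio for H0: R = 0, with m and n - m - 1 degrees of freedom."""
    return (r2 / m) / ((1 - r2) / (n - m - 1))


def adjusted_r2(r2, n, m):
    """R^2 corrected for shrinkage (form commonly attributed to Wherry)."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)


# Hypothetical values: correlations of Y with each predictor, and between predictors.
r2 = r2_two_predictors(0.5, 0.4, 0.3)
f = f_statistic(r2, n=50, m=2)
adj = adjusted_r2(r2, n=50, m=2)
print(round(r2, 4), round(f, 2), round(adj, 4))
```

The corrected value is always smaller than the observed R², and the gap widens as n shrinks or m grows.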

2.      Assessing Components of Multiple Correlations

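a.   beta weights are computed from correlations (shown here for the first of two predictor variables) as:

$$\beta_1 = \frac{r_{Y1} - r_{Y2}\,r_{12}}{1 - r^2_{12}}$$

--beta weights are computed from regression coefficients as:

$$\beta = b\,\frac{s_x}{s_y}$$

  where

b is the unstandardized regression coefficient and

 $s_x$ and $s_y$ are the standard deviations of the predictor variable and the dependent variable, respectively.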

--To examine the statistical significance of the beta weights, the t test is enlisted to test the null hypothesis that the beta coefficient is not different from zero (H0: β1 = 0). The following formula is used:

$$t = \frac{\beta}{s_\beta}$$

where

$s_\beta$ is the standard error of the beta, computed as:

$$s_\beta = \sqrt{\frac{1 - R^2}{(1 - R^2_i)(n - m - 1)}}$$

where $R^2$ is the squared multiple correlation coefficient and $R^2_i$ is the squared multiple correlation coefficient of the predictor variable with the other predictor variables.

--the t distribution is entered with n – m – 1 degrees of freedom, where

n is the number of events in the study and

m is the number of predictor variables.

b.   unstandardized regression coefficients are computed from beta weights as:

$$b_X = \beta\,\frac{s_Y}{s_X}$$

where

bX is the regression weight for the predictor variable x,

sY is the standard deviation for the dependent variable y, and

sX is the standard deviation for the predictor variable x.
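The conversions between beta weights and unstandardized coefficients can be illustrated with a short sketch. The scores are hypothetical, the helper names are illustrative, and the beta weights are obtained from the two-predictor correlation formula $\beta_1 = (r_{Y1} - r_{Y2} r_{12}) / (1 - r^2_{12})$, which is an assumption about the form intended in this section.

```python
def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5


def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return num / ((len(u) - 1) * sd(u) * sd(v))


# Hypothetical scores for two predictors and one dependent variable.
x1 = [2, 4, 5, 7, 8, 10]
x2 = [1, 3, 2, 5, 4, 6]
y = [3, 6, 6, 10, 10, 14]

r_y1, r_y2, r_12 = corr(y, x1), corr(y, x2), corr(x1, x2)

# Beta weights from correlations (two-predictor case).
beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Unstandardized coefficients from beta weights: b = beta * s_Y / s_X.
b1 = beta1 * sd(y) / sd(x1)
b2 = beta2 * sd(y) / sd(x2)
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Converting back: beta = b * s_X / s_Y.
back_to_beta1 = b1 * sd(x1) / sd(y)

# Least-squares residuals are uncorrelated with each predictor.
residuals = [yi - (b0 + b1 * a + b2 * c) for yi, a, c in zip(y, x1, x2)]
print(round(beta1, 4), round(beta2, 4), round(b1, 4), round(b2, 4))
```

As a check that these coefficients are the least-squares solution, the residuals sum to zero and have zero cross-products with both predictors.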

3.      Deciding on the Method to Select Entry of Variables

--Underlying theories and hypotheses should affect the decision about the order in which variables should be entered.

lagged predictors: measures at one time that predict delayed dependent variable values at another time.

 

a.       Direct entry

--Arguably, the greatest benefit of this approach is that direct entry methods make the researcher responsible for choices about the variables to examine.

Direct entry: in multiple regression correlation, a method of variable entry that involves entering all variables into the multiple regression equation.

b.      Automatic selection methods:

--In each of these cases, the researcher specifies the criteria for forward selection (usually p < .05) and for backward elimination (usually p > .10).

automatic selection methods: in multiple regression correlation, a category of variable entry tools that permit predictor variables to be “sifted” based on statistical testing completed to include or remove predictors.

1.)  Stepwise selection

--This method is actually a set of methods that uses statistical significance testing to select variables that should be included in successive steps.

--Challenges have been raised to the casual use of stepwise selection methods.

·  Because stepwise entry uses statistical methods to maximize the size of multiple correlation coefficients, the overall Rs may overestimate the sizes of effects in the population.

·  With modest sample sizes, the use of stepwise selection will tend to capitalize on chance and sampling error (McIntyre, Montgomery, Srinivasan, & Weitz, 1983; Thompson, 1995).

·  In many cases stepwise methods will “not correctly identify the best variable set of a given size” (Thompson, 1995, p. 525). The method emphasizes selecting variables that produce significant contributions separately, rather than in combination or interactions with other variables. Furthermore, one study determined that “the number of authentic variables found in the final model subsets was always less than half the number of available authentic predictor variables” (Derksen & Keselman, 1992).

stepwise selection: a set of variable entry methods in which each predictor variable is subjected to a significance test, and the one with the smallest probability value is selected for the model. In subsequent steps, other variables are added if they meet the same criterion. At the same time, previously included variables that no longer are statistically significant predictors are removed.

2.)    Forward selection

--In this method, once an effect is entered, it is not excluded.

 

Forward selection: a  variable entry method in which the predictor variable with the highest absolute correlation with the dependent variable is entered first. In successive steps, the predictor with the highest remaining statistically significant partial correlation with the dependent variable is entered.

3.)    Backward elimination

--Researchers usually find that the backward elimination method is most consistent with their desires to develop models and to reduce sources of error in prediction. With this method, the first entry may be controlled by the researcher based on an understanding of theory and past research.

backward elimination: a variable entry method in which all predictor variables are entered in one block and, in subsequent steps, the variables that have the smallest partial correlations with the dependent variable are removed.

D.  Using SPSS and Excel for Multiple Regression Correlation

1.  SPSS

2.  Excel

E.   Check on the Adequacy of the Regression Model

1.   Analysis of residuals

 

--By looking at plots and examining residuals, it is possible to check the assumption that residuals are normally distributed. It is also possible to check the assumption of homoscedasticity. When this assumption is met, residuals between predicted and actual values should be randomly distributed and uncorrelated. If homoscedasticity cannot be assumed, “conventionally computed confidence intervals and conventional t-tests for OLS [ordinary least squares] estimators can no longer be justified” (Berry, 1993, p. 81).

--There may be several causes for this difficulty including:

·  outliers in the data that have thrown off constant error variance;

·  subjects-by-conditions interactions created by outside variables that are introducing nonrandom variation into the data (in other words, there may be an important variable that has been left out of the model);

·  strong skew in some predictor variables.

--To identify this problem, residual plots are examined.

homoscedasticity: the condition in which variability in scores of one variable is stable through the entire range of the other variable and is homogeneous at all points along the line of best fit (line of regression).

To detect outliers in reference to the vertical axis (that is, higher or lower than predicted), the Weisberg test may be used (though extensions permit its use in other ways as well):

$$t_i = r_i \sqrt{\frac{n - p' - 1}{n - p' - r_i^2}}$$

where

ri is the standardized residual value (sometimes called Pearson residuals) for the case,

p′ is the number of regression coefficients including the intercept, and

n is the number of events in the sample.

Weisberg’s ti:  a test of the statistical significance of standardized residuals
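As a rough illustration, the Weisberg statistic can be computed directly from a standardized residual. The sketch assumes the usual form $t_i = r_i \sqrt{(n - p' - 1) / (n - p' - r_i^2)}$, referred to a t distribution with n – p′ – 1 degrees of freedom; the residual value, sample size, and function name are hypothetical.

```python
import math


def weisberg_t(r_i, n, p_prime):
    """Weisberg's t for one case, from its standardized residual r_i.

    n is the number of events; p_prime counts the regression
    coefficients including the intercept.
    """
    return r_i * math.sqrt((n - p_prime - 1) / (n - p_prime - r_i ** 2))


# Hypothetical case: standardized residual of 2.5, 30 events,
# two predictors plus the intercept (p' = 3).
t_i = weisberg_t(2.5, n=30, p_prime=3)
print(round(t_i, 3))
```

For standardized residuals larger than 1 in absolute value, the statistic is more extreme than the residual itself, which is what makes unusually large residuals stand out.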

--To detect outliers in reference to the horizontal axis (that is, further to the right or left than predicted), the most popular tools combine use of Mahalanobis’ Distance and Cook’s D.

 

Mahalanobis’ Distance (symbolized as D²):  a test of the statistical significance of the distance of scores of a case from the centroid

Centroid:  the weighted mean of all the dimensions defined by the predictor variables.

Cook’s D: a statistic that identifies the influence of suspected outliers on regression coefficients by examining regression coefficients that would exist if the outlier were deleted.

--When faced with outliers, researchers have two major options:

·  Transformations of the data can bring outliers closer to the centroids. Square root, logarithmic, and reciprocal transformations often can reduce the effects of outliers.

·  Removal of outliers may be attempted.

--The removal process does not always work neatly. Sometimes, researchers can remove outliers only to find that the new calculations show yet other outliers.

 

2.      Check on Autocorrelation

--Autocorrelation tends to be reflected in cyclical activity when plots are examined. Because an assumption of multiple correlation is that the errors in prediction are independent, highly correlated errors are cause for concern.

--Autocorrelation often occurs in time-series analyses, where effects of some variables continue to be observed over time, even though the independent variables may change.

--Because the standard errors become smaller than they ordinarily would be, the overall multiple R coefficients tend to be inflated.

a.   types of autocorrelation:

Autocorrelation (sometimes called “serial correlation”): “a correlation of the values of a variable with the values of the same variable lagged one or more time periods back” (Aczel, 1989, p. 571).

 

·  Positive autocorrelation

--In this arrangement, positive residuals and negative residuals each tend to cluster together across time.

Positive autocorrelation: a form of autocorrelation in which a positive or negative residual tends to be followed by another of the same sign.

·  Negative autocorrelation

--In this arrangement, over time a positive residual tends to be followed by a negative residual, and a negative residual tends to be followed in time by a positive residual.

Negative autocorrelation: a form of autocorrelation that involves following a positive or negative residual with another of a different sign.

b.   test for autocorrelation—the Durbin-Watson test

--This test examines the null hypothesis H0: ρ = 0, that there is no autocorrelation. In setting alpha risk, the researcher must decide whether to use a directional (one-sided) or nondirectional (two-sided) test. If one wishes to test specifically for positive autocorrelation (or specifically for negative autocorrelation), a one-sided test may be used. But if the researcher wishes to know only if there is some kind of autocorrelation, a nondirectional test may be completed.

--The Durbin-Watson test uses the statistic:

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

where

e is the residual,

t is the time of the observation (thus, $e_t$ is the residual at a given time, and $e_{t-1}$ is the residual at the previous time), and

n is the number of events.

·  A directional test is interpreted by comparing the test statistic d with the tabled lower limit of d (dL) and upper limit of d (dU).

--In a test for positive autocorrelation, a statistically significant positive autocorrelation is claimed if d is below dL. When d is between dL and dU, the test is inconclusive, and when d is above dU, there is no evidence of positive first-order autocorrelation of the errors.

--In a test for negative autocorrelation, when d is greater than 4 – dL, the researcher concludes that there is evidence of negative first-order error autocorrelation. When d is between 4 – dU and 4 – dL, the test is inconclusive, and when d is below 4 – dU, there is no evidence of negative first-order autocorrelation of the errors.

·  In a nondirectional test, the null hypothesis is rejected if the Durbin-Watson test coefficient is either:

• Below dL, the lower limit of d (indicating positive autocorrelation) or

• Above 4 – dL (indicating negative autocorrelation).

Durbin-Watson test: a test of statistical significance for autocorrelation.
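The Durbin-Watson statistic itself is simple to compute from a series of residuals ordered in time: the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses two made-up residual series and a hypothetical function name; the critical limits dL and dU still have to be looked up in a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


clustered = [1, 1, 1, -1, -1, -1]      # runs of the same sign
alternating = [1, -1, 1, -1, 1, -1]    # sign changes at every step

d_clustered = durbin_watson(clustered)      # well below 2
d_alternating = durbin_watson(alternating)  # well above 2
print(round(d_clustered, 3), round(d_alternating, 3))
```

Values near 2 are consistent with no first-order autocorrelation; values toward 0 point to positive autocorrelation and values toward 4 point to negative autocorrelation.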

3.   Check on Multicollinearity

a.   collinearity statistics:

·  Tolerance

--computed by taking 1 – R², with the individual variable treated as the dependent variable and the other variables treated as predictor variables.

--A tolerance approaching zero indicates a problem with multicollinearity, because it means that the variable in question contributes very little unique information to the overall model.

 

 

Tolerance: a collinearity statistic that identifies “the proportion of the variability in one independent variable not explained by the other independent variables” (Vogt, 2005, p. 325).

·  VIF or the variance inflation factor

--computed by taking $1 / (1 - R^2)$, that is, 1 divided by the tolerance.

--Large VIF coefficients indicate that the regression coefficient variance is increasing, suggesting instability associated with multicollinearity problems.  

VIF or variance inflation factor: a collinearity statistic that divides 1 by the tolerance, such that severe multicollinearity is indicated by values that approach 100.
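Tolerance and VIF can be worked through for a small case. The sketch assumes three predictors with hypothetical intercorrelations and obtains the R² of predictor 1 on predictors 2 and 3 from the two-predictor formula $(r_a^2 + r_b^2 - 2 r_a r_b r_{ab}) / (1 - r_{ab}^2)$; the function names are illustrative.

```python
def r2_two_predictors(r_a, r_b, r_ab):
    """R^2 of one variable regressed on two others, from correlations."""
    return (r_a ** 2 + r_b ** 2 - 2 * r_a * r_b * r_ab) / (1 - r_ab ** 2)


def tolerance_and_vif(r2):
    """Tolerance is 1 - R^2; VIF is 1 divided by the tolerance."""
    tol = 1 - r2
    return tol, 1 / tol


# Hypothetical correlations among three predictors.
r_12, r_13, r_23 = 0.9, 0.2, 0.1

# Treat predictor 1 as the dependent variable and predictors 2 and 3 as predictors.
r2_1 = r2_two_predictors(r_12, r_13, r_23)
tol_1, vif_1 = tolerance_and_vif(r2_1)
print(round(tol_1, 4), round(vif_1, 3))
```

Because predictor 1 correlates .90 with predictor 2, most of its variance is shared with the other predictors, so its tolerance is low and its VIF is correspondingly high.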

·  Condition index

--computed as the square root of the ratio of the largest eigenvalue to each of the smaller eigenvalues

--Though there is no test of statistical significance, a general rule of thumb is that a condition index above 15 indicates possible multicollinearity, and a condition index above 30 indicates severe multicollinearity.

Eigenvalues: statistics used “to indicate how much of the variation in the original group of variables is accounted for by a particular factor” (Vogt, 2005, pp. 103–104).

Condition index: a collinearity statistic that is computed as the square root of the ratio of the largest eigenvalue to each smaller eigenvalue

b.   Haitovsky’s test

--A nonsingular matrix of intercorrelations is one that shows relatively low intercorrelations among the predictor variables. A nonsingular matrix has a determinant that is close to 1.0. Haitovsky’s test examines whether the matrix is singular due to many high intercorrelations among predictor variables.

--A statistically significant difference indicates that multicollinearity is not a problem because the correlation matrix of predictor variables is not singular.

--Computed as:

$$\chi^2_H = \left[1 + \frac{2p + 5}{6} - N\right] \ln\left(1 - |X^T X|\right)$$

where

p is the number of predictor variables,

N is the number of elements in the sample, and

$|X^T X|$ is the determinant of the correlation matrix of predictor variables.

The degrees of freedom to enter the chi-square distribution are computed by

$$df = \frac{p(p - 1)}{2}$$

c.   Dealing with multicollinearity

--“Collinearity does not affect the ability of a regression equation to predict the response. It poses a real problem if the purpose of the study is to estimate the contributions of individual predictors” (Dallal, 2001, ¶ 9).

--When faced with multicollinearity, researchers have several options:

·  Researchers normally attempt some form of data reduction, such as factor analysis;

·  Increase the sample size. When sample size is increased, the standard error decreases, partially offsetting the problem that high multicollinearity leads to high standard errors of the b and beta coefficients.

Haitovsky’s test:  a test for multicollinearity that examines the null hypothesis that the matrix of correlations among predictor variables is singular with a determinant of zero

determinant: in matrix operations a value that  “represents the generalized variance for several variables. That is, it characterizes in a single number how much variability is present on a set of variables” (J. P. Stevens, 2002, p. 64).

 

·  Use centering: transform the predictor variables

·  Substitute crossproducts of intercorrelated variables as an interaction term, or in some other way combine the intercorrelated variables as part of a respecified model.

·  “Leave one intercorrelated variable as is but then remove the variance in its covariates by regressing them on that variable and using the residuals.

·  Assign the common variance to each of the covariates by some probably arbitrary procedure.

·  Treat the common variance as a separate variable and decontaminate each covariate by regressing them on the others and using the residuals. That is, analyze the common variance as a separate variable. . . .” (Garson, 2003, ¶ 95).

·  Use ridge regression.  

centering: a data transformation that subtracts the mean from each case.
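One situation where centering helps is when a model includes a predictor along with a term built from it, such as a squared term for a curvilinear effect. The sketch below is an illustration under assumptions: the scores are made up, the squared term is chosen only as an example of a constructed variable, and the helper names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


x = list(range(1, 11))                  # made-up predictor scores 1..10
x_sq = [a ** 2 for a in x]              # squared term built from raw scores

x_centered = [a - mean(x) for a in x]   # centering: subtract the mean from each case
xc_sq = [a ** 2 for a in x_centered]    # squared term built from centered scores

r_raw = corr(x, x_sq)
r_centered = corr(x_centered, xc_sq)
print(round(r_raw, 3), round(r_centered, 3))
```

With these symmetric scores the correlation between the predictor and its squared term drops from nearly 1 to zero after centering. With skewed data the reduction is usually substantial but not complete.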