
Chapter 14:

Extensions of Multiple Regression Correlation

 

I.    Using Categorical Predictors

--Although multiple regression correlation typically uses predictor variables that are continuous measures, the method also may employ categorical predictors.

 

A.      Dummy or Indicator Predictors

--For two conditions the control level would be coded as 0 and the treatment level would be coded as 1.

--If there are three or more nominal categories involved, dummy variable coding typically adds variables (coded 0 or 1) to represent the presence or absence of additional categories. Typically, the first variable would be coded as 1 if participants are exposed to the treatment and 0 otherwise. A second new variable similarly would be coded as 1 if participants are exposed to statistical evidence and 0 otherwise. The third condition is not represented by a third dummy variable because there can only be g – 1 (groups minus 1) aspects to test. The third category has not disappeared at all; it is the one condition receiving 0 on all the dummy variables.
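As a rough illustration of the g – 1 rule, the sketch below codes a three-condition variable into two 0/1 variables. The function name `dummy_code`, the condition labels, and the choice of "control" as the omitted category are assumptions made only for this example.

```python
def dummy_code(labels, reference):
    """Return (names, rows): one 0/1 column per category except `reference`."""
    categories = [c for c in sorted(set(labels)) if c != reference]
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

# Hypothetical three-condition design; "control" is the omitted category.
conditions = ["treatment", "statistical", "control", "treatment", "control"]
names, coded = dummy_code(conditions, reference="control")

print(names)  # order of the dummy variables
for condition, row in zip(conditions, coded):
    print(condition, row)  # the omitted category receives 0 on every variable
```

The omitted category still appears in the data; it is simply the row of all zeros.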

--Other advice on dummy variable coding is given:

·  If there is a true control group, it is recommended that this condition be the last group in the coding.

·  It is usually desirable to choose an omitted category with a fairly large number of cases. If the omitted category has a small number of cases, the coefficients for the included categories will have large standard errors.

·  “When group g has been assigned 0’s arbitrarily, it is usually appropriate to run an additional regression analysis in which group g is now assigned 1’s in a new dummy vector to obtain the relevant statistics and tests for group g” (J. J. Stevens, 1999, ¶ 8).

--The regression weight b0 for the intercept or constant is the dependent variable mean for the group coded as 0 on all dummy variables.

--Each regression weight reports the comparison of the group coded as 1 with the group coded as 0 on all dummy variables (regardless of coefficients of variables carrying interactions). A positive value for the regression coefficient b means that the group scores are higher than those for the group coded as 0 on all dummy variables. A negative value for the regression coefficient means that the group scores are lower than those for the group coded as 0 on all dummy variables.

--It should be understood that these interpretations apply to situations where the researcher has used only one independent variable whose levels have been dummy variable coded into separate variables. When there are other predictor variables in the multiple correlation equation, b0 includes the value of the dependent variable when all predictor variables are equal to zero.
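The interpretation of b0 and the regression weights for a single dummy-coded variable can be sketched from group means alone. This relies on the fact that, with one categorical predictor, the least-squares fit reproduces each group's mean. The scores and group labels below are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three groups; "control" is coded 0 on both dummies.
scores = {
    "treatment": [7.0, 9.0, 8.0],
    "statistical": [5.0, 6.0, 7.0],
    "control": [4.0, 5.0, 3.0],
}
reference = "control"

group_means = {g: mean(v) for g, v in scores.items()}
b0 = group_means[reference]  # intercept = mean of the group coded 0 throughout
weights = {g: m - b0 for g, m in group_means.items() if g != reference}

print("b0 =", b0)
for g, b in weights.items():
    direction = "higher" if b > 0 else "lower" if b < 0 else "equal"
    print(f"b[{g}] = {b:+.1f}: mean is {direction} than the reference group")
```

When other predictors are added to the equation, this shortcut no longer holds, and the weights must come from a fitted regression.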

dummy or indicator predictors:  in multiple regression correlation, the use of coded variables (typically 0 and 1) to represent nominal or qualitative variables

B.     Effects Coding

--To effects code the levels of the categorical variable representing different groups, k – 1 (with k as the number of categories) new variables are created, such as:

 

              X1    X2    X3    X4
Variable A     1     0     0     0
Variable B     0     1     0     0
Variable C     0     0     1     0
Variable D     0     0     0     1
Variable E    -1    -1    -1    -1

The sum of effects for each column is zero. Similar to dummy variable coding, the last group is included as the group receiving the effects code “–1” on all the variables.

--The regression coefficient b0 for the intercept identifies the grand mean.

--A significant effect for variable X1 indicates a primary difference between dependent variable scores from Variable A on one hand, and Variable E on the other, while the influences of the three remaining variables are minimized.

--The effects-coded variable X2 primarily distinguishes between scores from Variable B and Variable E, while the influences of the three remaining variables are minimized. And so it goes.

--Because the regression coefficient for the intercept identifies the grand mean, other regression coefficients indicate the differences between one group when compared with the collection of all the others. In contrast, dummy variable coding suggests the influence of the presence or absence of a variable level in contrast to a single control condition.

--When analyses are completed, researchers examine t tests of regression weights to see if a given mean has a statistically significant difference from the grand mean. The direction of regression weights reveals something about the influence of a group or condition. For any group represented by an X variable, a positive regression coefficient indicates that the group’s mean is higher than the grand mean of all groups.
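A minimal sketch of effects coding for five groups follows. It assumes equal group sizes, so that the mean of the group means is the grand mean, and it uses hypothetical group means. The helper `effects_codes` is named only for this illustration.

```python
from statistics import mean

def effects_codes(k):
    """Rows = k groups, columns = k - 1 effects-coded variables."""
    rows = [[1 if col == row else 0 for col in range(k - 1)]
            for row in range(k - 1)]
    rows.append([-1] * (k - 1))  # last group gets -1 on every variable
    return rows

codes = effects_codes(5)
column_sums = [sum(col) for col in zip(*codes)]  # each column sums to zero

# Hypothetical group means for A..E (equal group sizes assumed).
group_means = [10.0, 12.0, 9.0, 14.0, 5.0]
b0 = mean(group_means)                  # intercept = grand mean
b = [m - b0 for m in group_means[:-1]]  # weights for X1..X4

# b0 plus the coded weights should reproduce every group mean, including E.
fitted = [b0 + sum(bj * c for bj, c in zip(b, row)) for row in codes]
print(column_sums, b0, b, fitted)
```

A positive weight marks a group above the grand mean and a negative weight a group below it. The last group's deviation is the negative of the sum of the other weights.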

effects coding (sometimes called deviation coding): a form of variable coding that identifies groups of participants on new categorical variables, rather than using dummy variable coding. Effects coding uses values beyond 0 and 1 to code two or, most often, more than two categories.

 

C.     Contrast Coding

--Though the logic may be used to contrast two groups, contrast coding is particularly useful in situations where a researcher has more than two groups, such as when an experimenter exposes participants to messages from lowly, moderately, and highly credible sources.

--Although there is more than one way to compose contrast coefficients, when a predictor variable has more than two categories representing an underlying continuum, the values assigned to these conditions often may be drawn conveniently from the table of orthogonal polynomials.

--Unlike dummy variable coding, where 0 and 1 are used, effects coding places no value between conditions coded –1 and +1.

--Since the same groups are repeatedly used to make comparisons, the chances of Type I error are increased beyond the individual test alpha risk set by the researcher. When the hypothesized contrasts are not orthogonal, many (Bernstein, 1988, p. 135) suggest making some adjustment of the alpha risk used for each test of significance of beta weights.

--When using contrast coefficients, orthogonality also depends on equal sample sizes. If the sample sizes are unequal, a computational factor that provides weights for different sample sizes actually diminishes orthogonality further (G. Wolf & Cartwright, 1974).
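One way to see the dependence on equal sample sizes is to check whether two contrast-coded predictors are uncorrelated across cases. The sketch below is only an illustration: it uses the linear and quadratic orthogonal polynomial coefficients for three ordered groups, hypothetical group sizes, and a helper named `centered_cross_product` invented for this example.

```python
def centered_cross_product(c1, c2, sizes):
    """Sum over cases of the centered products of two contrast-coded columns.

    Zero means the two coded predictors are uncorrelated across cases.
    """
    n_total = sum(sizes)
    sum1 = sum(n * a for n, a in zip(sizes, c1))
    sum2 = sum(n * b for n, b in zip(sizes, c2))
    cross = sum(n * a * b for n, a, b in zip(sizes, c1, c2))
    return cross - sum1 * sum2 / n_total

linear = [-1, 0, 1]     # orthogonal polynomial coefficients, three groups
quadratic = [1, -2, 1]

equal_n = centered_cross_product(linear, quadratic, [20, 20, 20])
unequal_n = centered_cross_product(linear, quadratic, [10, 20, 30])
print(equal_n, unequal_n)
```

With equal group sizes the cross-product is zero. With the unequal sizes it is not, so the two coded predictors overlap.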

--The researcher inspects the regression and the beta coefficients.

·  The intercept is the weighted mean of scores on the dependent variable.

·  The sizes of the regression weights indicate the influence of the contrasts in the newly created variables.

·  The t tests of the beta weights indicate the statistical significance of the difference of the means of the two linear combinations involved in the contrasts for each hypothesis test.

II.  Contrasting Full and Reduced Models: Hierarchical Analysis

--Researchers often want to know if the overall R changes when they add key variables, including those variables that carry interaction or curvilinear effects.

contrast coding (sometimes called “orthogonal contrast coding” or just “orthogonal coding”): a form of variable coding that “compares one linear combination of groups with a second linear combination of groups” (Bernstein, 1988, p. 126).

·        hierarchical analysis

--For instance, a researcher might enter a set of variables (sometimes called a “block”) and look at the R² that results. By comparing the R² values from successive blocks, the researcher would be able to determine if a main effect or interaction effect explanation were the most useful way to understand the data.

 

hierarchical analysis: an application of multiple regression correlation in which “independent variables are entered into the regression equation in a sequence specified by the researcher in advance” (Vogt, 2005, p. 142).

--To see if the two correlations are different beyond the limits of random sampling error, researchers test a null hypothesis that the population multiple correlation for a model with a large number of predictors is equal to the population multiple correlation for a model with fewer predictors: $H_0\colon \rho_{m_2} = \rho_{m_1}$. This null hypothesis is appropriate when the number of predictors m2 is greater than the number of predictors m1.

--To test the statistical significance of the difference between differing models, the following formula is used:

$$F = \frac{\left(R^2_{m_2} - R^2_{m_1}\right) / (m_2 - m_1)}{\left(1 - R^2_{m_2}\right) / (n - m_2 - 1)}$$

where:

$R_{m_2}$ is the larger of the two correlations, with the larger number of predictors (m2) in the comparison (in essence, the full[er] model);

$R_{m_1}$ is the smaller of the two correlations, with the smaller number of predictors (m1) in the comparison (in essence, the partial model); and

n is the number of events.

Degrees of freedom for the numerator are equal to m2 – m1 (the difference between the number of predictors in the two compared multiple regression correlations); and degrees of freedom for the denominator are equal to n – m2 – 1.

--If the test statistic F is greater than the critical value at a given alpha risk, the difference between the two Rs is judged to be greater than would have been expected as a result of random sampling error.
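A minimal sketch of this comparison follows, using the standard F test for the change in R², F = [(R²full – R²reduced) / (m2 – m1)] / [(1 – R²full) / (n – m2 – 1)], which matches the degrees of freedom given above. The function name `f_change` and the numerical values are hypothetical. The critical value still has to be taken from an F table or statistical software.

```python
def f_change(r2_full, r2_reduced, n, m2, m1):
    """F test for the difference between nested models.

    r2_full, r2_reduced: squared multiple correlations (R squared)
    n: number of events; m2 > m1: numbers of predictors
    """
    if not m2 > m1:
        raise ValueError("the full model must have more predictors")
    df_num = m2 - m1
    df_den = n - m2 - 1
    f = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f, df_num, df_den

# Hypothetical values: one added predictor raises R squared from .40 to .50.
f, df1, df2 = f_change(r2_full=0.50, r2_reduced=0.40, n=100, m2=3, m1=2)
print(round(f, 2), df1, df2)
```

Note that the function takes squared correlations, so an R reported by software must be squared first.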

 

III. Interaction Effects

--Because multiple linear regression correlation does not deal with interaction terms directly, researchers may create additional variables that carry the interactions. Then, they enter the newly created variables into the multiple correlation equation to test if R values show an increase.

interaction effects: the influences of variables taken in combinations.

--Sometimes these interactions are considered indications of moderator effects.

--As a matter of language, researchers say that an interaction between two variables u and v “is ‘carried by,’ not ‘is’ the uv product” because “[o]nly when u and v have been linearly partialled from uv does it, in general, become the interaction IV [independent variable] we seek” (J. Cohen & Cohen, 1983, p. 305). Similar language is used when describing nonlinear effects.

--As an illustration, interaction effects may be included in multiple regression with two predictor variables as follows: Y = β0 + β1X1 + β2X2 + β3X1X2 + ε.

The term carrying the interaction source of variation is identified as β3X1X2, which is the last term in the model before the error term ε.

--As can be seen, interactions are examined in multiple correlation by including a new variable that is the product of the interacting variables. In this case, a third variable is created by multiplying Variable 1 by Variable 2 (X1*X2).

·  Many, but certainly not all, scholars suggest the wisdom of standardizing scores first so that a zero point may be included.

moderator effects: an “interacting third variable which changes the relation between two original variables is a moderator variable which moderates the original relationship” (Garson, 2003, ¶ 10).

--You will notice that when the interaction is included, the full model is presented, including both the main effects and the interaction effect. Eventually, the impact of any interaction effect is detected by comparing Rs from multiple regression equations that include the interaction to others that do not.

--Some other options have been suggested to examine interactions (see Jaccard & Turrisi, 2003). A predictor variable could be classified into two groups based on a mean or median split (in the case of continuous variables) or on the basis of dichotomies from dummy or effects coding. Then, separate multiple regression correlation analyses could be completed on separate samples that were each restricted to one group level. The unstandardized regression weights represent slope. Hence, a comparison of the differences in slope would indicate the presence or absence of an interaction. A statistically significant difference would indicate an interaction.

--A warning is appropriate: thoughtlessly adding interaction (and even nonlinear) effects can lead to capitalizing on chance and “overfitting” multiple regression correlation models.

 

A.  Creating Interaction Terms

1. If both individual predictor variables are continuous measures, the term carrying the interaction is simply the product of the two predictor variables. A full multiple regression model including two main additive effects (β1X1 and β2X2) and an interaction between two continuous or quantitative measures (β3X1X2) is shown as Y = β0 + β1X1 + β2X2 + β3X1X2 + ε. In this case, a third predictor variable carrying the interaction is created by multiplying Variable 1 by Variable 2 (X1 * X2). The interaction effect is detected by comparing the multiple regression correlation equation that includes the interaction with an equation that excludes the interaction term. A statistically significant difference in R values is taken as evidence of a significant interaction effect.
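A minimal sketch of building the variable that carries the interaction between two continuous predictors, both from raw scores and from standard scores. The data are hypothetical, and the use of the population standard deviation for the z scores is an assumption of this example.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard scores using the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw scores on two continuous predictors.
x1 = [2.0, 4.0, 6.0, 8.0]
x2 = [1.0, 3.0, 2.0, 6.0]

# X3 = X1 * X2, computed case by case.
x3_raw = [a * b for a, b in zip(x1, x2)]
x3_std = [a * b for a, b in zip(z_scores(x1), z_scores(x2))]

print(x3_raw)
print([round(v, 3) for v in x3_std])
```

Either version of X3 would then be entered with X1 and X2, and the resulting R compared with the R from the equation without X3.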

 

2. If the predictors carrying the interaction are both dummy or effects-coded variables representing simple dichotomous predictor variables, a “new” variable (such as X3) also is created. The X3 variable carrying the interaction is simply the cross product of the two indicator variables in the interaction. That is, its values are the products of the X1 and X2 dummy variable values. When more than two categories for the independent variables are subjected to dummy variable coding, interpretations become increasingly complex, and researchers are directed to other sources for details (J. Cohen & Cohen, 1983, esp. chap. 8, and Kelly et al., 1969, esp. chap. 6). When the predictor carrying the interaction is added to the multiple regression correlation equation, a statistically significant interaction effect is identified by observing an increase in the size of R.
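For two dichotomous dummy-coded predictors, the cross product and the weights it produces can be sketched from the four cell means. When all four cells are modeled, the fitted values equal the cell means, so the weights follow directly. The cell means below are hypothetical.

```python
# Hypothetical cell means for a 2 x 2 design, keyed by (X1, X2) dummy codes.
cell_means = {(0, 0): 10.0, (1, 0): 14.0, (0, 1): 11.0, (1, 1): 21.0}

# The variable carrying the interaction is the product of the two dummies.
x3 = {cell: cell[0] * cell[1] for cell in cell_means}

# With all four cells modeled, the weights follow from the cell means.
b0 = cell_means[(0, 0)]
b1 = cell_means[(1, 0)] - b0
b2 = cell_means[(0, 1)] - b0
b3 = cell_means[(1, 1)] - cell_means[(1, 0)] - cell_means[(0, 1)] + b0

for (a, b), m in cell_means.items():
    fitted = b0 + b1 * a + b2 * b + b3 * x3[(a, b)]
    print((a, b), x3[(a, b)], fitted, m)  # fitted equals the cell mean
```

Here X3 equals 1 only in the cell where both dummy variables equal 1, and b3 is the amount by which that cell departs from the sum of the two separate effects.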

 

3. If the two variables are contrast-coded predictors and the contrast coefficients are drawn from the table of orthogonal polynomials, the task of identifying interactions is simplified. By using the orthogonal polynomials, a term carrying the interaction is simply the product of the contrast coefficients from each level of the predictors involved in this interaction.

For instance, in a situation where two predictors have three levels (a low, moderate, and high level), the linear orthogonal polynomials for each are –1, 0, and +1. The following arrangement would be involved:

 

 

                  Contrast       X1 (low)   X1 (moderate)   X1 (high)
                  Coefficient      -1            0             +1
X2 (low)             -1             1            0             -1
X2 (moderate)         0             0            0              0
X2 (high)            +1            -1            0             +1

A new variable (X3) carrying the interaction would be created. Its values would be the products of the contrast coefficients. For instance, the value corresponding to the combination of X1 at its low level and X2 at its low level would be equal to 1 (X1[low] with a value of –1 times X2[low] with a value of –1 equals 1). In comparisons of multiple regression correlation equations with and without the variable carrying the interaction, the researcher would detect whether interaction effects are present by observing statistically significant changes in R values.
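The products in the arrangement above can be generated mechanically. The sketch below is illustrative only and assumes the linear coefficients –1, 0, and +1 for both predictors.

```python
levels = ["low", "moderate", "high"]
linear = {"low": -1, "moderate": 0, "high": 1}  # linear orthogonal polynomial

# x3[(x2_level, x1_level)] = coefficient for X2 times coefficient for X1
x3 = {(r, c): linear[r] * linear[c] for r in levels for c in levels}

for r in levels:
    print(f"X2 {r:<9}", [x3[(r, c)] for c in levels])
```

Each case receives the X3 value for its combination of X1 and X2 levels.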

 

4.  When one variable is continuous and the other is a dummy or effects-coded variable, the process of coding an interaction term must be done cautiously. The interaction would indicate different patterns from each level of the qualitative variable, not necessarily a statistically significant effect from degrees of the two predictor variables. In essence, the researcher describes different slopes at differently coded levels. The researcher multiplies the two variables involved in the interaction to create an indicator variable to carry the interaction effect.

--In essence, this analysis explores whether the slope of the regression line is different when the dummy-coded variable is equal to 0 (or –1 for effects-coded variables) or when it is equal to 1. By examining different multiple regression correlations, the researcher identifies differences in slopes to understand the nature of interactions. With two predictor variables (one continuous and one dummy coded) and one indicator variable carrying the interaction (adapted from Aczel, 1989, pp. 537–538), the following interpretations are involved:

·  If the regression coefficients are all nonzero, there are two different lines of best fit with different slopes and different intercepts.

·  If β2 is zero, there are two lines with the same intercept but a different slope.

·  If β3 is zero, there is no interaction and the two lines are parallel.

·  If β1 is not different from zero, there is no regression
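These interpretations can be sketched by writing out the line implied at each level of the dummy variable. The sketch assumes the model Y = β0 + β1X1 + β2X2 + β3X1X2, with X1 continuous and X2 dummy coded. The function name `group_lines` and the weights are hypothetical.

```python
def group_lines(b0, b1, b2, b3):
    """Intercept and slope of the fitted line at each level of the dummy."""
    return {
        0: (b0, b1),            # dummy = 0: Y = b0 + b1*X1
        1: (b0 + b2, b1 + b3),  # dummy = 1: Y = (b0 + b2) + (b1 + b3)*X1
    }

lines = group_lines(b0=2.0, b1=0.5, b2=1.5, b3=-0.25)     # all nonzero
same_intercept = group_lines(b0=2.0, b1=0.5, b2=0.0, b3=-0.25)
parallel = group_lines(b0=2.0, b1=0.5, b2=1.5, b3=0.0)

for label, result in [("all nonzero", lines),
                      ("b2 = 0", same_intercept),
                      ("b3 = 0", parallel)]:
    print(label, result)
```

With b3 equal to zero the two slopes match, so the lines are parallel. With b2 equal to zero the two intercepts match.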

 

5.  If the two interacting predictor variables include a contrast-coded predictor and a continuous measure, there are different pieces of advice about coding. The contrast coefficients may be multiplied by the scores on the continuous measures. These continuous measures may take the form of

·  raw scores,

·  differences of the raw scores from their own means,

·  or standard scores (z scores) of the continuous variable.

Though other values are affected, these three methods tend to produce similar R and mean square residual values.

 

B.  Testing Interactions

To examine the size of the multiple R if an interaction exists, researchers find it most useful to compare two or more multiple regression correlation equations (depending on the number of interaction effects to be examined). The first equation contains only the predictor variables without the key interactions, and subsequent equations add the variables carrying the interaction effects.

·        Researchers examine residuals to determine which models produce reductions in this value.

·        Researchers examine the statistical significance of the differences in R values for models excluding and including the variables carrying the interaction effects.

 

To test the significance of the difference between these two R values, the previously described formula is used:

$$F = \frac{\left(R^2_{m_2} - R^2_{m_1}\right) / (m_2 - m_1)}{\left(1 - R^2_{m_2}\right) / (n - m_2 - 1)}$$

where:

$R_{m_2}$ is the larger of the two correlations, with the larger number of predictors (m2) in the comparison (in essence, the full[er] model);

$R_{m_1}$ is the smaller of the two correlations, with the smaller number of predictors (m1) in the comparison (in essence, the partial model); and

n is the number of events.

Degrees of freedom for the numerator are equal to m2 – m1 (the difference between the number of predictors in the two compared multiple regression correlations); and degrees of freedom for the denominator are equal to n – m2 – 1.
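The whole procedure can be sketched end to end: fit the equation without the interaction, fit it again with the product term, and compute F for the change in R² with m2 – m1 and n – m2 – 1 degrees of freedom. The data are hypothetical, with an interaction and a small amount of noise deliberately built in. The small least-squares routine is an illustrative stand-in for statistical software, not a recommended implementation.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit(columns, y):
    """Least-squares weights (intercept first) and R squared."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1.0 - ss_res / ss_tot

# Hypothetical data on a 5 x 4 grid with a built-in interaction and small noise.
x1 = [float(i) for i in range(5) for _ in range(4)]
x2 = [float(j) for _ in range(5) for j in range(4)]
noise = [0.01 if i % 2 == 0 else -0.01 for i in range(20)]
y = [1 + 2 * a + 3 * b + 0.5 * a * b + e for a, b, e in zip(x1, x2, noise)]
x3 = [a * b for a, b in zip(x1, x2)]  # variable carrying the interaction

_, r2_reduced = fit([x1, x2], y)        # m1 = 2 predictors
b_full, r2_full = fit([x1, x2, x3], y)  # m2 = 3 predictors

n, m2, m1 = len(y), 3, 2
f = ((r2_full - r2_reduced) / (m2 - m1)) / ((1 - r2_full) / (n - m2 - 1))
print(round(r2_reduced, 4), round(r2_full, 4), round(b_full[3], 3), f)
```

The resulting F would be compared with the critical value for 1 and 16 degrees of freedom at the chosen alpha risk.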

 

When dummy and effects-coded variables are involved, t tests of individual regression weights should not be used.

 

IV.  Examining Nonlinear Effects

A.  Identifying Nonlinear Patterns

Multiple regression correlation may be enlisted to provide a helpful alternative. Another predictor variable representing a curved (in this case, quadratic) function may be created and added to the formula. Then, a new correlation may be computed, including the linear and the nonlinear effects.

--As an example, whereas the equation for a bivariate correlation is $Y = b_0 + b_1X_1$, a nonlinear effect may be represented by $Y = b_0 + b_1X_1 + b_2X_1^2$, where a second variable is included that is the square of the first predictor variable. Thus, a quadratic function of the independent variable is advanced to see if it provides a superior fit to the data.

--Even without a plot, inspecting the intercept and the regression weights can reveal the nature of a relationship. For a study of a quadratic effect:

·  When b2 is positive the curve has a U shape, but when b2 is negative, the curve has an inverted U shape.

·  The greater the size of b2, the “more severe” is the curve.

·  When b1 and b2 have the same sign, the curve is “displaced to the left. [With] . . . unlike signs the displacement is to the right.”

·  When b1 is equal to “0, the curve will be symmetrical around” the y-axis.

·  When the intercept is positive, the line of best fit crosses the y-axis above the horizontal axis.

·  When the intercept is negative, the line of best fit crosses the y-axis below the horizontal axis.

·  When the intercept is zero, the line of best fit crosses the y-axis at the horizontal axis.
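These rules of thumb follow from the turning point of the quadratic, which lies at X = –b1 / (2·b2). A minimal sketch with hypothetical weights follows. The function name `describe_quadratic` is an assumption of this example.

```python
def describe_quadratic(b0, b1, b2):
    """Summarize Y = b0 + b1*X + b2*X**2 from its weights (b2 must be nonzero)."""
    if b2 == 0:
        raise ValueError("b2 = 0 gives a straight line, not a curve")
    turning_point = -b1 / (2 * b2)  # X value where the curve turns
    if turning_point < 0:
        displacement = "left"
    elif turning_point > 0:
        displacement = "right"
    else:
        displacement = "none"
    return {
        "shape": "U" if b2 > 0 else "inverted U",
        "turning_point_x": turning_point,
        "displacement": displacement,
        "y_axis_crossing": b0,  # value of Y when X = 0
    }

same_signs = describe_quadratic(b0=3.0, b1=2.0, b2=0.5)
unlike_signs = describe_quadratic(b0=3.0, b1=-2.0, b2=0.5)
symmetric = describe_quadratic(b0=-1.0, b1=0.0, b2=-0.5)
print(same_signs, unlike_signs, symmetric, sep="\n")
```

When b1 and b2 share a sign, the turning point is negative, which is the displacement to the left described above.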

 

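--To test the significance of the difference between these two R values, the previously described formula is used:

$$F = \frac{\left(R^2_{m_2} - R^2_{m_1}\right) / (m_2 - m_1)}{\left(1 - R^2_{m_2}\right) / (n - m_2 - 1)}$$

Here the equation with the linear and quadratic terms supplies $R_{m_2}$ (with m2 predictors), and the equation with the linear term alone supplies $R_{m_1}$ (with m1 predictors).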

 

--When using such a method, the researcher is not interested in the comparative influence of the quadratic or linear components. The important thing is to know if the overall R increases when variables representing curvilinear influences are added with the linear effect. Because the squared value of a variable is highly correlated with the original variable, there is high multicollinearity among predictor variables when variables carrying nonlinear trends are included. Thus, interpreting beta weights is problematic. Nevertheless, the R value would not be affected.
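The size of this multicollinearity is easy to check. The sketch below correlates a hypothetical predictor (scores 1 through 10) with its own square. It also shows the same correlation after expressing the scores as deviations from their mean, which is an assumption added here for comparison rather than a step prescribed above. Centering removes the correlation in this case only because the scores are spaced symmetrically around the mean.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [float(v) for v in range(1, 11)]  # hypothetical scores 1..10
r_raw = pearson_r(x, [v ** 2 for v in x])

mean_x = sum(x) / len(x)
centered = [v - mean_x for v in x]    # deviation scores
r_centered = pearson_r(centered, [v ** 2 for v in centered])

print(round(r_raw, 3), abs(round(r_centered, 3)))
```

The raw-score correlation is above .95, which illustrates why the individual beta weights are hard to interpret even though the overall R is unaffected.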

 

B.  Curve Fitting Through SPSS