Ch3a Partial Correlation

Subchapter 3a.
Partial Correlation

Suppose you were to measure each of N subjects on each of three variables, X, Y, and Z, and find the following correlations:


X versus Y:	r_XY = +.50	r²_XY = .25
X versus Z:	r_XZ = +.50	r²_XZ = .25
Y versus Z:	r_YZ = +.50	r²_YZ = .25

For the moment, focus on the value of r², which in each case (for this streamlined hypothetical example) is equal to .25. This means that for each pair of variables (XY, XZ, and YZ) the shared variance, or variance overlap, is 25%. As illustrated in the following diagram, 25% of the variability of X overlaps with variability in Y, 25% of the variability of X overlaps with variability in Z, and 25% of the variability of Y overlaps with variability in Z.

Note as well that there is one region where all three of the variability circles overlap. The meaning of this three-way overlap is that a certain amount of the correlation found between any two of the variables is tied in with the correlation that each of those two has with the third. Thus, of the 25% variance overlap found between X and Y, approximately half (judging by the naked eye) is tied in with the overlaps that exist between XZ and YZ. The same holds for the 25% overlap between X and Z, about half of which is bound up with the overlaps for XY and YZ, and likewise for the 25% overlap between Y and Z, about half of which is tied up with the overlaps for XY and XZ.

Partial correlation is a procedure that allows us to measure the region of three-way overlap precisely, and then to remove it from the picture in order to determine what the correlation between any two of the variables would be (hypothetically) if they were not each correlated with the third variable. Alternatively, you can say that partial correlation allows us to determine what the correlation between any two of the variables would be (hypothetically) if the third variable were held constant. The partial correlation of X and Y, with the effects of Z removed (or held constant), would be given by the formula

$$r_{XY\cdot Z} = \frac{r_{XY} - r_{XZ}\,r_{YZ}}{\sqrt{1 - r_{XZ}^2}\,\sqrt{1 - r_{YZ}^2}}$$

which for the present example would work out as

$$r_{XY\cdot Z} = \frac{.50 - (.50)(.50)}{\sqrt{1 - .25}\,\sqrt{1 - .25}} = \frac{.25}{.75} \approx +.33$$

Hence r²_XY·Z = .11
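The formula translates directly into a few lines of code. The sketch below is a minimal Python illustration, assuming the three pairwise correlations are already known; the function name `partial_r` is a hypothetical helper, not part of any library. It reproduces the streamlined example in which every pairwise r is +.50.

```python
import math

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Partial correlation r_XY.Z: X and Y with the third variable Z held constant.

    `partial_r` is a hypothetical helper written for this illustration.
    """
    if abs(r_xz) >= 1 or abs(r_yz) >= 1:
        # If X or Y is a perfect linear function of Z, nothing is left to
        # correlate once Z is held constant, and the denominator is zero.
        raise ValueError("r_XZ and r_YZ must lie strictly between -1 and +1")
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return numerator / denominator

# The streamlined hypothetical example: every pairwise correlation is +.50
r = partial_r(0.50, 0.50, 0.50)
print(f"r_XY.Z = {r:+.2f}, r^2 = {r ** 2:.2f}")  # r_XY.Z = +0.33, r^2 = 0.11
```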

The same general structure would apply for calculating the partial correlation of X and Z, with the effects of Y removed:

$$r_{XZ\cdot Y} = \frac{r_{XZ} - r_{XY}\,r_{YZ}}{\sqrt{1 - r_{XY}^2}\,\sqrt{1 - r_{YZ}^2}}$$

and for calculating the partial correlation of Y and Z, with the effects of X removed:

$$r_{YZ\cdot X} = \frac{r_{YZ} - r_{XY}\,r_{XZ}}{\sqrt{1 - r_{XY}^2}\,\sqrt{1 - r_{XZ}^2}}$$
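Because all three formulas share one structure, a single function can produce every first-order partial correlation by rotating which variable plays the role of the one held constant. A minimal Python sketch follows; `partial_r` and `all_partials` are hypothetical names chosen for this illustration. Applied to the symmetric example, all three partials come out equal, as the symmetry of the correlations would lead one to expect.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """r_XY.Z: correlation of X and Y with Z held constant (hypothetical helper)."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

def all_partials(r_xy, r_xz, r_yz):
    """Apply the same formula three times, rotating which variable is held constant."""
    return {
        "r_XY.Z": partial_r(r_xy, r_xz, r_yz),  # hold Z constant
        "r_XZ.Y": partial_r(r_xz, r_xy, r_yz),  # hold Y constant
        "r_YZ.X": partial_r(r_yz, r_xy, r_xz),  # hold X constant
    }

# Symmetric example from the text: every pairwise r is +.50
for name, value in all_partials(0.50, 0.50, 0.50).items():
    print(f"{name} = {value:+.2f}")  # each line shows +0.33
```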

Here is the apparatus of partial correlation applied to a real-life example. The Wechsler Adult Intelligence Scale (WAIS) is a device often used to measure "intelligence" beyond the years of childhood. Among its several sub-scales are three labeled as C, A, and V. The "C" stands for "comprehension," which chiefly reflects the test-taker's ability to comprehend the meanings and implications of written passages. The "A" refers to the test-taker's ability to perform tasks that require arithmetic ability. And the "V" stands for "vocabulary," which, as you might imagine, is a measure that increases or decreases in accordance with the breadth of the test-taker's vocabulary within the domain of the language in which the test is constructed. The following table shows the correlations typically found among these three sub-scales.


C versus A:	r_CA = +.49	r²_CA = .24
C versus V:	r_CV = +.73	r²_CV = .53
A versus V:	r_AV = +.59	r²_AV = .35

Here the overlaps are less evenly proportioned, although the logic is quite the same. Of the 24% variance overlap that occurs in the relationship between comprehension and arithmetic ability, a substantial portion reflects the fact that both of these variables are correlated with vocabulary. If we were to remove the effects of vocabulary from the relationship between C and A, the resulting partial correlation would be

$$r_{CA\cdot V} = \frac{r_{CA} - r_{CV}\,r_{AV}}{\sqrt{1 - r_{CV}^2}\,\sqrt{1 - r_{AV}^2}}$$

$$r_{CA\cdot V} = \frac{.49 - (.73)(.59)}{\sqrt{1 - .53}\,\sqrt{1 - .35}} = \frac{.0593}{.5527} \approx +.11$$

Hence r²_CA·V = .01
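The same arithmetic for the WAIS sub-scales can be checked in a few lines of Python; the variable names are illustrative only. Squaring the correlations directly (.5329 and .3481) rather than using the rounded r² values (.53 and .35) changes the result only in the fourth decimal place, so the two-decimal answers agree with the worked calculation.

```python
import math

# Correlations among the WAIS sub-scales, as given in the table
r_CA, r_CV, r_AV = 0.49, 0.73, 0.59

# Partial correlation of comprehension (C) and arithmetic (A), vocabulary (V) held constant
r_CA_V = (r_CA - r_CV * r_AV) / (math.sqrt(1 - r_CV ** 2) * math.sqrt(1 - r_AV ** 2))

print(f"r_CA.V = {r_CA_V:+.2f}, r^2 = {r_CA_V ** 2:.2f}")  # r_CA.V = +0.11, r^2 = 0.01
```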

In brief: with the effects of vocabulary removed, the correlation between comprehension and arithmetic ability collapses down to hardly anything at all. The practical inference is that if we were to administer the WAIS to a sample of subjects who were homogeneous with respect to breadth of vocabulary, the correlation between their scores on the comprehension and arithmetic sub-scales would prove fairly scant, on the order of r=+.11 and r²=.01.

In most cases a partial correlation of the general form r_XY·Z will turn out smaller than the original correlation r_XY. In those cases where it turns out larger, the third variable, Z, is typically spoken of as a suppressor variable, on the assumption that it is suppressing the larger correlation that would appear between X and Y if Z were held constant.

Suppose, for example, that a rather cranky professor has just administered an exam in his statistics course, and that for each student in the course we have measures on each of the following three variables:

	X =	the amount of effort spent on studying for the exam beforehand
	Y =	the student's score on the exam
	Z =	a measure of the degree to which the professor inspires fear and trembling in the student

And here are the correlations among the three variables:


X versus Y:	r_XY = +.20	r²_XY = .04
X versus Z:	r_XZ = +.80	r²_XZ = .64
Y versus Z:	r_YZ = −.40	r²_YZ = .16

Now isn't it odd that the correlation between X and Y should end up as a scant r_XY=+.20 and r²_XY=.04, indicating a mere 4% shared variance between the degrees of effort that students put into the exam and the scores that they receive on it? Examine the other two correlations, however, and you will see that it is not so odd after all. The greater the fear and trembling, the greater the effort that students tend to put into preparing for the exam; hence r_XZ=+.80 and r²_XZ=.64. On the other hand, the greater the fear and trembling, the less well students tend to do on the exam, as witness r_YZ=−.40 and r²_YZ=.16. Remove the suppressing effects of fear and trembling from the equation,

$$r_{XY\cdot Z} = \frac{.20 - (.80)(-.40)}{\sqrt{1 - .64}\,\sqrt{1 - .16}} = \frac{.52}{.55} \approx +.95$$

and the correlation between effort and exam score goes from a scant r_XY=+.20 to an impressive r_XY·Z=+.95. Or alternatively: remove the fear and trembling, and the shared variance between effort and exam score goes from a mere 4% to a very substantial 90% (r²_XY·Z=.90).
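Here is a short Python sketch of the exam example, including a check for the suppressor pattern described earlier. Comparing magnitudes with `abs` is an assumption about how "larger" should be read when correlations can be negative, and the helper name `partial_r` is hypothetical. Squaring the unrounded partial (about +.9456) gives roughly .89; the .90 above is the square of the rounded value +.95.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """r_XY.Z: correlation of X and Y with Z held constant (hypothetical helper)."""
    return (r_xy - r_xz * r_yz) / (math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2))

# Correlations from the exam example: X = effort, Y = exam score, Z = fear and trembling
r_xy, r_xz, r_yz = 0.20, 0.80, -0.40

r_xy_z = partial_r(r_xy, r_xz, r_yz)
print(f"r_XY   = {r_xy:+.2f}, r^2 = {r_xy ** 2:.2f}")      # r_XY   = +0.20, r^2 = 0.04
print(f"r_XY.Z = {r_xy_z:+.2f}, r^2 = {r_xy_z ** 2:.2f}")  # r_XY.Z = +0.95, r^2 = 0.89

# Assumed reading of the text: a partial larger in magnitude than the original r marks Z as a suppressor
if abs(r_xy_z) > abs(r_xy):
    print("Z behaves as a suppressor variable")
```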

The VassarStats computational site includes a page that will calculate the partial correlation coefficients for any particular set of three intercorrelated variables.

Click this link only if the present page does not appear in a frameset headed by the logo Concepts and Applications of Inferential Statistics