©Richard Lowry, 1999-
All rights reserved.


Chapter 15.
One-Way Analysis of Variance for Correlated Samples
Part 2


  • From the example introduced in Chapter 15, Part 1:

                     A          B          C       All groups
                   [0cps]     [2cps]     [6cps]     combined
        N            18         18         18          54
        ΣXi         466        485        421        1372
        ΣX²i      12800      14021      10443       37264
        M          25.9       26.9       23.4        25.4
        SS        735.8      952.9      596.3      2405.0

    SST = 2405.0
    SSwg = SSa + SSb + SSc = 735.8 + 952.9 + 596.3 = 2285.0
    SSbg = SST − SSwg = 2405.0 − 2285.0 = 120.0
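
    For readers who like to check such arithmetic by machine, here is a minimal Python sketch that recomputes these quantities from the raw sums. The variable names are mine, not the chapter's:

        # Sketch: recompute the Part 1 summary quantities from the raw sums.
        sums    = {'a': 466, 'b': 485, 'c': 421}        # ΣX for each group
        sq_sums = {'a': 12800, 'b': 14021, 'c': 10443}  # ΣX² for each group
        n = 18                                          # measures per group

        # SS within each group: ΣX² − (ΣX)²/N
        ss = {g: sq_sums[g] - sums[g] ** 2 / n for g in sums}

        N_T      = 3 * n                    # 54 measures in all
        sum_T    = sum(sums.values())       # 1372
        sq_sum_T = sum(sq_sums.values())    # 37264

        SS_T  = sq_sum_T - sum_T ** 2 / N_T    # total SS
        SS_wg = sum(ss.values())               # within-groups SS
        SS_bg = SS_T - SS_wg                   # between-groups SS
        print(round(SS_T, 1), round(SS_wg, 1), round(SS_bg, 1))
        # 2405.0 2285.0 120.0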


    ¶Identification and Removal of SSsubj

    The first five columns of the following table are a somewhat streamlined version of the data table you saw in Part 1. The means for the individual subjects now appear in their own column, and next to each subject's mean, in a new column, is a calculation whose general form you will surely recognize from Chapter 14. The difference between each subject's mean and the mean of the total array of data is taken as a deviate:

    deviate = Msubj* − MT

    (We will use the subscript "subj*" to mean "any particular subject." The first subject would be subj1; the second, subj2; and so on.)

    The square of that difference is accordingly a squared deviate:

    squared deviate = (Msubj* − MT)²

    And that squared deviate is then weighted by k=3, which is the number of measures (Xai, Xbi, and Xci) on which the subject's mean is based:

    weighted squared deviate = k(Msubj* − MT)²

    It is the same general structure we used in Chapter 14 to calculate the value of SSbg from scratch, to measure the aggregate differences among the k groups. Now what we come out with is SSsubj, a measure of the aggregate differences among the 18 individual subjects.

    Subject     A     B     C    Subject    k(Msubj* − MT)²
                                  Mean
    1          35    39    32     35.3      3(35.3−25.4)² = 294.0
    2          32    35    31     32.7      3(32.7−25.4)² = 159.9
    3          33    32    28     31.0      3(31.0−25.4)² = 94.1
    4          32    32    29     31.0      3(31.0−25.4)² = 94.1
    5          31    33    26     30.0      3(30.0−25.4)² = 63.5
    6          29    30    29     29.3      3(29.3−25.4)² = 45.6
    7          29    31    27     29.0      3(29.0−25.4)² = 38.9
    8          27    29    27     27.7      3(27.7−25.4)² = 15.9
    9          27    31    24     27.3      3(27.3−25.4)² = 10.8
    10         28    27    24     26.3      3(26.3−25.4)² = 2.4
    11         27    27    23     25.7      3(25.7−25.4)² = 0.3
    12         27    26    23     25.3      3(25.3−25.4)² = 0.0
    13         24    29    19     24.0      3(24.0−25.4)² = 5.9
    14         24    25    19     22.7      3(22.7−25.4)² = 21.9
    15         17    16    18     17.0      3(17.0−25.4)² = 211.7
    16         17    15    17     16.3      3(16.3−25.4)² = 248.4
    17         14    15    12     13.7      3(13.7−25.4)² = 410.7
    18         13    13    13     13.0      3(13.0−25.4)² = 461.3
                                 MT=25.4    SSsubj = 2179.4
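
    The deviate-based calculation is easy to script. Here is a minimal Python sketch, with the raw scores hard-coded from the table above (the names are illustrative, not the chapter's):

        # Sketch: SSsubj as the k-weighted sum of squared deviates of the
        # subject means from the grand mean, as in the table above.
        A = [35, 32, 33, 32, 31, 29, 29, 27, 27, 28, 27, 27, 24, 24, 17, 17, 14, 13]
        B = [39, 35, 32, 32, 33, 30, 31, 29, 31, 27, 27, 26, 29, 25, 16, 15, 15, 13]
        C = [32, 31, 28, 29, 26, 29, 27, 27, 24, 24, 23, 23, 19, 19, 18, 17, 12, 13]

        k = 3                                  # measures per subject
        scores = list(zip(A, B, C))            # one (a, b, c) triple per subject
        M_T = sum(A + B + C) / len(A + B + C)  # grand mean, 1372/54

        subj_means = [sum(row) / k for row in scores]
        SS_subj = sum(k * (m - M_T) ** 2 for m in subj_means)
        print(round(SS_subj, 1))
        # 2181.7 at full precision; the table's 2179.4 reflects the
        # rounding of each subject mean to one decimal place.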


    I show you this particular method for calculating SSsubj only because it gives a clear idea of the conceptual structure of the measure. It would not be the method to use for practical computational purposes, as it is likely to end up with a considerable accumulation of rounding error. Also, if you are doing it by hand, it can be rather laborious. The following method is somewhat less laborious and will in any event minimize the risk of rounding errors. You will recognize it as analogous to the computational formula for SSbg that we worked through in Part 1; namely

    SSbg = (ΣXai)²/Na + (ΣXbi)²/Nb + (ΣXci)²/Nc − (ΣXTi)²/NT

    Except now the string of items to the left of the minus sign is replaced by another string that looks like this:

    (ΣXsubj1)²/k + (ΣXsubj2)²/k + (ΣXsubj3)²/k + . . .

    and so on, for as many subjects as you have in the analysis.

    That is: For each subject in the analysis, take the sum of that subject's scores across all k conditions, square that sum, and then divide the result by k, which is the number of scores on which the sum is based.

    Since k is the same for each subject, this portion of the formula to the left of the minus sign can be rewritten as

    Σ(ΣXsubj*)²/k

    That is: For each subject in the analysis, take the sum of that subject's scores across all k conditions, square that sum, add up these squared sums across all subjects, and then divide the total of the squared sums by k.


    and the entire formula can then be written more compactly as

    SSsubj = Σ(ΣXsubj*)²/k − (ΣXTi)²/NT

    The next version of our data table shows the sum for each subject, along with the calculation of the square of each subject's sum.

    Subject     A     B     C    Subject    (ΣXsubj*)²
                                  Sum
    1          35    39    32     106       106² = 11236
    2          32    35    31      98        98² = 9604
    3          33    32    28      93        93² = 8649
    4          32    32    29      93        93² = 8649
    5          31    33    26      90        90² = 8100
    6          29    30    29      88        88² = 7744
    7          29    31    27      87        87² = 7569
    8          27    29    27      83        83² = 6889
    9          27    31    24      82        82² = 6724
    10         28    27    24      79        79² = 6241
    11         27    27    23      77        77² = 5929
    12         27    26    23      76        76² = 5776
    13         24    29    19      72        72² = 5184
    14         24    25    19      68        68² = 4624
    15         17    16    18      51        51² = 2601
    16         17    15    17      49        49² = 2401
    17         14    15    12      41        41² = 1681
    18         13    13    13      39        39² = 1521
                                             Σ(ΣXsubj*)² = 111122

    Recalling that ΣXTi=1372 and NT=54, our calculation of SSsubj is accordingly

    SSsubj = Σ(ΣXsubj*)²/k − (ΣXTi)²/NT
           = 111122/3 − (1372)²/54
           = 37040.7 − 34859.0
           = 2181.7
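
    The computational formula is equally direct in code. A minimal sketch, using the subject sums from the table above:

        # Sketch: SSsubj via the computational formula, from the subject sums.
        subj_sums = [106, 98, 93, 93, 90, 88, 87, 83, 82, 79,
                     77, 76, 72, 68, 51, 49, 41, 39]
        k, N_T, sum_T = 3, 54, 1372    # conditions, total N, ΣXTi

        SS_subj = sum(s ** 2 for s in subj_sums) / k - sum_T ** 2 / N_T
        print(round(SS_subj, 1))       # 2181.7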


    As always, one must be careful not to get so lost in the calculations as to lose sight of the concepts. What we have just calculated with SSsubj is the amount of variability within the total array of data that derives from individual differences among the subjects. And now that this amount has been identified, it can be removed. The process of removal is one of simple subtraction. The total amount of within-groups variability is SSwg=2285.0. Of this amount, SSsubj=2181.7 is attributable not to sheer, cussed random variability, but to extraneous pre-existing individual differences among the 18 subjects. Remove the latter from the former, and what remains is

    SSerror = SSwg − SSsubj = 2285.0 − 2181.7 = 103.3


    Here is one of the same diagrams you saw in Part 1, except now the specific numerical values have been included for the several values of SS. The bottom line is that a huge chunk of SSwg is attributable to individual differences among the subjects, and that chunk is now removed. From this point on, it is all a smooth ride across familiar terrain.

    [Diagram: SST = 2405.0 partitioned into SSbg = 120.0 and SSwg = 2285.0; SSwg in turn partitioned into SSsubj = 2181.7 and SSerror = 103.3.]
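
    A quick check, in the same vein, that the pieces of the partition add back up to the total:

        # Sketch: verify the partition of total variability shown above.
        SS_T, SS_bg, SS_wg = 2405.0, 120.0, 2285.0
        SS_subj  = 2181.7
        SS_error = SS_wg - SS_subj                 # 103.3
        assert abs(SS_T - (SS_bg + SS_subj + SS_error)) < 0.1
        print(round(SS_error, 1))                  # 103.3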
    ¶Calculation of MSbg, MSerror, and F

    We begin by sorting out the several components of degrees of freedom. The first three are the same as we would have with a one-way independent-samples ANOVA:

    dfT = NT − 1 = 54 − 1 = 53
    dfbg = k − 1 = 3 − 1 = 2
    dfwg = NT − k = 54 − 3 = 51

    Note that dfT = dfbg + dfwg.

    The only difference is that now we break dfwg into two further components, corresponding respectively to the two components of within-groups variability, SSsubj and SSerror. For dfsubj the basic concept is once again "the number of items minus one." In this case the number of items is the number of subjects, which we will designate as Nsubj. Thus

    dfsubj = Nsubj − 1 = 18 − 1 = 17

    Remove this component from dfwg, and what remains is

    dferror = dfwg − dfsubj = 51 − 17 = 34
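
    The degrees-of-freedom bookkeeping, sketched in Python (names are illustrative):

        # Sketch: df components for the correlated-samples design.
        N_T, k, N_subj = 54, 3, 18

        df_T     = N_T - 1            # 53
        df_bg    = k - 1              # 2
        df_wg    = N_T - k            # 51
        df_subj  = N_subj - 1         # 17
        df_error = df_wg - df_subj    # 34
        assert df_T == df_bg + df_wg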

    With these, we can then calculate the two relevant MS values as

    MSbg = SSbg/dfbg = 120.0/2 = 60.0
    and
    MSerror = SSerror/dferror = 103.3/34 = 3.0

    which in turn give us

    F = MSbg/MSerror = 60.0/3.0 = 20.0, with df=2,34
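
    And the MS and F calculations, in the same sketch style:

        # Sketch: MS and F from the SS and df values above.
        SS_bg, SS_error = 120.0, 103.3
        df_bg, df_error = 2, 34

        MS_bg    = SS_bg / df_bg          # 60.0
        MS_error = SS_error / df_error    # ~3.04; the chapter rounds to 3.0
        F = MS_bg / MS_error              # ~19.7; 20.0 with MS_error = 3.0
        print(round(MS_bg, 1), round(MS_error, 1), round(F, 1))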

    Figure 15.2 shows the sampling distribution of F for df=2,34, and the adjacent table shows the corresponding portion of Appendix D. As indicated, F=3.28 and F=5.29 mark the points in this distribution beyond which fall 5% and 1%, respectively, of all possible mere-chance outcomes, assuming the null hypothesis to be true.

    Figure 15.2. Sampling Distribution of F for df=2,34
    [Graph omitted; the .05 and .01 critical values, 3.28 and 5.29, appear in the excerpt from Appendix D below.]

    df                 df numerator
    denominator          1       2       3
    34          .05    4.13    3.28    2.88
                .01    7.44    5.29    4.42

    As the observed value of F=20.0 falls far to the right of F=5.29, the aggregate differences among the means of the three groups of measures, A|B|C, can be adjudged significant well beyond the .01 level. Our investigator can accordingly reject the null hypothesis, concluding with a high degree of confidence that the aggregate mean differences among the three groups of measures stem from something more than mere chance coincidence.
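
    If SciPy is available, the critical values and the exact tail probability can be confirmed directly (scipy.stats.f is the F distribution):

        # Sketch: critical values and tail probability for F with df = 2, 34.
        from scipy.stats import f

        df1, df2 = 2, 34
        print(f.ppf(0.95, df1, df2))   # ~3.28, the .05 critical value
        print(f.ppf(0.99, df1, df2))   # ~5.29, the .01 critical value
        print(f.sf(20.0, df1, df2))    # P(F >= 20.0), far below .01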
  • ANOVA Summary Table

    Source                         SS      df     MS      F      P
    between groups ("effect")    120.0     2    60.0   20.0   <.01
    within groups               2285.0    51
      · error                    103.3    34     3.0
      · subjects                2181.7    17
    TOTAL                       2405.0    53


    The assumptions and step-by-step computational procedures for the one-way correlated-samples ANOVA will be outlined in Part 3 of this chapter. It is also in Part 3 that we will see how the Tukey HSD test can be adapted to perform pairwise comparisons among the individual group means in a correlated-samples ANOVA.
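
    Finally, for readers who want the whole procedure in one place, here is a minimal end-to-end sketch that computes F from the raw scores. The function name and structure are mine, not the chapter's; `data` is assumed to hold one list of scores per condition, in the same subject order in each list:

        # Sketch: one-way correlated-samples ANOVA from raw scores,
        # pulling together the steps of this chapter.
        def correlated_anova(data):
            k = len(data)                    # number of conditions
            n = len(data[0])                 # number of subjects
            N = k * n
            all_scores = [x for cond in data for x in cond]
            sum_T = sum(all_scores)

            SS_T  = sum(x * x for x in all_scores) - sum_T ** 2 / N
            SS_wg = sum(sum(x * x for x in cond) - sum(cond) ** 2 / n
                        for cond in data)
            SS_bg = SS_T - SS_wg
            subj_sums = [sum(cond[i] for cond in data) for i in range(n)]
            SS_subj  = sum(s * s for s in subj_sums) / k - sum_T ** 2 / N
            SS_error = SS_wg - SS_subj

            df_bg, df_error = k - 1, (N - k) - (n - 1)
            MS_bg, MS_error = SS_bg / df_bg, SS_error / df_error
            return MS_bg / MS_error, df_bg, df_error

        A = [35, 32, 33, 32, 31, 29, 29, 27, 27, 28, 27, 27, 24, 24, 17, 17, 14, 13]
        B = [39, 35, 32, 32, 33, 30, 31, 29, 31, 27, 27, 26, 29, 25, 16, 15, 15, 13]
        C = [32, 31, 28, 29, 26, 29, 27, 27, 24, 24, 23, 23, 19, 19, 18, 17, 12, 13]
        F, df1, df2 = correlated_anova([A, B, C])
        print(round(F, 1), df1, df2)
        # ~19.8 with df = 2, 34 at full precision; the chapter's 20.0
        # reflects rounding along the way.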




  • End of Chapter 15, Part 2.