As a reminder, the assumptions of the one-way ANOVA for independent samples are
 1. that the scale on which the dependent variable is measured has the properties of an equal-interval scale;
 2. that the k samples are independently and randomly drawn from the source population(s);
 3. that the source population(s) can be reasonably supposed to have a normal distribution; and
 4. that the k samples have approximately equal variances.
We noted in the main body of Chapter 14 that we need not worry very much about the first, third, and fourth of these assumptions when the samples are all the same size. For in that case the analysis of variance is quite robust, by which we mean relatively unperturbed by the violation of its assumptions. But of course, the other side of the coin is that when the samples are not all the same size, we do need to worry. In this case, should one or more of assumptions 1, 3, and 4 fail to be met, an appropriate nonparametric alternative to the one-way independent-samples ANOVA can be found in the Kruskal-Wallis Test.
I will illustrate the Kruskal-Wallis test with an example based on rating-scale data, since this is by far the most common situation in which unequal sample sizes would call for the use of a nonparametric alternative. In this particular case the number of groups is k=3. I think it will be fairly obvious how the logic and procedure would be extended in cases where k is greater than 3.
To assess the effects of expectation on the perception of aesthetic quality, an investigator randomly sorts 24 amateur wine aficionados into three groups, A, B, and C, of 8 subjects each. Each subject is scheduled for an individual interview. Unfortunately, one of the subjects of group B and two of group C fail to show up for their interviews, so the investigator must make do with samples of unequal size: n_{a}=8, n_{b}=7, and n_{c}=6, for a total of N=21. The subjects who do show up for their interviews are each asked to rate the overall quality of each of three wines on a 10-point scale, with "1" standing at the bottom of the scale and "10" at the top.
 Group     A      B      C
           6.4    2.5    1.3
           6.8    3.7    4.1
           7.2    4.9    4.9
           8.3    5.4    5.2
           8.4    5.9    5.5
           9.1    8.1    8.2
           9.4    8.2
           9.7
 mean      8.2    5.5    4.9

As it happens, the three wines are the same for all subjects. The only difference is in the texture of the interview, which is designed to induce a relatively high expectation of quality in the members of group A; a relatively low expectation in the members of group C; and a merely neutral state, tending in neither the one direction nor the other, for the members of group B. At the end of the study, each subject's ratings are averaged across all three wines, and this average is then taken as the raw measure for that particular subject. The table above shows these measures for each subject in each of the three groups.
¶Mechanics
The preliminaries of the Kruskal-Wallis test are much the same as those of the Mann-Whitney test described in Subchapter 11a. We begin by assembling the measures from all k samples into a single set of size N. These assembled measures are rank-ordered from lowest (rank #1) to highest (rank #N), with tied ranks included where appropriate; and the resulting ranks are then returned to the sample, A, B, or C, to which they belong and substituted for the raw measures that gave rise to them. Thus, the raw measures that appear in the following table on the left are replaced by their respective ranks, as shown in the table on the right.
                    Raw Measures          Ranked Measures
                    A      B      C       A      B      C      A, B, C Combined
                    6.4    2.5    1.3     11     2      1
                    6.8    3.7    4.1     12     3      4
                    7.2    4.9    4.9     13     5.5    5.5
                    8.3    5.4    5.2     17     8      7
                    8.4    5.9    5.5     18     10     9
                    9.1    8.1    8.2     19     14     15.5
                    9.4    8.2            20     15.5
                    9.7                   21
 sum of ranks                             131    58     42     231
 average of ranks                         16.4   8.3    7.0    11
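The ranking step described above can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the function name `rank_pooled` and the dictionary layout are assumptions made for the example, and tied values are simply given the mean of the ranks they jointly occupy.

```python
# Raw measures from the wine-rating example (group labels as in the text).
groups = {
    "A": [6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7],
    "B": [2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2],
    "C": [1.3, 4.1, 4.9, 5.2, 5.5, 8.2],
}

def rank_pooled(groups):
    """Rank all measures together; tied values share the mean of their ranks."""
    pooled = sorted(x for values in groups.values() for x in values)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; ties get their average.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    # Return the ranks to the group each raw measure came from.
    return {g: [rank_of[x] for x in values] for g, values in groups.items()}

ranks = rank_pooled(groups)
for g, r in ranks.items():
    print(g, r, "sum =", sum(r), "mean =", round(sum(r) / len(r), 1))
```

The two tied pairs (4.9 and 8.2) each receive the average of the two ranks they span, which is how the values 5.5 and 15.5 arise in the table.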

With the Kruskal-Wallis test, however, we take account not only of the sums of the ranks within each group, but also of the averages. Thus the following items of symbolic notation:

 T_{A} = the sum of the n_{a} ranks in group A
 M_{A} = the mean of the n_{a} ranks in group A

 T_{B} = the sum of the n_{b} ranks in group B
 M_{B} = the mean of the n_{b} ranks in group B

 T_{C} = the sum of the n_{c} ranks in group C
 M_{C} = the mean of the n_{c} ranks in group C

 T_{all} = the sum of the N ranks in all groups combined
 M_{all} = the mean of the N ranks in all groups combined

¶Logic and Procedure
·The Measure of Aggregate Group Differences
You will sometimes find the Kruskal-Wallis test described as an "analysis of variance by ranks." Although it is not really an analysis of variance at all, it does bear a certain resemblance to ANOVA up to a point. In both procedures, the first part of the task is to find a measure of the aggregate degree to which the group means differ. With ANOVA that measure is found in the quantity known as SS_{bg}, which is the between-groups sum of squared deviates. The same is true with the Kruskal-Wallis test, except that here the group means are based on ranks rather than on the raw measures. As a reminder that we are now dealing with ranks, we will symbolize this new version of the between-groups sum of squared deviates as SS_{bg(R)}. The following table summarizes the mean ranks for the present example. Also included are the sums and the counts (n_{a}, n_{b}, n_{c}, and N) on which these means are based.
           A      B      C      All
 counts    8      7      6      21
 sums      131    58     42     231
 means     16.4   8.3    7.0    11.0

In Chapters 13 and 14 you saw that the squared deviate for any particular group mean is equal to the squared difference between that group mean and the mean of the overall array of data, multiplied by the number of observations on which the group mean is based. Thus, for each of our current three groups
$$
\begin{aligned}
\text{A:}\quad & 8(16.4 - 11.0)^{2} = 233.3 \\
\text{B:}\quad & 7(8.3 - 11.0)^{2} = 51.0 \\
\text{C:}\quad & 6(7.0 - 11.0)^{2} = 96.0 \\
& SS_{bg(R)} = 380.3
\end{aligned}
$$

On analogy with the formulaic structures for SS_{bg} developed in Chapters 13 and 14, we can write the conceptual formula for SS_{bg(R)} as
$$SS_{bg(R)} = \sum \left[ n_{g}(M_{g} - M_{all})^{2} \right]$$

(Here as well, the subscript "g" means "any particular group.")

and the computational formula as

$$SS_{bg(R)} = \sum \frac{(T_{g})^{2}}{n_{g}} - \frac{(T_{all})^{2}}{N}$$

With k=3 samples, this latter structure would be equivalent to

$$SS_{bg(R)} = \frac{(T_{A})^{2}}{n_{a}} + \frac{(T_{B})^{2}}{n_{b}} + \frac{(T_{C})^{2}}{n_{c}} - \frac{(T_{all})^{2}}{N}$$
For k=4 it would be

$$SS_{bg(R)} = \frac{(T_{A})^{2}}{n_{a}} + \frac{(T_{B})^{2}}{n_{b}} + \frac{(T_{C})^{2}}{n_{c}} + \frac{(T_{D})^{2}}{n_{d}} - \frac{(T_{all})^{2}}{N}$$

And so forth for other values of k.
Here, in any event, is how it would work out for the present example. The discrepancy between what we get now and what we got a moment ago (380.3) is due to rounding error in the earlier calculation. As usual, it is the computational formula that is the less susceptible to rounding error, hence the more reliable.
$$SS_{bg(R)} = \frac{(131)^{2}}{8} + \frac{(58)^{2}}{7} + \frac{(42)^{2}}{6} - \frac{(231)^{2}}{21} = 378.7$$
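As a rough check on the arithmetic, the following Python sketch evaluates both formulas from the rank sums and counts given in the table. The variable names are illustrative only. It also repeats the conceptual formula with the group means rounded to one decimal place, to show where the earlier figure of 380.3 comes from.

```python
# Rank sums and group sizes taken from the table in the text.
T = {"A": 131.0, "B": 58.0, "C": 42.0}
n = {"A": 8, "B": 7, "C": 6}

N = sum(n.values())          # total number of observations
T_all = sum(T.values())      # sum of all N ranks
M_all = T_all / N            # mean of all N ranks

# Conceptual formula: sum of n_g * (M_g - M_all)^2, using unrounded means.
ss_conceptual = sum(n[g] * (T[g] / n[g] - M_all) ** 2 for g in T)

# Computational formula: sum of T_g^2 / n_g, minus T_all^2 / N.
ss_computational = sum(T[g] ** 2 / n[g] for g in T) - T_all ** 2 / N

# Conceptual formula again, with means rounded to one decimal as in the text.
ss_rounded = sum(n[g] * (round(T[g] / n[g], 1) - M_all) ** 2 for g in T)

print(round(ss_conceptual, 1), round(ss_computational, 1), round(ss_rounded, 1))
```

With unrounded means the two formulas agree; only the version that uses the rounded means 16.4, 8.3, and 7.0 drifts upward to about 380.3.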

·The Null-Hypothesis Value of SS_{bg(R)}
The null hypothesis in this or any comparable situation involving several independent samples of ranked data is that the mean ranks of the k groups will not substantially differ. On this account, you might suppose that the null-hypothesis value of SS_{bg(R)}, the aggregate measure of group differences, would be simply zero. A moment's reflection, however, will show why this cannot be so.
Consider the very simple case where there are 3 groups, each containing 2 observations. By way of analogy, imagine you had six small cards representing the ranks "1," "2," "3," "4," "5," and "6." If you were to sort these cards into every possible combination of two ranks per group, you would find the total number of possible combinations to be

$$\frac{N!}{n_{a}!\, n_{b}!\, n_{c}!} = \frac{6!}{2!\, 2!\, 2!} = 90$$


And the values of SS_{bg(R)} produced by these 90 combinations would constitute the sampling distribution of SS_{bg(R)} for this particular case. Of these 90 possible combinations, a few (6) would yield values of SS_{bg(R)} equal to exactly zero. All the rest would produce values greater than zero. (It is mathematically impossible to have a sum of squared deviates less than zero.) Accordingly, the mean of this sampling distribution (the value that observed instances of SS_{bg(R)} will tend to approximate if the null hypothesis is true) is not zero, but something greater than zero.
In any particular case of this sort, the mean of the sampling distribution of SS_{bg(R)} is given by the formula

$$(k - 1) \times \frac{N(N+1)}{12}$$

which for the simple case just examined works out as

$$(3 - 1) \times \frac{6(6+1)}{12} = 7.0$$
For our main example, we therefore know that the observed value of SS_{bg(R)}=378.7 belongs to a sampling distribution whose mean is equal to

$$(3 - 1) \times \frac{21(21+1)}{12} = 77.0$$
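For the simple case of 3 groups with 2 observations each, the sampling distribution is small enough to enumerate by brute force. The sketch below is an illustration only (the variable names are assumptions): it lists every way of sorting the ranks 1 through 6 into groups A, B, and C, computes SS_{bg(R)} for each arrangement with the computational formula, and compares the mean of those values with (k-1) x N(N+1)/12.

```python
from itertools import combinations

ranks = {1, 2, 3, 4, 5, 6}
N, k, size = 6, 3, 2

ss_values = []
for a in combinations(sorted(ranks), size):
    rest = ranks - set(a)
    for b in combinations(sorted(rest), size):
        c = tuple(sorted(rest - set(b)))
        # Computational formula for SS_bg(R) with rank sums T_A, T_B, T_C.
        ss = sum(sum(g) ** 2 / size for g in (a, b, c)) - sum(ranks) ** 2 / N
        ss_values.append(ss)

print(len(ss_values))                       # number of arrangements
print(sum(1 for s in ss_values if s == 0))  # arrangements with SS_bg(R) = 0
print(sum(ss_values) / len(ss_values))      # mean of the sampling distribution
print((k - 1) * N * (N + 1) / 12)           # value given by the formula
```

The enumeration treats the groups as labeled, so swapping the contents of two groups counts as a different arrangement; that is what yields 90 arrangements rather than 15.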


All that now remains is to figure out how to turn this fact into a rigorous assessment of probability.
·The Kruskal-Wallis Statistic: H
In case you have been girding yourself for some heavy slogging of the sort encountered with the Mann-Whitney test, you can now relax, for the rest of the journey is quite an easy one. The Kruskal-Wallis procedure concludes by defining a ratio symbolized by the letter H, whose numerator is the observed value of SS_{bg(R)} and whose denominator includes a portion of the above formula for the mean of the sampling distribution of SS_{bg(R)}:

$$H = \frac{SS_{bg(R)}}{N(N+1)/12}$$

Note that most textbooks give a very different-looking formula for the calculation of H, a rather impenetrable structure to which we will return in a moment. This first version affords a much clearer sense of the underlying concepts.
And now for the denouement. When each of the k samples includes at least 5 observations (that is, when n_{a}, n_{b}, n_{c}, etc., are all equal to or greater than 5), the sampling distribution of H is a very close approximation of the chi-square distribution for df=k-1. It is actually a fairly close approximation even when one or more of the samples includes as few as 3 observations.
For our present example, we can therefore calculate the value of H as

$$H = \frac{SS_{bg(R)}}{N(N+1)/12} = \frac{378.7}{21(21+1)/12} = 9.84$$


And then, treating this result as though it were a value of chi-square, we can refer it to the sampling distribution of chi-square with df=3-1=2. The following graph, borrowed from Chapter 8, will remind you of the outlines of this particular chi-square distribution. In brief: by the Kruskal-Wallis test, the observed aggregate difference among the three samples is significant a bit beyond the .01 level.
 [Graph: Theoretical Sampling Distribution of Chi-Square for df=2]
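The significance claim can be checked numerically. The sketch below is not part of the original text, and it relies on a shortcut that holds only for df=2: in that case the upper-tail probability of chi-square at a value x is exactly exp(-x/2). For other values of df a chi-square table or a statistics library would be needed instead.

```python
import math

ss_bg_r = 378.7   # observed SS_bg(R) from the text
N, k = 21, 3

# H is SS_bg(R) divided by N(N+1)/12.
H = ss_bg_r / (N * (N + 1) / 12)

# Shortcut valid only for df = 2: upper-tail probability is exp(-x / 2).
df = k - 1
p = math.exp(-H / 2)

print(round(H, 2), df, round(p, 4))
```

The resulting probability falls somewhat below .01, in line with the statement that the difference is significant a bit beyond the .01 level.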

·An Alternative Formula for the Calculation of H
I noted a moment ago that textbook accounts of the Kruskal-Wallis test usually give a different version of the formula for H. If you are a beginning student calculating H by hand, I would recommend using the version given above, as it gives you a clearer idea of just what H is measuring. Once you get the hang of things, however, you might find this alternative computational formula a bit more convenient.

$$H = \frac{12}{N(N+1)} \left( \sum \frac{(T_{g})^{2}}{n_{g}} \right) - 3(N+1)$$


In any event, as you can see below, this version yields exactly the same result as the other.

$$H = \frac{12}{21(21+1)} \left( \frac{(131)^{2}}{8} + \frac{(58)^{2}}{7} + \frac{(42)^{2}}{6} \right) - 3(21+1) = 9.84$$
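The agreement between the two versions is not a coincidence. Because the N ranks always sum to N(N+1)/2, the term (T_{all})^{2}/N divided by N(N+1)/12 reduces to 3(N+1). A short Python sketch makes the comparison concrete; the function names `h_from_ss` and `h_textbook` are hypothetical, chosen only for this illustration.

```python
def h_from_ss(T, n):
    """H as SS_bg(R) divided by N(N+1)/12."""
    N = sum(n)
    ss = sum(t ** 2 / m for t, m in zip(T, n)) - sum(T) ** 2 / N
    return ss / (N * (N + 1) / 12)

def h_textbook(T, n):
    """H by the alternative formula: 12/(N(N+1)) * sum(T_g^2/n_g) - 3(N+1)."""
    N = sum(n)
    return 12 / (N * (N + 1)) * sum(t ** 2 / m for t, m in zip(T, n)) - 3 * (N + 1)

T = [131, 58, 42]   # rank sums for groups A, B, C
n = [8, 7, 6]       # group sizes

print(round(h_from_ss(T, n), 2), round(h_textbook(T, n), 2))
```

Both functions take the rank sums and group sizes as parallel lists, so the same code would apply to any number of groups.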

The VassarStats web site has a page that will perform all steps of the Kruskal-Wallis test, including the rank-ordering of the raw measures.