When a human subject rates something on a 5-point scale, we may be reasonably confident that the scale has ordinal properties, such that "5" represents something greater than "4"; "4" represents something greater than "3"; and so on. But we can have no confidence at all that the points on the scale are separated by equal intervals:
[Figure: a 5-point scale with the points 1, 2, 3, 4, 5 spaced at equal intervals]
For all we know, the scale could look like this:
[Figure: the same five points, 1 through 5, spaced at unequal intervals]
or this:
[Figure: the same five points, 1 through 5, with a different unequal spacing]
or any one of a multitude of other non-equal-interval possibilities. We can also have no confidence that the scale intervals underlying a subject's rating on one item are the same as might underlie his or her rating on another item. By the same token, we can have no confidence that any multiplicity of subjects rating the same item are doing so on the basis of the same scale intervals.
But what does it matter if the scale of measurement is merely ordinal? A "5" is still greater than a "4," and surely it makes a kind of sense to say that the mean rating of a group of subjects on a certain item is 2.6, 3.4, or whatever it might be. Indeed, it does make a kind of sense; but please keep clearly in mind that the sense it makes is not nearly so strong as when you say, for example, that the mean weight of a sample of seeds of a certain plant species is 1.3 grams; or that the mean number of "Yes" responses to a 20-item questionnaire is 14.2; and so on for any other example where the scale of measurement clearly is equal-interval. It all goes back to the point I made such a fuss over in Chapter 1: basic mathematical operations involving addition, subtraction, multiplication, and division assume that the numbers being fed into them derive from an equal-interval scale of measurement, and they can sometimes get indigestion when that assumption is not met.
Still, it does make a kind of sense to speak of mean ratings, and to that degree it potentially makes sense to speak of a significant difference between or among the mean ratings of two or more groups of subjects. The question is: If you plug merely ordinal rating-scale data into an analysis of variance and end up with "significant" effects, are those effects really significant in the technical statistical meaning of that term? The following table will remind you of what that technical meaning is:
| Conclusion | Technical Meaning |
| "significant at the .05 level" | If the null hypothesis were true, the observed effect would have had only a 5% chance of occurring through mere random variability. |
| "significant beyond the .05 level" | If the null hypothesis were true, the observed effect would have had less than a 5% chance of occurring through mere random variability. |
| "significant at the .01 level" | If the null hypothesis were true, the observed effect would have had only a 1% chance of occurring through mere random variability. |
| "significant beyond the .01 level" | If the null hypothesis were true, the observed effect would have had less than a 1% chance of occurring through mere random variability. |
and so on for any other level of significance.
With respect to the use of ordinal-scale data in the analysis of variance, one way of approaching the question of robustness is this: If you were to plug such data into the analysis, and if the null hypothesis were clearly true (such that there are really no effects at all within the population[s] of measures from which the data are drawn), would there still be only a 5% chance of ending up with an effect "significant" at the .05 level? Would there still be only a 1% chance of ending up with an effect "significant" at the .01 level? And so on.
There are some who prefer to speak of computer-generated random numbers as "pseudo-random" on the ground that, although their sequence is an unpredictable patternless jumble, they are nonetheless rigidly determined by the mathematical algorithms that produce them. For our own purposes it is sufficient to note that so long as an aggregation of events has all the earmarks of randomness, it makes no practical difference whether we call it "random" or "pseudo-random."
The following exercise simulates this situation through a somewhat elaborate shaping of random numbers. The procedure begins by drawing 5 random integers falling within the range of zero to 1000, with each possible integer having an equal chance of being drawn. For example:
650, 806, 104, 792, 446
These are then rank-ordered from lowest to highest, and each is paired with the corresponding value on a 5-point scale, shown in the left-hand column:
1   104
2   446
3   650
4   792
5   806
Then another random integer between zero and 1000 is drawn, and it is assigned the ordinal-scale value (1, 2, 3, 4, or 5) of the one that it is closest to in the previous set of 5 random integers. For example: if the sixth random integer were 43, it would be closest to 104 in the above list, hence assigned the ordinal-scale value of 1; if it were 634, it would be closest to 650, hence assigned the ordinal-scale value of 3; and so on.
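The assignment rule just described can be sketched in a few lines of Python. This is only an illustration: the function names `nearest_rating` and `random_rating` are hypothetical, and the tie-breaking rule (a value exactly midway between two anchors goes to the lower rating) is an assumption, since the text does not say how ties are handled.

```python
import random

def nearest_rating(anchors, value):
    """Return the 1-5 rating whose anchor lies closest to value.

    The five anchors are rank-ordered first, so the lowest anchor
    stands for rating 1 and the highest for rating 5.
    """
    ordered = sorted(anchors)
    # min() keeps the first minimum it meets, so an exact tie goes to
    # the lower rating (an assumption; the text does not cover ties).
    closest = min(range(len(ordered)), key=lambda i: abs(ordered[i] - value))
    return closest + 1

def random_rating(rng):
    """Draw five anchors and a sixth integer, each uniform on 0..1000."""
    anchors = [rng.randint(0, 1000) for _ in range(5)]
    return nearest_rating(anchors, rng.randint(0, 1000))

# The worked example from the text.
anchors = [650, 806, 104, 792, 446]
print(nearest_rating(anchors, 43))    # 1
print(nearest_rating(anchors, 634))   # 3
```

Calling `random_rating(random.Random())` produces one simulated rating from a fresh set of anchors.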
This final ordinal-scale assignment is then treated as though it were the rating (1, 2, 3, 4, or 5) of one particular subject in one particular group. On analogy with the structure of Example 3, we perform this random drawing 72 times, so as to end up with the ratings of 72 "subjects," 12 in each of six groups arranged in a matrix of two rows by three columns.
In the first of the two tables following this paragraph there is one button labeled "1 Sample" and another labeled "10 Samples." Clicking the first will generate one sample of 72 random "ratings"; clicking the second will generate 10 samples. For each sample the F-ratios for the row, column, and interaction effects will be displayed, along with an indication ("Yes!") if the F-ratio is "significant" at or beyond the basic .05 level. You will also see an indication of the cumulative percentages of F-ratios that turn out significant at or beyond the .05 level. As you click one or another of the buttons, keep in mind that the null hypothesis in this situation is true. Any mean differences that appear among the groups therefore result from nothing more than random variability. If the analysis of variance is robust in its treatment of these merely ordinal-scale data, the numbers of "significant" F-ratios for rows, columns, and interaction should each come out at just about 5% over the long haul. To give it a really fair test you will need to accumulate about 10,000 samples, which I recognize might exceed your time or inclination. At one point in developing the programming for this exercise, I set it to crank out 10,000 samples while I tended to something else. Lo and behold! In that particular long haul, the numbers of "significant" F-ratios actually did come out at just about 5% each. (If nothing seems to be happening when you click "10 Samples," be patient; sooner or later it will.)
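[Interactive table: F-ratios for rows, columns, and interaction for each sample, with the columns "significant?" and "cumulative percentage significant"]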
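For readers who would rather run the long haul offline, here is a minimal sketch of the same exercise in Python. It rests on several assumptions that are not in the text: the five anchor integers are drawn afresh for each of the 72 "subjects"; the critical values 3.99 for df = (1, 66) and 3.14 for df = (2, 66) are approximate table values at the .05 level; and the names `random_rating`, `f_ratios`, and `simulate` are hypothetical.

```python
import random

# Approximate critical values of F at the .05 level (assumed table values):
# rows have df = (1, 66); columns and interaction have df = (2, 66).
CRIT_05 = (3.99, 3.14, 3.14)

def random_rating(rng):
    """One simulated rating: nearest of five sorted random anchors."""
    anchors = sorted(rng.randint(0, 1000) for _ in range(5))
    x = rng.randint(0, 1000)
    return 1 + min(range(5), key=lambda i: abs(anchors[i] - x))

def f_ratios(cells):
    """F-ratios (rows, columns, interaction); cells[i][j] is a list of scores.

    Assumes the same number of scores in every cell.
    """
    r, c, n = len(cells), len(cells[0]), len(cells[0][0])
    n_t = r * c * n
    grand = sum(sum(g) for row in cells for g in row)
    correction = grand ** 2 / n_t
    ss_t = sum(x * x for row in cells for g in row for x in g) - correction
    ss_wg = sum(sum(x * x for x in g) - sum(g) ** 2 / n
                for row in cells for g in row)
    ss_bg = ss_t - ss_wg
    ss_rows = sum(sum(sum(g) for g in row) ** 2 / (c * n)
                  for row in cells) - correction
    ss_cols = sum(sum(sum(cells[i][j]) for i in range(r)) ** 2 / (r * n)
                  for j in range(c)) - correction
    ss_rxc = ss_bg - ss_rows - ss_cols
    ms_error = ss_wg / (n_t - r * c)
    return (ss_rows / (r - 1) / ms_error,
            ss_cols / (c - 1) / ms_error,
            ss_rxc / ((r - 1) * (c - 1)) / ms_error)

def simulate(n_samples, seed=None):
    """Percentage of samples with F at or beyond the .05 critical value."""
    rng = random.Random(seed)
    hits = [0, 0, 0]
    for _ in range(n_samples):
        # 2 rows x 3 columns, 12 "subjects" per group, 72 in all.
        cells = [[[random_rating(rng) for _ in range(12)]
                  for _ in range(3)] for _ in range(2)]
        for k, f in enumerate(f_ratios(cells)):
            if f >= CRIT_05[k]:
                hits[k] += 1
    return [100 * h / n_samples for h in hits]

print(simulate(1000, seed=1))  # percentages for rows, columns, interaction
```

If the analysis of variance is robust with these data, each of the three percentages should settle near 5 as `n_samples` grows toward 10,000.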
Step-by-Step Computational Procedure: Two-Way Analysis of Variance for Independent Samples
I will show the procedures for the case of 2 rows and 3 columns, hence rc = 6. The modifications required for different values of r and c will be fairly obvious. The steps listed below assume that you have already done the basic number-crunching to get $\sum X_i$ and $\sum X_i^2$ for each of the groups separately and for all groups combined.
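Step 1. Combining all rc groups together, calculate the total sum of squared deviates as
$$SS_T = \sum X_T^2 - \frac{(\sum X_T)^2}{N_T}$$
where $\sum X_T$ and $\sum X_T^2$ are the sums of $X_i$ and $X_i^2$ across all groups combined.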
Step 2. For each of the rc groups separately, calculate the sum of squared deviates within the group ("g") as
$$SS_g = \sum X_{gi}^2 - \frac{(\sum X_{gi})^2}{N_g}$$
Step 3. Take the sum of the $SS_g$ values across all rc groups to get
$$SS_{wg} = SS_a + SS_b + SS_c + SS_d + SS_e + SS_f$$
Step 4. Calculate $SS_{bg}$ as
$$SS_{bg} = SS_T - SS_{wg}$$
Step 5. Calculate $SS_{rows}$ as
$$SS_{rows} = \frac{(\sum X_{r1})^2}{N_{r1}} + \frac{(\sum X_{r2})^2}{N_{r2}} - \frac{(\sum X_T)^2}{N_T}$$
Step 6. Calculate $SS_{cols}$ as
$$SS_{cols} = \frac{(\sum X_{c1})^2}{N_{c1}} + \frac{(\sum X_{c2})^2}{N_{c2}} + \frac{(\sum X_{c3})^2}{N_{c3}} - \frac{(\sum X_T)^2}{N_T}$$
Step 7. Calculate $SS_{r \times c}$ as
$$SS_{r \times c} = SS_{bg} - SS_{rows} - SS_{cols}$$
Step 8. Calculate the relevant degrees of freedom as
$df_T = N_T - 1$
$df_{wg} = N_T - rc$
$df_{bg} = rc - 1$
$df_{rows} = r - 1$
$df_{cols} = c - 1$
$df_{r \times c} = (r - 1)(c - 1)$
Step 9. Calculate the relevant mean-square values as
$$MS_{rows} = \frac{SS_{rows}}{df_{rows}} \qquad MS_{cols} = \frac{SS_{cols}}{df_{cols}} \qquad MS_{r \times c} = \frac{SS_{r \times c}}{df_{r \times c}} \qquad MS_{error} = \frac{SS_{wg}}{df_{wg}}$$
Step 10. Calculate F as
$$F_{rows} = \frac{MS_{rows}}{MS_{error}} \qquad F_{cols} = \frac{MS_{cols}}{MS_{error}} \qquad F_{r \times c} = \frac{MS_{r \times c}}{MS_{error}}$$
Step 11. Refer the calculated values of F to the table of critical values of F (Appendix D), with the appropriate pair of numerator/denominator degrees of freedom.
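Steps 1 through 10 can be sketched as a single Python function. This is an illustration under stated assumptions, not a definitive implementation: the function name `two_way_anova` and the layout of its return value are hypothetical; Steps 1, 2, and 4 use the standard computational formulas $SS = \sum X^2 - (\sum X)^2 / N$ and $SS_{bg} = SS_T - SS_{wg}$; and the row and column formulas are applied as written, which presumes equal numbers of scores in every group. Step 11, the table lookup, is left to the reader.

```python
def two_way_anova(groups):
    """Two-way ANOVA for independent samples.

    groups[i][j] is the list of raw scores in row i, column j.
    Returns a dict holding the SS, df, MS, and F values.
    """
    r, c = len(groups), len(groups[0])
    cells = [g for row in groups for g in row]

    def n_and_sum(score_lists):
        # Pooled N and sum of X over several groups.
        return (sum(len(g) for g in score_lists),
                sum(sum(g) for g in score_lists))

    n_t, sum_t = n_and_sum(cells)
    sum_sq_t = sum(x * x for g in cells for x in g)
    correction = sum_t ** 2 / n_t                  # (sum X_T)^2 / N_T

    ss_t = sum_sq_t - correction                                  # Step 1
    ss_g = [sum(x * x for x in g) - sum(g) ** 2 / len(g)
            for g in cells]                                       # Step 2
    ss_wg = sum(ss_g)                                             # Step 3
    ss_bg = ss_t - ss_wg                                          # Step 4

    def ss_marginal(margins):
        # Sum of (sum X)^2 / N over rows or columns, minus the correction.
        total = 0.0
        for margin in margins:
            n, s = n_and_sum(margin)
            total += s ** 2 / n
        return total - correction

    columns = [[groups[i][j] for i in range(r)] for j in range(c)]
    ss_rows = ss_marginal(groups)                                 # Step 5
    ss_cols = ss_marginal(columns)                                # Step 6
    ss_rxc = ss_bg - ss_rows - ss_cols                            # Step 7

    df = {"T": n_t - 1, "wg": n_t - r * c, "bg": r * c - 1,
          "rows": r - 1, "cols": c - 1,
          "rxc": (r - 1) * (c - 1)}                               # Step 8
    ss = {"T": ss_t, "wg": ss_wg, "bg": ss_bg,
          "rows": ss_rows, "cols": ss_cols, "rxc": ss_rxc}
    # Step 9: MS_error is stored under the key "wg".
    ms = {k: ss[k] / df[k] for k in ("rows", "cols", "rxc", "wg")}
    f = {k: ms[k] / ms["wg"] for k in ("rows", "cols", "rxc")}    # Step 10
    return {"SS": ss, "df": df, "MS": ms, "F": f}

# A tiny made-up data set: 2 rows x 3 columns, 2 scores per group.
data = [[[1, 2], [3, 4], [5, 6]],
        [[2, 3], [4, 5], [6, 7]]]
result = two_way_anova(data)
print(result["F"])   # {'rows': 6.0, 'cols': 32.0, 'rxc': 0.0}
```

The made-up data are chosen so the arithmetic is easy to check by hand: $SS_T = 38$, $SS_{wg} = 3$, $SS_{bg} = 35$, $SS_{rows} = 3$, $SS_{cols} = 32$, and $SS_{r \times c} = 0$.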