From Chapter 9, Part 2:
t-ratio for comparing the mean of a sample with the mean (either actual or hypothetical) of a population:

    t = (M_X − μ) / est.σ_M
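As a concrete sketch of this ratio (not part of the original text), the one-sample t can be computed directly: the estimated standard error of the mean is the sample standard deviation (N−1 denominator) divided by the square root of N. The scores and the hypothetical population mean below are invented purely for illustration.

```python
import math
from statistics import mean, stdev  # stdev uses the N-1 denominator


def one_sample_t(sample, mu):
    """t = (M_X - mu) / est. sigma_M, where est. sigma_M = s / sqrt(N)."""
    n = len(sample)
    est_sigma_m = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
    return (mean(sample) - mu) / est_sigma_m


# Hypothetical scores, tested against a hypothetical population mean of 18
scores = [21, 17, 20, 22, 16, 19, 23, 18, 20, 21]
print(one_sample_t(scores, 18))
```

Note that when the sample mean equals the hypothesized population mean, the numerator, and hence t, is exactly zero.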



t-ratio for comparing the means of two samples:

    t = (M_Xa − M_Xb) / est.σ_{M−M}
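The two-sample ratio can be sketched the same way. One caveat: the denominator below estimates est.σ_{M−M} with the standard pooled-variance formula for two independent samples; that particular estimation formula is my assumption here, not something stated in this passage.

```python
import math
from statistics import mean, variance  # variance uses the N-1 denominator


def two_sample_t(a, b):
    """t = (M_Xa - M_Xb) / est. sigma_(M-M), using a pooled variance estimate."""
    na, nb = len(a), len(b)
    # pooled estimate of the common source-population variance
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    # estimated standard error of the difference between the two sample means
    est_sigma_mm = math.sqrt(pooled / na + pooled / nb)
    return (mean(a) - mean(b)) / est_sigma_mm


# Two invented samples for illustration
print(two_sample_t([21, 17, 20, 22, 16], [18, 15, 19, 17, 16]))
```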



William S. Gosset's notable achievement was the mathematical discovery that, even though t-ratios are based on estimates, they nonetheless behave in a precise, orderly fashion. Specifically, what he discovered was that when t-ratios are considered en masse they form relative-frequency distributions whose outlines are capable of being precisely known; and further, that this precise knowledge of the outlines can then be used as a basis for making precise probability judgments, in much the same way that precise knowledge of the properties of the normal distribution permits us to make precise probability assessments in connection with z-ratios.
To illustrate this point, let us return for a moment to the reference source population introduced in Part 1 of this chapter.
Figure 9.1 [repeated]. Reference Source Population, Normally Distributed with μ=18 and σ=±3
Suppose we were to draw a vast number of pairs of random samples from this population, each sample of size N=10. Since this is a source population for which we know the true values of σ² and σ, we could easily calculate a proper z-ratio for each pair. If we were then to take all these z-ratios and lay them out in the form of a frequency distribution, we would find that it very closely approximates the shape of the unit normal distribution. That, in fact, is precisely why it is meaningful to refer a z-ratio to the table of the unit normal distribution.
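This claim about z-ratios is easy to check by simulation (a sketch of my own, not part of the chapter). Drawing pairs of N=10 samples from a normal population with μ=18 and σ=3, dividing each difference of means by its true standard error, and counting how often |z| exceeds 1.96 should give close to the normal-theory 5%.

```python
import math
import random

random.seed(1)  # reproducible sketch
MU, SIGMA, N = 18, 3, 10
# true standard error of the difference between two sample means
sigma_mm = math.sqrt(SIGMA**2 / N + SIGMA**2 / N)


def z_ratio():
    a = [random.gauss(MU, SIGMA) for _ in range(N)]
    b = [random.gauss(MU, SIGMA) for _ in range(N)]
    return (sum(a) / N - sum(b) / N) / sigma_mm


zs = [z_ratio() for _ in range(20000)]
tail = sum(abs(z) > 1.96 for z in zs) / len(zs)
print(tail)  # should come out close to the normal-theory value of .05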
But now let us go back to the very same pairs of samples and calculate t-ratios for them. That is, for each pair of samples we separately calculate the ratio

    t = (M_Xa − M_Xb) / est.σ_{M−M}
If we were then to take all these t-ratios and lay them out in the form of a frequency distribution, they too would form an orderly, symmetrical outline. It would not be a normal distribution (not in this particular case, anyway), but orderly all the same, and equally useful for making precise probability judgments. Figure 9.4 shows what these two distributions of z and t would look like. The most readily visible difference between them is that the t-distribution is somewhat flatter and more spread out. The implication is that values of t that lie toward the center of the distribution are somewhat less probable than comparable values of z, while those that lie toward the extremes are somewhat more probable than comparable values of z.
Figure 9.4. The Unit Normal Distribution Compared with the t-Distribution for df=18
Another difference between the distributions of z and t is that, while there is one and only one unit normal distribution, there is a whole range of different t-distributions. Indeed, there is a separate t-distribution for each possible value of degrees of freedom. The concept of degrees of freedom in the present context is much the same as the one introduced in Chapter 8 in connection with chi-square procedures. Essentially, it is an index of the amount of random variability that can be present in the particular situation. If you have a single sample of size N, the number of degrees of freedom is

    df = N − 1

If you have two samples of sizes N_a and N_b, it is

    df = (N_a − 1) + (N_b − 1)

In the present case we have two samples of sizes N_a=10 and N_b=10, so the number of degrees of freedom is (10 − 1) + (10 − 1) = 18.
The general pattern of resemblances and differences among the various t-distributions is illustrated in Figure 9.5, which shows the distributions of t for df=5 and df=40.
Figure 9.5. The Distributions of t for df=5 and df=40
Notice that the t-distribution for df=5 is even flatter and more spread out than the one shown in Figure 9.4 for df=18, whereas the one for df=40 is markedly less flat and spread out. Indeed, to the naked eye the t-distribution for df=40 is scarcely distinguishable from the normal distribution. It is also a very close fit mathematically. In general, the smaller the number of degrees of freedom, the greater the difference between the corresponding t-distribution and the unit normal distribution. Conversely, the larger the number of degrees of freedom, the more closely the corresponding t-distribution will approximate the unit normal distribution.
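These shape claims can be verified numerically from the standard density formula for Student's t (a sketch of my own, using the textbook density, not anything given in this passage): the t curve sits lower than the normal curve at the center, higher in the tails, and converges toward the normal curve as df grows.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the unit normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Flatter at the center, heavier in the tails -- and df=40 hugs the normal curve
for df in (5, 18, 40):
    print(df, t_pdf(0, df), t_pdf(3, df))
print("normal:", normal_pdf(0), normal_pdf(3))
```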
The logic of probability assessment is the same with t-distributions as with the unit normal distribution. The probability of getting a result "this large or larger" is equal to the proportion of the distribution that lies to the right (+) or left (−) of the calculated value of t, depending on the direction of the hypothesis the researcher is seeking to test. For a nondirectional two-way hypothesis it is the total proportion that lies both to the right of +t and to the left of −t. The full table of critical values of t can be found in Appendix C. The abbreviated version shown below will give you an idea of how it is laid out.

                         Level of Significance
    directional test:      .05     .025    .01     .005    .0005
    nondirectional test:   .10     .05     .02     .01     .001
    df
     5                     2.02    2.57    3.36    4.03    6.87
    10                     1.81    2.23    2.76    3.17    4.59
    18                     1.73    2.10    2.55    2.88    3.92
    20                     1.72    2.09    2.53    2.85    3.85

To the right of each value of df are the values of +t or −t that must be met or exceeded in order for a result to be significant at or beyond a particular level of significance. To illustrate, note that the first two values lying to the right of df=18 are 1.73 and 2.10. These and the other values in that row (2.55, 2.88, 3.92) refer to specific locations within the sampling distribution of t for the case where degrees of freedom is equal to 18. As shown graphically in Figure 9.6, t=+1.73 marks the point in this distribution beyond which (to the right, away from the mean) falls 5% of the total area of the distribution. An obtained t-ratio equal to or greater than +1.73 is accordingly significant at or beyond the .05 level for a directional ("one-tailed") test. (The distinction between directional and nondirectional tests of significance is introduced in Chapter 7.)
Figure 9.6. Sampling Distribution of t for df=18
Similarly, t=+2.10 marks the point beyond which falls 2.5% of the distribution, so an obtained t-ratio equal to or greater than +2.10 is significant at or beyond the .025 level for a directional test. As the distribution is symmetrical, the same logic and procedure apply to negative t-values, which lie on the left side of the distribution.
For a nondirectional ("two-tailed") test, the probability associated with an obtained value of t is twice that associated with the same value of t for a directional test. Thus, for a nondirectional test with df=18, an obtained value of t would have to fall at or beyond +2.10 or −2.10 in order to be significant at or beyond the .05 level. In practical terms, this means that the absolute (unsigned) value of t must be equal to or greater than 2.10.
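The tabled critical values themselves can be recovered numerically, which makes a useful check on the logic. The sketch below (my own construction, not part of the chapter) integrates the standard Student's t density with Simpson's rule to get an upper-tail area, then bisects to find the t whose tail area equals a chosen significance level.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_tail_area(t, df, steps=4000):
    """Proportion of the distribution lying beyond +t (Simpson's rule on [0, t])."""
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3  # symmetry: half the area lies to the right of 0


def t_critical(alpha, df):
    """Value of t whose upper-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_tail_area(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Reproduce the df=18 row of the abbreviated table
print(round(t_critical(0.05, 18), 2), round(t_critical(0.025, 18), 2))
```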

I began this portion of the chapter by asking you to suppose that we were drawing a vast number of pairs of random samples from our reference source population, calculating the t-ratio for each pair. The following demonstration gives you the opportunity to do exactly that, and in the process to decide for yourself whether all this talk about the sampling distributions of t has any basis in reality. Each time you click the button, your computer will draw ten pairs of random samples. For each pair, N_a=10, N_b=10, hence df=18. Whenever the t-ratio produced by a pair falls at or beyond the 5% level at either end of the df=18 sampling distribution (t < −1.73 on the left or t > +1.73 on the right), it will be marked as "***". Click the button thirty or forty times and you will find the cumulative proportion of these cases coming very close to the theoretical proportion of 5%+5%=10%. The average batch of ten sample pairs will therefore include one t-ratio marked as "***", although any particular batch might of course have more than one or none at all.
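If you would rather run the demonstration offline, the same experiment can be sketched in a few lines of Python (my own sketch, assuming the pooled-variance t-ratio for independent samples): draw pairs of N=10 samples from a normal population with μ=18 and σ=3, compute t for each pair, and count the proportion falling beyond ±1.73.

```python
import math
import random
from statistics import mean, variance

random.seed(2)  # reproducible sketch
MU, SIGMA, N = 18, 3, 10


def t_ratio():
    a = [random.gauss(MU, SIGMA) for _ in range(N)]
    b = [random.gauss(MU, SIGMA) for _ in range(N)]
    # pooled-variance estimate of the standard error of the mean difference
    pooled = ((N - 1) * variance(a) + (N - 1) * variance(b)) / (2 * N - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled / N + pooled / N)


ts = [t_ratio() for _ in range(5000)]
flagged = sum(t < -1.73 or t > 1.73 for t in ts) / len(ts)
print(flagged)  # should come out close to the theoretical 10%
```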
In the next three chapters we will examine the eminently practical applications to which all this abstract theorizing can be put. Meanwhile, note that the present chapter includes an Appendix that will generate a graphic and numerical display of the properties of the sampling distribution of t for any value of df between 4 and 200, inclusive. As the page opens, you will be prompted to enter the value of df.