Chapter 5. Basic Concepts of Probability
Part III

Conjunction and Disjunction as the Building Blocks of More Complex Probabilities

We will illustrate our points with an imaginary example taken from a medical context. There is a certain disease that has an abrupt and unmistakable onset, and for which there is currently no effective treatment. Of all the persons who come down with this disease, 40% spontaneously recover within two months and the remaining 60% do not recover within two months. As there is no way of predicting which patient will recover and which will not, the two-month outcome for any particular patient is regarded as a matter of mere chance. Thus, for any particular patient who comes down with the disease, the probability of spontaneously recovering within two months is

P_{(recovery)}=.4

and the probability of not recovering within two months is

P_{(non-recovery)}=.6

(Note that these two probabilities are a posteriori, in the sense that they are based on the rates of recovery and non-recovery in previously observed cases of the disease.)

Then comes a report from a field botanist working in the Brazilian rain forests of a certain plant that the indigenous peoples of the region have long used to treat the disease, apparently with some degree of success. A team of researchers reasons that perhaps the plant contains an ingredient that could be developed into an effective medical treatment for the disease. But as the isolation and refinement of that ingredient would surely prove quite costly, they first undertake to determine experimentally whether an extract from the plant, in its raw form, shows any effectiveness at all in treating the disease. In the next few paragraphs we will be describing their two-stage effort to put the raw plant extract to an experimental test—first, with a small sample of patients; and then, when the first experiment proves promising, with a much larger sample. We will stipulate that the researchers determine beforehand that the raw plant extract has no toxic effects. Also, please note that in both of these investigations the experimental hypothesis is directional. The researchers are expecting that the extract will produce a recovery rate significantly greater than the 40% rate attributable to mere chance spontaneous recovery.

In their preliminary test of this hypothesis, the researchers administer the plant extract to 10 randomly selected patients, beginning for each patient immediately after the onset of the disease and continuing for two months. As the experiment runs its course, the investigators observe that 7 of the 10 patients have recovered within their respective two-month periods, while 3 have not recovered.

Now clearly, the 70% recovery rate observed in this experiment is greater than the 40% baseline rate for spontaneous recovery. But the question is, is it significantly greater? This is the question of statistical significance, which you will recall is essentially a question of how likely or unlikely it is that the observed result could have been produced by nothing more than mere chance coincidence. Specifically for the present example, how likely or unlikely is it that as many as 7 of the 10 patients could have recovered by mere chance ("spontaneously"), if the plant extract were completely ineffective?

The first thing to notice about a question of this general type is that it has several layers. The first layer is indicated by the phrase "as many as," which reveals itself upon analysis to be simply another way of saying "this many or more." The logic of this point applies to scientific research in general. In most cases, the pertinent probability question is not, What is the probability of getting exactly this result? Rather it is, How likely is it that mere chance coincidence might have produced a result "as large as this," which is to say, "this large or larger?" (Sometimes the question is in the opposite direction: "... this small or smaller?") So the first layer of the question takes the form of a disjunction: How likely is it that mere chance coincidence could have produced as many as 7 recoveries out of 10, which is to say, either exactly 7 recoveries out of 10, or exactly 8 out of 10, or 9 out of 10, or 10 out of 10? Abbreviating "recoveries" as "R," the formal expression would be

	P_{(7 or more R out of 10)}
		= P_{(7R or 8R or 9R or 10R)}
		= P_{(7R)} + P_{(8R)} + P_{(9R)} + P_{(10R)}

So, essentially, it is just a task of adding up several component probabilities. The only complication is that, in order to add them, you first have to get them. We will introduce the logic of this type of task by examining a much simpler analogy involving coin tosses. If you toss 3 coins, A, B, and C, what is the mere-chance probability of getting as many as 2 heads, that is, either exactly 2 heads or exactly 3 heads? Here again (abbreviating "heads" as "H"), the first statement of the question takes the form of a disjunction:

	P_{(2 or more H in 3 tosses)}
		= P_{(2H or 3H)}
		= P_{(2H)} + P_{(3H)}

But once you start getting into its details, you find that even this relatively simple situation has a rather complex structure. The source of the complexity is that, while there is only one combination of heads (H) and tails (T) that will yield exactly 3 heads in 3 tosses:

	A	B	C
	H	H	H

there are several different ways of getting exactly two heads in 3 tosses, namely:

A	B	C
H	H	T
H	T	H
T	H	H

As illustrated in the following table, you can think of this complex situation as a network of probability pathways. The network has two main branches, labeled 2H and 3H, of which the first has three separate sub-branches, one for each of the three ways in which it is possible to get exactly 2 heads in 3 tosses. Each separate branch or sub-branch represents a conjunctive (multiplicative) probability, and where branches or sub-branches converge there is a disjunctive (additive) probability. Thus, there are 3 separate sub-pathways for reaching the outcome of exactly 2 heads, each with the conjunctive probability .5x.5x.5=.125. (Recall that P_(H)=.5 and P_(T)=.5.) The disjunctive probability of reaching this outcome one way or the other is therefore .125+.125+.125=.375. There is only one pathway for reaching the outcome of exactly 3 heads; it also has a conjunctive probability of .5x.5x.5=.125. Thus, the overall disjunctive probability of getting as many as 2 heads in 3 tosses (i.e., exactly 2 heads or exactly 3 heads) is .375+.125=.5.

Outcome	Pathway	Pathway probability	Outcome probability
2H	HHT	.5x.5x.5=.125	.375
	HTH	.5x.5x.5=.125
	THH	.5x.5x.5=.125
3H	HHH	.5x.5x.5=.125	.125
Total			.5
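
The same network of pathways can be checked by brute-force enumeration. The following is only a minimal sketch, assuming fair coins; the names `outcomes` and `pathway_probability` are our own illustrative choices and do not come from the text.

```python
from itertools import product

# Every possible result of tossing three coins A, B, and C: HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

def pathway_probability(outcome, p_heads=0.5):
    """Conjunctive (multiplicative) probability of one specific pathway."""
    prob = 1.0
    for face in outcome:
        prob *= p_heads if face == "H" else 1 - p_heads
    return prob

# Disjunctive (additive) probability: sum over the pathways that reach each outcome.
p_exactly_2 = sum(pathway_probability(o) for o in outcomes if o.count("H") == 2)
p_exactly_3 = sum(pathway_probability(o) for o in outcomes if o.count("H") == 3)
p_2_or_more = p_exactly_2 + p_exactly_3

print(p_exactly_2, p_exactly_3, p_2_or_more)  # 0.375 0.125 0.5
```

Enumeration of this kind is practical only for very small cases, since the number of pathways doubles with every additional toss.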

Let us now apply the same logic to the more complex example of our medical experiment. The task, recall, is to figure out the disjunctive probability of getting by mere chance, in the absence of any effective treatment, one or the other of the following potential outcomes:

	7 patients recover and 3 do not recover; or
	8 patients recover and 2 do not recover; or
	9 patients recover and 1 does not recover; or
	all 10 patients recover

We will begin with the most extreme outcome, because that is the simplest. Just as there is only one way of getting 3 heads in 3 coin tosses, there is only one way of getting 10 recoveries in 10 patients. Obviously that could occur only if patient 1 recovers, and patient 2 recovers, and patient 3 recovers, and so on through patient 10. (Recall that P_(recovery)=.4 and P_{(non-recovery)}=.6.) The probability for the extreme outcome of 10 recoveries out of 10 is therefore just the single conjunctive pathway

Patient
 1  2  3  4  5  6  7  8  9  10
.4x.4x.4x.4x.4x.4x.4x.4x.4x.4 = .000,105

As we move on to the less extreme outcomes, however, we begin to find the pathways branching into a multiplicity of sub-pathways. The extreme outcome of 10 recoveries out of 10 patients can be reached by only one route. The nearest less extreme outcome, 9 recoveries out of 10, on the other hand, can be reached by any one or another of 10 possible sub-pathways. That is, this outcome would be produced by any one or another of 10 different combinations of recoveries and non-recoveries. The first of these would be for the case where all patients except patient 1 recover; the second would be for the case where all except patient 2 recover; the third for the case where all except patient 3 recover; and so on, up through the case where all patients except patient 10 recover. As shown by the following calculations, this would constitute a total of 10 conjunctive sub-pathways for the potential outcome of 9 recoveries out of 10 patients, each with a probability equal to .4⁹x.6=.000,157.

Patient
 1  2  3  4  5  6  7  8  9  10
.6x.4x.4x.4x.4x.4x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.6x.4x.4x.4x.4x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.6x.4x.4x.4x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.6x.4x.4x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.6x.4x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.4x.6x.4x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.4x.4x.6x.4x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.4x.4x.4x.6x.4x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.4x.4x.4x.4x.6x.4 = .4⁹x.6 = .000,157
.4x.4x.4x.4x.4x.4x.4x.4x.4x.6 = .4⁹x.6 = .000,157

The disjunctive probability of reaching the outcome one way or the other is therefore the sum of these 10 sub-pathway probabilities, which (since the 10 separate probabilities are all the same) can be calculated multiplicatively as

10 x (.4⁹ x .6) = .00157
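
For readers who would like to confirm the arithmetic, here is a small sketch that builds the 10 sub-pathways one at a time and compares their sum with the multiplicative shortcut. The variable names are hypothetical and chosen only for illustration.

```python
from math import isclose, prod

P_RECOVERY = 0.4
P_NON_RECOVERY = 0.6
N_PATIENTS = 10

# One sub-pathway for each patient who might be the single non-recovery.
sub_pathways = []
for non_recovering in range(N_PATIENTS):
    factors = [
        P_NON_RECOVERY if patient == non_recovering else P_RECOVERY
        for patient in range(N_PATIENTS)
    ]
    sub_pathways.append(prod(factors))  # conjunctive probability of this pathway

p_nine = sum(sub_pathways)                             # disjunctive (additive) total
shortcut = N_PATIENTS * P_RECOVERY**9 * P_NON_RECOVERY  # 10 x (.4^9 x .6)

print(f"{sub_pathways[0]:.6f}")  # 0.000157
print(f"{p_nine:.5f}")           # 0.00157
print(isclose(p_nine, shortcut))  # True
```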

The principle illustrated by this calculation applies to all such cases where we are interested in the probability of getting "10 out of 10," "9 out of 10," "22 out of 30," and so forth. Abstractly, we can speak of it as the probability of getting "k out of N," where k represents the first number in such an expression and N represents the second. Two other abstract terms that we will need for this formulation are p, which is the probability that the event in question (e.g., the recovery of a patient) will occur in any particular instance; and q, which is the complementary probability that the event in question will not occur. For all situations of this general type, the number of ways in which it is possible to get the result of "k out of N" (10 out of 10, 9 out of 10, 8 out of 10, and so on) is given by the formula

number of ways_{(k out of N)} = \frac{N!}{k!(N-k)!}

and the probability of getting the result in any particular one of these ways is

p^k \times q^{N-k}

Put it all together and you have

P_{(k out of N)} = \frac{N!}{k!(N-k)!} \times p^k \times q^{N-k}

(A brief refresher course on the factorial and exponential operations required to apply this formula is provided in SideTrip 5.1).
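
The formula translates almost directly into code. The sketch below assumes Python 3.8 or later, where the standard library's `math.comb(N, k)` returns N!/(k!(N-k)!); the function names `number_of_ways` and `binomial_probability` are our own and are not part of the text.

```python
from math import comb

def number_of_ways(k, n):
    """Number of distinct ways of getting k out of n: n! / (k! (n-k)!)."""
    return comb(n, k)

def binomial_probability(k, n, p):
    """Probability of exactly k out of n, where p is the per-instance probability."""
    q = 1 - p  # complementary probability that the event does not occur
    return number_of_ways(k, n) * p**k * q**(n - k)

# The "9 out of 10" case worked out above: 10 ways, each with probability .4^9 x .6
print(number_of_ways(9, 10))                       # 10
print(f"{binomial_probability(9, 10, 0.4):.5f}")   # 0.00157
```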

Listed below are the full calculations, using this formula, for the four pathway probabilities that are needed to answer our original question concerning the probability of getting as many as 7 recoveries out of 10 patients, by mere chance, in the absence of any effective treatment. The results of the intermediate exponential operations are rounded to six decimal places, and the final result of each calculation is rounded to four decimal places.

Probability that exactly 7 out of 10 patients will recover:

[N=10, k=7, p=.4, q=.6]

\frac{10!}{7! \times 3!} \times .4^{7} \times .6^{3} =

\frac{3,628,800}{5,040 \times 6} \times .001,638 \times .216 =

120 \times .000,354 = .0425

Translation: There are 120 different ways of reaching the result of 7 recoveries out of 10 patients, and each of those ways has a mere-chance probability of .000,354. The probability of reaching the result one way or the other is therefore 120x.000,354=.0425.

Probability that exactly 8 out of 10 patients will recover:

[N=10, k=8, p=.4, q=.6]

\frac{10!}{8! \times 2!} \times .4^{8} \times .6^{2} =

\frac{3,628,800}{40,320 \times 2} \times .000,655 \times .36 =

45 \times .000,236 = .0106

Translation: There are 45 different ways of reaching the result of 8 recoveries out of 10 patients, and each of those ways has a mere-chance probability of .000,236. The probability of reaching the result one way or the other is therefore 45x.000,236=.0106.

Probability that exactly 9 out of 10 patients will recover:

[N=10, k=9, p=.4, q=.6]

\frac{10!}{9! \times 1!} \times .4^{9} \times .6^{1} =

\frac{3,628,800}{362,880 \times 1} \times .000,262 \times .6 =

10 \times .000,157 = .0016

Translation: There are 10 different ways of reaching the result of 9 recoveries out of 10 patients, and each of those ways has a mere-chance probability of .000,157. The probability of reaching the result one way or the other is therefore 10x.000,157=.0016.

Probability that all 10 patients will recover:

[N=10, k=10, p=.4, q=.6]

\frac{10!}{10! \times 0!} \times .4^{10} \times .6^{0} =

\frac{3,628,800}{3,628,800} \times .000,105 \times 1 =

1 \times .000,105 = .0001

Translation: There is only one way of reaching the result of 10 recoveries out of 10 patients, and that way has a mere-chance probability of .0001.

The full answer to the question is then simply the sum of these separate pathway probabilities:

P_{(7 or more R out of 10)}

= P_{(7R)} + P_{(8R)} + P_{(9R)} + P_{(10R)}

= .0425 + .0106 + .0016 + .0001 = .0548

Translation: There is a 5.48% likelihood that as many as 7 recoveries out of 10 patients would occur by mere chance coincidence, in the absence of any effective treatment.
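
The four pathway probabilities and their sum can be reproduced with a few lines of code. This is a sketch only, assuming Python 3.8 or later for `math.comb`; the names `N`, `P`, `terms`, and `tail` are illustrative.

```python
from math import comb

N = 10   # number of patients
P = 0.4  # mere-chance probability of recovery for any one patient

# P(exactly k recoveries out of N) for k = 7, 8, 9, 10
terms = {k: comb(N, k) * P**k * (1 - P)**(N - k) for k in range(7, N + 1)}

for k, prob in terms.items():
    print(f"P({k}R) = {prob:.4f}")
# P(7R) = 0.0425
# P(8R) = 0.0106
# P(9R) = 0.0016
# P(10R) = 0.0001

tail = sum(terms.values())
print(f"P(7 or more R out of 10) = {tail:.4f}")  # 0.0548
```

Note that the code adds the unrounded probabilities and rounds only at the end; in this instance the result agrees with the sum of the rounded values to four decimal places.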

You will recall from Chapter 4 that the standard criterion for statistical significance is the 5% level. Any observed result that has a mere-chance likelihood equal to or less than 5% (P≤.05) is regarded as significant, and any observed result that has a mere-chance likelihood greater than 5% (P>.05) is regarded as non-significant. By this criterion the present result (P=.0548) falls short of significance, though only by a very slight distance. Most investigators would be inclined to describe such a narrow miss as marginally significant, which is a compact way of saying "not quite significant, but close enough to warrant further investigation."

Following this custom, our medical researchers regard their results as promising and proceed to put the plant extract to a much fuller test. Making use of the contacts they have with a number of clinical facilities throughout the land, they arrange to have the extract administered to a total of 1,000 patients, again beginning for each patient immediately after the onset of the disease and continuing for two months. As the new experiment runs its course, the investigators observe that 430 of the 1,000 patients have recovered within their respective two-month periods, while 570 have not recovered. This 43% recovery rate is of course much lower than the non-significant 70% rate observed in the first experiment—and, indeed, only slightly above the 40% baseline rate of mere chance spontaneous recovery. But keep in mind that we are now dealing with a much larger sample. It is the same point made earlier in our introductory mind-over-matter example, concerning the difference between a 10-toss test and a 100-toss test. Clearly, it would be much easier for mere chance to produce a 70% recovery rate in a sample of 10 patients than in a sample of 1,000 patients. What you are now being asked to consider is the possibility that for a sample of 1,000 patients even a recovery rate of 43% would be very unlikely to result from mere chance.

From the fact that I am bothering to use this "430 out of 1,000" scenario as an example, you can safely assume that the result will prove significant, once we figure out the details. Except for the difference in the numbers, the question of statistical significance here is exactly the same as before: given that any particular patient has a 40% chance of spontaneous recovery within two months, and a 60% chance of non-recovery, how likely or unlikely is it that as many as 430 out of any particular set of 1000 patients could recover by mere chance, if the plant extract has no effect whatsoever? Essentially, it is the disjunctive probability of getting either 430 recoveries out of 1000, or 431 out of 1000, or 432 out of 1000, and so on, up through 1000 out of 1000.

The logic for this question is exactly the same as for the one we have just examined. The only difference is in the complexity of the details. For the probability of 7 or more recoveries out of 10 you only need to perform four main pathway calculations, whereas for 430 or more out of 1000 you need to perform a total of 571—and most of these would involve factorial and exponential operations of rather staggering proportions. Take a few minutes to try to work out just the first of these calculations

P_{(430 R out of 1000)} = \frac{1000!}{430! \times 570!} \times .4^{430} \times .6^{570}

and you will find yourself hoping there might be an easier way. Fortunately, there is.
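
What is staggering by hand is routine for a computer. The sketch below is not the shortcut alluded to above; it simply carries out all 571 pathway calculations directly. It assumes Python 3.8 or later and uses the standard library's `fractions.Fraction` (with p written as 2/5) so that the enormous factorials and tiny powers are combined exactly, with no floating-point underflow. The function name `tail_probability` is our own.

```python
from fractions import Fraction
from math import comb

def tail_probability(k_min, n, p):
    """Exact probability of k_min or more occurrences out of n."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(k_min, n + 1))

p = Fraction(2, 5)  # .4 expressed as an exact fraction

# Check against the small-sample result worked out earlier in the chapter.
print(f"{float(tail_probability(7, 10, p)):.4f}")  # 0.0548

# The 430-or-more-out-of-1000 case: 571 terms, summed exactly.
exact = tail_probability(430, 1000, p)
print(f"{float(exact):.4f}")
```

The printed value for the larger sample can then be compared with the 5% criterion in the same way as before.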

End of Chapter 5.