Firearms Tactical Institute
Reprinted with express written consent from the International Wound Ballistics Association (IWBA) and the author. Publication of this document on our web site does not constitute an endorsement of Firearms Tactical Institute by IWBA or the author.
Reprinted from Wound Ballistics Review; 4(2), 1999: 16-21.
The Marshall & Sanow "Data" - Statistical Analysis Tells the Ugly Story
By Duncan MacPherson
Evan Marshall has been a bad joke to almost every technically trained person ever since his earliest articles on his "data base" were published.1 Ed Sanow has been part of this act since at least 1992, when Marshall & Sanow's first jointly authored book was published.2 Statistical analyses of this "data base" were the source of the certainty that it was nonsense. Unfortunately, this kind of statistical analysis is not easy to recognize and understand without technical training, so it has been easy for Marshall & Sanow and their advocates to ignore the criticism; their target audience doesn't understand it and ignores it as well. This was and continues to be frustrating to those who understand all aspects of the situation but can't think of anything to do about it.
We can hope that this frustration will be mostly a memory with the publication of Maarten van Maanen's article (pages 9-13).7 While there are always those who reject any truth, however obvious, most persons will now recognize that the statistical criticism must have been valid even if they don't completely follow the details themselves. With the case closed, there may be some renewed interest in how statisticians actually knew what was going on based just on their analyses. This article briefly recapitulates the statistical flaws in this "data base" uncovered over the years, tries one more time to explain this process to a lay audience, and explains how statistical analysis can be applied to the results that van Maanen has tabulated to show that even the data in van Maanen's analysis that is not obviously false is so statistically unlikely that it would be foolish to believe it valid. It is also important to note that while the earlier statistical analyses of the Marshall & Sanow "data base" focused on relatively small areas of the total "data," this analysis discredits all of the Marshall & Sanow "data base."
Marshall's infamous 1988 article1 really was the start of his troubles. This article contained data that Dr. Martin Fackler collected in a table which is described in some detail elsewhere in this issue (pages 14-15).8 In summary, this table showed a regularity of performance so preposterous that every statistically knowledgeable person who saw it gagged. Dr. Carroll Peters calculated the probability of this regularity occurring by chance as 1 in 100 billion billion; this was reported in Reference 3 along with an explanation of the problem, but unfortunately this analysis was never otherwise published. This table in Reference 3 also provoked me into including several pages in Reference 4 devoted to an explanation of the problems with combat data, a short explanation of statistics, a discussion of fraud in manufacturing data, and an example loosely (but clearly) based on this table. The subsequent "data base" tabulation2 and the latest version5 did not show the earlier regularity. However, the unbelievably clumsy way apparently used to change this "data base" only came to light from an examination of van Maanen's analysis (see pages 9-13).7
Marshall & Sanow's 1996 book5 managed to make essentially the same mistake all over again in a completely different area of the "data base." This time the preposterous regularity was in the "data" showing the relative effectiveness of one torso shot vs. two torso shots. This is discussed in more detail in Reference 6, but the bottom line is that the probability that this data could be created by chance (i.e., that it is real) is about 1 in 80 million. This is preposterous, but it is less preposterous. If one thought this was a trend, they could conclude that Marshall & Sanow will actually be publishing credible data by the year 2030 or thereabouts. However, this prediction would be unjustified because the latest results show they aren't getting any smarter after all.
Statistical analysis can get complicated, but understanding it well enough to see why Marshall & Sanow's "data base" is incredibly unlikely is not that difficult. Fortunately, this is especially true for the latest analysis based on the results that van Maanen has tabulated. However, a lot of people don't want to go through the effort to understand the equations, so the equations used to make the actual probability calculations are given and explained with an example in Appendix 1. Appendix 1 can be used as the reader wishes; look at it now, look at it after reading the rest of the article, or don't look at it at all. Readers who don't want to get into the mathematics themselves should feel free to take Appendix 1 and/or the computations in the rest of this article to any competent mathematics professor and ask "Is this correct?".
Analyzing Increasing "One Shot Stop" Percentages
The increase in Marshall & Sanow's "one shot stop" percentage has been noted for some time,6 and is well quantified by van Maanen's Table 2 (see pages 11-12).7 While the possibility of an increase in ammunition effectiveness is potentially a factor in this trend, this is actually a minor element at best for several reasons. Most of the calibers and loads in the table have undergone little change. The high velocity, low weight bullet loads advocated by Marshall & Sanow (and which heavily populate their report) are not subject to much improvement because most of the users of this ammunition do not want the required changes (or they wouldn't be using this ammunition to begin with). Many of the calibers in the table are not on the short list of calibers used by knowledgeable handgunners in shootings, and as a result have not been upgraded with state-of-the-art bullet developments by manufacturers. Some popular manufacturers' top of the line bullet designs (Blount's Gold Dot, Remington's Golden Saber, Winchester's Ranger) that do represent advances are mostly or completely missing (rather curiously given their popularity with law enforcement). A factor not always remembered in the search for better ammunition is that the actual difference in wound trauma creation between the best state-of-the-art design JHP bullets and relatively ordinary JHP bullets is very small in almost all shootings; law enforcement needs the best to deal with the extreme cases, not typical shootings. Many of the calibers and loads listed do not have JHP bullets, so any change is negligible. In addition to all of this, we will be comparing Marshall & Sanow reports only 4 years apart, limiting the time available for any changes to be incorporated. The bottom line is that we will assume that any change in ammunition effectiveness between Marshall & Sanow reports is so small that ignoring it does not significantly affect the conclusions. 
In line with good engineering practice, this assumption will be checked after we have calculated the results.
When taking two measurements of any parameter, the second measurement may be higher or lower than the first. The probabilities of these two outcomes are the same, namely 0.5; this means that if this kind of measurement is made many times, the second measurement ideally increases 50% of the time and decreases 50% of the time, and the actual result almost always comes close to this ideal. It is true that in any particular case, the probability that the first measurement is the higher of the two is increased if we know the first measurement is higher than average, but this is balanced by the cases where that probability is reduced because we know the first measurement was lower than average. If we take all the cases without rejecting any, this effect cancels out.
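The equal-probability argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the Gaussian noise model, the seed, and the function name `fraction_of_increases` are illustrative choices, not anything taken from the "data base" or from the article's own calculations.

```python
import random


def fraction_of_increases(num_pairs, seed=0):
    """Draw pairs of measurements from the same distribution and return
    the fraction of pairs in which the second value exceeds the first."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    increases = 0
    for _ in range(num_pairs):
        first = rng.gauss(0.0, 1.0)   # assumed noise model: standard normal
        second = rng.gauss(0.0, 1.0)  # same distribution, independent draw
        if second > first:
            increases += 1
    return increases / num_pairs


fraction = fraction_of_increases(100_000)
print(f"fraction of pairs where the second measurement is higher: {fraction:.3f}")
```

With no real change between the two measurements, the printed fraction should sit very close to 0.5, which is the coin-flip behavior the argument relies on.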
The statistics of this two measurement situation correspond exactly to coin flips; this is just like considering an increase heads and a decrease tails. Maarten van Maanen has assembled Marshall & Sanow's "data records" into Table 2 of his article (pages 11-12)7. This "data" can be examined easily to see how many caliber and load combinations increased and how many decreased between the 1988 "data" and the 1992 "data". This process can then be repeated to make the same comparison of the 1992 "data" and 1996 "data". We would expect valid data for all comparisons to show about the same number of increases and decreases.
The tabulations of increases and decreases will be summarized here for the convenience of the reader, but everyone can examine van Maanen's Table 2 and verify the count for themselves. For that matter, the reader can also compare Table 2 with the referenced Marshall & Sanow documents it was taken from; there are no secrets in this analysis. On the other hand, the reader cannot compare Marshall & Sanow's "data" with any basic source because the authors claim this is "secret data" (even though autopsies and death reports are public records available to all legitimate applicants).
Examination of the assembled information in van Maanen's Table 2 is very interesting. Comparing the "one shot stop" percentage for the calibers and loads that appear in both the 1988 and 1992 "data sets" shows 59 increases, 16 decreases, and 3 that are the same to the nearest percent. A similar comparison for the 1992 and 1996 "data sets" shows 52 increases, 12 decreases, and 16 that are the same to the nearest percent. In actuality, the calibers and loads that appear the same to the nearest percent are not really identical. The percentages must be calculated more accurately to resolve whether these cases are actually small increases or small decreases. When this is done, the 3 cases from 1988 to 1992 that are identical to the nearest percent are actually all increases, while the 16 cases from 1992 to 1996 are 8 increases and 8 decreases. This resolution must be incorporated to have a proper statistical calculation, so the actual numbers are 62 increases and 16 decreases from 1988 to 1992 and 60 increases and 20 decreases from 1992 to 1996. The need to correctly assign these cases that are close to equal may seem strange, but there is a useful way to think about it. If a coin being flipped just barely turned over to become a head, you could say it was close to being a tail; however, you shouldn't make (and probably wouldn't even think of making) a separate category of coin toss of "so close it could have been either head or tail". You look at it carefully, determine it is a head, and that's the end of it.
This ratio of increases to decreases is a big departure from the expected equal increases and decreases. It is easy to calculate the probability that this unequal division will appear by chance; we just use the formula given in Appendix 1. The number of cases is much larger than in the Appendix 1 example, so a computer is a practical necessity. Table 1 shows a computer printout of the calculation.
Table 1. Probability of Marshall & Sanow "Data" Increases
Table 1 is easy to understand once you know the computer printout nomenclature for small numbers. The figure 1.23E-05 means $1.23 \times 10^{-5}$; this exponent notation sets the decimal point location. Since $10^{-5} = 0.00001$, this is a shorthand way of defining a decimal point shift of 5 places to the left; a plus sign in the exponent means a shift to the right, so $10^{+5} = 100000$. This may seem like a lot of trouble when the exponent is -5, but it is very convenient and helps avoid mistakes when the exponent is -20 (as some of the table numbers are).
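For readers who want to see the printout notation in action, the following short sketch converts between the E notation and ordinary decimals. The variable names are illustrative only.

```python
# "1.23E-05" is the printout form of 1.23 times ten to the minus five.
value = float("1.23E-05")

as_decimal = f"{value:.7f}"         # ordinary decimal, 7 places after the point
as_printout = f"{0.0000123:.2E}"    # back to printout (E) notation
shifted_right = f"{1.0E+05:.0f}"    # a plus exponent shifts the point right

print(as_decimal)
print(as_printout)
print(shifted_right)
```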
The "case probability" column is the probability of exactly that number of increases occurring; for example, in the 1988 to 1992 data, the probability of exactly 62 increases is about $5.69 \times 10^{-8}$. The "probability sum to 78" column is the total probability of all the numbers of increases from the number chosen up to and including 78; for example, in the 1988 to 1992 data, the total probability of all the increases from 62 to 78 inclusive is about $7.56 \times 10^{-8}$. This probability is and has to be larger than the probability of exactly 62 increases because it includes that case as well as others; it isn't much larger because the probability gets much smaller when even one more increase is added. This total probability is the probability that the number of increases is at least 62, and this is what needs to be used to properly show how unlikely an outcome is. It is not immediately obvious and not easy to explain simply why this "case probability sum" is the proper value to use rather than the "case probability", so some added discussion of this is given in Appendix 2 for those who are interested. Those who don't want to get into this statistical detail can avoid it with a clear conscience because these two parameters have magnitudes so similar for the cases we are interested in that the difference has no practical significance.
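The two columns described above can be recomputed directly from the Appendix 1 formula. The sketch below is one way to do it, not a reproduction of the original printout; the function names `case_probability` and `probability_sum` are hypothetical labels chosen to match the column headings.

```python
from math import comb


def case_probability(increases, cases):
    """Probability of exactly `increases` increases in `cases` fair trials:
    M! / ((M-N)! N! 2^M), written here with the binomial coefficient."""
    return comb(cases, increases) / 2 ** cases


def probability_sum(increases, cases):
    """Probability of at least `increases` increases, i.e. the sum of the
    case probabilities from `increases` up to and including `cases`."""
    favorable = sum(comb(cases, k) for k in range(increases, cases + 1))
    return favorable / 2 ** cases


p_exact = case_probability(62, 78)     # 1988 to 1992: exactly 62 of 78
p_at_least = probability_sum(62, 78)   # 1988 to 1992: 62 or more of 78

print(f"case probability:      {p_exact:.2E}")
print(f"probability sum to 78: {p_at_least:.2E}")
```

Both numbers should come out close to the values quoted in the text, and the sum is only modestly larger than the single case because each added increase cuts the case probability sharply.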
The "case probability sum" 62 increases example used above is the actual result for the comparison of the 1988 to 1992 Marshall & Sanow "data sets". From Table 1, the equivalent probability for the 1992 to 1996 Marshall & Sanow "data sets" comparison is about $4.29 \times 10^{-6}$. The probability that both of these results would be obtained by chance is calculated by multiplying the two individual probabilities, which gives a combined probability of about $3.25 \times 10^{-13}$; this is about 1 in 3 trillion or 1 in 3 million million. This is a big step backwards from the 1 in 80 million probability that the two shot "data" results in Reference 5 were obtained by chance, and is why there isn't any real trend to improvement of the Marshall & Sanow "data", and why Marshall & Sanow probably won't be publishing credible data by the year 2030 after all.
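The multiplication step can be checked with a few lines of code. This is a sketch that treats the two comparisons as independent, as the text does; the helper name `probability_sum` is an illustrative choice.

```python
from math import comb


def probability_sum(increases, cases):
    """Probability of at least `increases` increases in `cases` fair trials."""
    favorable = sum(comb(cases, k) for k in range(increases, cases + 1))
    return favorable / 2 ** cases


p_1988_to_1992 = probability_sum(62, 78)  # 62 increases, 16 decreases
p_1992_to_1996 = probability_sum(60, 80)  # 60 increases, 20 decreases

# Both results occurring by chance: multiply the individual probabilities.
combined = p_1988_to_1992 * p_1992_to_1996

print(f"1988 to 1992: {p_1988_to_1992:.2E}")
print(f"1992 to 1996: {p_1992_to_1996:.2E}")
print(f"combined:     {combined:.2E} (about 1 in {1 / combined:.1E})")
```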
The reader should notice that these ridiculous statistical results are all obtained from the data in van Maanen's analysis that is not obviously false with the exception of two cartridge and load combinations in the now exposed 1992 .38 Special data (see pages 11-12).7 Note also that this analysis discredits the entire Marshall & Sanow "data base", not just part of it. The real meaning of this entire re-examination of the Marshall & Sanow "data base" is that some of this "data" appears bogus if looked at in the right way with ordinary common sense, and all of this "data" appears bogus if carefully examined in the right way with technical sophistication.
Validating the Analysis Assumption
As a final point, some readers might have a reasonable concern that a few of the caliber and load combinations show increases because these bullets really were made more effective. We will now check the assumption that this effect is so small that ignoring it does not significantly affect the conclusions. To do this, we make the assumption that 10 of the increases in each of the calculations should be ignored because they do represent such improvement (although there is no evidence of this). The numbers then become 52 increases and 16 decreases from 1988 to 1992 and 50 increases and 20 decreases from 1992 to 1996. The probabilities were rerun with these inputs, and the combined probability that these results are due to chance rises to about $1.55 \times 10^{-9}$, or about 1 in 650 million. This is an improvement, but nothing to brag about, especially because it is a result of an assumption that isn't justified. It is not worth the space to give the equivalent of Table 1 for this case, but the results can be verified by anyone who wants to take the trouble. This confirms that ignoring ammunition improvements does not significantly affect the conclusion drawn from the analysis.
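Anyone who wants to take the trouble can verify the rerun with a sketch like the following. It simply drops 10 increases from each comparison, as described above; the helper name `probability_sum` and the variable names are illustrative.

```python
from math import comb


def probability_sum(increases, cases):
    """Probability of at least `increases` increases in `cases` fair trials."""
    favorable = sum(comb(cases, k) for k in range(increases, cases + 1))
    return favorable / 2 ** cases


# Assumed (and unsupported) concession: 10 increases per comparison are
# treated as real improvements and removed from the count.
p_first = probability_sum(52, 52 + 16)    # 1988 to 1992 with 10 removed
p_second = probability_sum(50, 50 + 20)   # 1992 to 1996 with 10 removed
combined_reduced = p_first * p_second

print(f"combined probability: {combined_reduced:.2E}")
print(f"about 1 in {1 / combined_reduced / 1e6:.0f} million")
```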
The analysis given herein is completely unconnected to previous statistical analyses of the Marshall & Sanow "data base" because it approaches a completely different aspect of this "data base" and deals with the totality of this "data base". However, the conclusion is the same: the probability that the Marshall & Sanow "data base" is an assembly of valid information from uncorrupted sources is so low that this "data base" cannot reasonably be believed.
The shameless popular gun press will probably continue to provide a forum for the Marshall & Sanow nonsense in the interests of selling magazines. One can reasonably hope that the target audience for this material will henceforth be only the kind of eccentrics that become involved with the all too common irrational cults so prevalent in our society.
1. Marshall, EP: "One-shot Stopping Power." Petersen's Handguns 2(6), November 1988; 24-29, 68-71.
2. Marshall, EP, Sanow, EJ: Handgun Stopping Power: The Definitive Study, Paladin Press, Boulder, Colorado, 1992.
3. Fackler, ML: "Marshall - Sanow Can't Beat the Long Odds." Soldier of Fortune, January 1994; 64-65.
4. MacPherson, D: Bullet Penetration - Modeling the Dynamics and the Incapacitation Resulting from Wound Trauma, Ballistics Publications, Box 772, El Segundo, CA, 1994; 18-23.
5. Marshall, EP, Sanow, EJ: Street Stoppers: The Latest Handgun Stopping Power Street Results, Paladin Press, Boulder, Colorado, 1996.
6. Fackler, ML: "Book Review: Street Stoppers: The Latest Handgun Stopping Power Street Results: Marshall, EP, Sanow, EJ." Wound Ballistics Review 3(1), 1997; 26-31.
7. van Maanen, M: "Discrepancies in the Marshall & Sanow 'Data Base': An Evaluation Over Time." Wound Ballistics Review 4(2), Fall 1999; 9-13.
8. Fackler, ML: "Undeniable Evidence." Wound Ballistics Review 4(2), Fall 1999; 14-15.
Appendix 1. Explanation of Probability Calculation
The following equations use technical notation to indicate multiplication and division rather than the x and ÷ symbols taught in grade school. 2 times 3 is written as (2)(3), not 2x3, and 2 divided by 3 is written 2/3, not 2÷3.
The symbol ! stands for factorial in mathematical nomenclature; M! is the result of multiplying M and all smaller positive integers together. Examples show this apparently complex nomenclature to be as simple as it really is:
3! = (3)(2)(1) = 6
5! = (5)(4)(3)(2)(1) = 120
1! = 1
There is one convention in the factorial concept that is not intuitive: this is 0! = 1, and not zero as might be supposed. For those who have some interest, this is closely related to the need to have $X^0 = 1$, which is also not intuitive to algebra students when they first bump into it. For our purposes we will accept the fact that this is just the way it has to be defined in order for the results to come out correctly, and drop the subject.
The simplest statistical case is the consideration of the kind of events that have two equally likely outcomes; the classic example is a coin toss, where the probability of each of the two outcomes (heads and tails) is 0.5 (i.e., the most likely outcome of any series of tosses is 50% heads and 50% tails). The formula for the probability of any specific outcome for any chosen number of events (i.e., coin tosses) is given in Reference 4, but is repeated here for reader convenience. The probability $P_N$ of getting exactly $N$ heads in $M$ coin tosses is:
$$P_N = \frac{M!}{(M-N)!\,N!\,2^M}$$
We will go through the simple example of four coin tosses to demonstrate the formula. This sets $M = 4$ and the probabilities of 0, 1, 2, 3, and 4 heads are:
$$P_0 = \frac{4!}{(4-0)!\,0!\,2^4} = \frac{24}{(24)(1)(16)} = \frac{1}{16}$$
$$P_1 = \frac{4!}{(4-1)!\,1!\,2^4} = \frac{24}{(6)(1)(16)} = \frac{1}{4}$$
$$P_2 = \frac{4!}{(4-2)!\,2!\,2^4} = \frac{24}{(2)(2)(16)} = \frac{3}{8}$$
$$P_3 = \frac{4!}{(4-3)!\,3!\,2^4} = \frac{24}{(1)(6)(16)} = \frac{1}{4}$$
$$P_4 = \frac{4!}{(4-4)!\,4!\,2^4} = \frac{24}{(1)(24)(16)} = \frac{1}{16}$$
Notice that the sum of all these probabilities is 1, which confirms the fact that we have included all the possible outcomes. Also notice that the most probable outcome is 2 heads (i.e., half of the tosses) as expected, with the outcomes being less probable as they differ further from this result.
That's all there really is to it. When there are more events (coin tosses) the only calculation effect is that the factorial numbers get very large; this is a lot of trouble with pencil and paper, but not with a computer or even a calculator. The calculated probabilities for extreme results (e.g., zero heads or even 25% heads) become much smaller as the number of cases increases; this isn't surprising because everybody knows that getting 80 heads in a row is far less likely than 4 heads in a row even if they have no idea how to calculate the exact probabilities.
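The four coin toss example can be reproduced with exact fractions, which avoids any rounding. This is a minimal sketch of the formula in this appendix; the function name `probability_of_heads` is an illustrative choice.

```python
from fractions import Fraction
from math import factorial


def probability_of_heads(n, m):
    """Probability of exactly n heads in m fair coin tosses:
    m! / ((m - n)! n! 2^m), kept as an exact fraction."""
    return Fraction(factorial(m), factorial(m - n) * factorial(n) * 2 ** m)


four_toss = [probability_of_heads(n, 4) for n in range(5)]
for n, p in enumerate(four_toss):
    print(f"P{n} = {p}")

print("sum of all outcomes:", sum(four_toss))

# The same function handles larger cases, e.g. 80 heads in a row.
print("80 heads in 80 tosses:", float(probability_of_heads(80, 80)))
```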
Appendix 2. Discussion of Probability Details
The magnitude of the probability of any particular result ("case probability") depends very strongly on the number of trials (or cases) being considered, but is always largest for the result closest to the expected 50% result. The four coin toss case in Appendix 1 illustrates this, and shows a probability magnitude of 0.375 (i.e., 3/8 in decimal) for 2 heads (50%). The 50% (i.e., 40 increases) probability magnitude for the 80 case 1992 to 1996 data of Table 1 is about 0.089 (although the table has not been extended to show this). In the rest of this appendix, $N$ denotes the total number of cases (the quantity called $M$ in Appendix 1). When the number of cases is large, $N!$ can be approximated well with Stirling's formula, and for these conditions Stirling's formula can be combined with the probability equation in Appendix 1 to give the following formula as an approximation to the 50% level "case probability" ($P_c$):
$$P_c = \sqrt{\frac{2}{\pi N}}$$
For $N = 80$ the case probability $P_c$ computed from this formula is about 0.089 (i.e., correct to better than 2 significant figures), and for $N = 10^6$ (1 million) the case probability $P_c$ computed from this formula is about 0.0008. This formula is not really accurate for small values of $N$, but for the $N = 4$ example in Appendix 1 it gives $P_c = 0.399$, which is only about 6.4% larger than the correct value of 0.375. These numbers for the value of $P_c$ make sense intuitively because in a huge number of trials any specific outcome is unlikely, but this clearly shows that this "case probability" is not the parameter we are looking for to represent meaningfully the probability of a distribution of interest.
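The quality of this approximation can be checked numerically. In the sketch below the exact 50% case probability is evaluated through the log-gamma function (an implementation choice made here to avoid enormous factorials, not something taken from the article), and the function names are illustrative.

```python
from math import exp, lgamma, log, pi, sqrt


def exact_center_probability(n):
    """Exact probability of n/2 heads in n tosses (n even), computed in
    log space: n! / ((n/2)! (n/2)! 2^n), using lgamma(k + 1) = ln(k!)."""
    return exp(lgamma(n + 1) - 2 * lgamma(n / 2 + 1) - n * log(2))


def stirling_center_probability(n):
    """Stirling-based approximation: square root of 2 / (pi n)."""
    return sqrt(2 / (pi * n))


for n in (4, 80, 10 ** 6):
    exact = exact_center_probability(n)
    approx = stirling_center_probability(n)
    print(f"n = {n:>7}: exact = {exact:.6f}, approximation = {approx:.6f}, "
          f"ratio = {approx / exact:.4f}")
```

The ratio column shows the approximation overshooting noticeably for 4 cases and becoming essentially exact as the number of cases grows.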
"Case Probability Sum"
We have used the term "case probability sum" (the "probability sum to M" column in Table 1) to mean the probability of all the numbers of increases from the number chosen up to and including $M$. Those not familiar with statistics will probably not have any idea what this is or why it is used. The "case probability sum" is actually the area under the normal (i.e., Gaussian) distribution curve that is at or outside the selected value of $N$. This is true because each value of $P_N$ is actually a quantized piece of the total distribution curve. Adding all the values of $P_N$ gives a value of 1.0 (as was shown in Appendix 1 for the sample case of 4 coin flips). The area under any part of the distribution curve is the probability that the test results will be in that area. We are interested here in a segment of the distribution curve that is very far from the mean, so the area (probability of occurrence) is very small and changing very rapidly with any change in $N$.
We have evaluated the probability that the Marshall & Sanow "data base" could show the increase that it has in "one shot stop" by chance alone. We have not addressed the probability of data decrease, but that is a potential factor because if the data had shown as extreme a decrease, we would take an equally dim view of it. Not surprisingly, the probability of 62 decreases in a sample of 78 is exactly equal to the probability of 62 increases if chance is the only factor. This means that one could argue that both a plus and minus deviation from the mean should be allowed in computing probability of occurrence. If this is assumed, each of the individual probabilities would be higher by a factor of 2, and the compound probability would increase by a factor of 4. This doesn't mean much when the probability is in the range of $10^{-13}$ to $10^{-12}$.
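The effect of allowing deviations in both directions can be sketched as follows. The code assumes, as the text does, that chance is the only factor; the helper name `probability_sum` and the variable names are illustrative.

```python
from math import comb


def probability_sum(count, cases):
    """Probability of at least `count` outcomes of one kind in `cases` fair trials."""
    favorable = sum(comb(cases, k) for k in range(count, cases + 1))
    return favorable / 2 ** cases


# Symmetry: 62 decreases out of 78 is exactly as likely as 62 increases.
symmetric = comb(78, 62) == comb(78, 78 - 62)

one_sided = [probability_sum(62, 78), probability_sum(60, 80)]
two_sided = [2 * p for p in one_sided]   # allow an equally extreme decrease

combined_one_sided = one_sided[0] * one_sided[1]
combined_two_sided = two_sided[0] * two_sided[1]

print("symmetric:", symmetric)
print(f"one-sided combined: {combined_one_sided:.2E}")
print(f"two-sided combined: {combined_two_sided:.2E}")
print(f"ratio: {combined_two_sided / combined_one_sided:.1f}")
```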
© 2000 Firearms Tactical Institute. All Rights Reserved.
FirearmsTactical™, Salus In Periculo, and logo are trademarks of Firearms Tactical Institute.