Tuesday, November 10, 2015

Neyman does science, part 1

On reading Neyman's statistical and scientific philosophy (e.g., Neyman, 1957), one of the things that strikes a scientist is its extreme rejection of post-data reasoning. Neyman adopts the view that once data is obtained statistical inference is not about reasoning, but is rather about the automatic adoption of one of several decisions. Given the importance of post-data reasoning to scientists -- which can be confirmed by reading any scientific manuscript -- I wondered how Neyman would think and write about an actual, applied problem. This series of blog posts explores Neyman's work on the analysis of weather modification experiments. The (perhaps unsurprising) take-home message from this series of posts is this: not even Neyman applied Neyman's philosophy, when he was confronted with real data.

Consider the view of statistical inference put forward by Fisher in contrast to Neyman's perspective:
Decision [as opposed to reasoning] itself must properly be referred to a set of motives, the strength or weakness of which should have had no influence whatever on any estimate of probability. We aim, in fact, at methods of inference which should be equally convincing to all rational minds, irrespective of any intentions they may have in utilizing the knowledge inferred. (Fisher, 1955, p. 77)
The concept of evidence -- that is, information which warrants changes in belief -- is central to science. Under an evidential view, evidence can be strong or weak, or, in other words, convincing or unconvincing. This occurs as a matter of degrees, independent of any particular decisions one might have in mind. Evidence is a post-data concept, applying to the interpretation of data after it has been collected.

Neyman, on the other hand, appears to reject epistemology altogether. Post-data ideas like beliefs -- justified or otherwise -- are not a target of statistical analysis:
The beliefs of particular scientists are a very personal matter and it is useless to attempt to norm them by any dogmatic formula...The content of the concept of inductive behavior is the recognition that the purpose of every piece of serious research is to provide grounds for the selection of one of several contemplated courses of action. (Neyman, 1957, p. 16)
Neyman offers a pre-data philosophy. Decision criteria are set before the experiment, on the basis of considerations of long-run Type I and Type II errors. The outcome of a study is the selection of a decision, not a reasonable change in belief. For those of us in the sciences, Neyman's is a very strange outlook on science and statistics. Fisher, of course, was primarily a scientist; Neyman, a statistician. This difference shows in their respective ideas of how statistical inference is to be undertaken.
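To make the pre-data character of this outlook concrete, here is a minimal sketch of what "setting the decision criteria before the experiment" can look like. The design is entirely hypothetical (50 binary trials, a null success probability of 0.5, an alternative of 0.7, and a 5% Type I error ceiling are my assumptions for illustration, not anything from Neyman's papers). The point is that the cutoff and both long-run error rates are fixed before any data exist, and the data enter only through the final call to `decide`.

```python
from math import comb


def binom_upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def plan_test(n, p_null, p_alt, alpha):
    """Choose the smallest cutoff c whose Type I error rate is at most alpha.

    Returns (c, type_1_rate, type_2_rate). All of this is computed before
    the experiment is run; no observed data are involved.
    """
    for c in range(n + 2):  # c = n + 1 gives a tail of 0, so the loop always returns
        type_1 = binom_upper_tail(n, p_null, c)
        if type_1 <= alpha:
            type_2 = 1 - binom_upper_tail(n, p_alt, c)
            return c, type_1, type_2


def decide(successes, cutoff):
    """The post-data step: an automatic choice between two actions."""
    return "act as if H1" if successes >= cutoff else "act as if H0"


# Hypothetical design values, chosen only for illustration.
cutoff, type_1, type_2 = plan_test(n=50, p_null=0.5, p_alt=0.7, alpha=0.05)
print(f"cutoff={cutoff}, Type I rate={type_1:.4f}, Type II rate={type_2:.4f}")
print(decide(successes=33, cutoff=cutoff))
```

Nothing in the output of `decide` says how strongly the data favor one hypothesis over the other; observing exactly `cutoff` successes and observing all 50 lead to the same action.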

Neyman, however, goes to great lengths to show that Fisher, in fact, acted consistently with Neyman's own philosophy, and not with Fisher's stated one. In highlighting a case where Fisher has interpreted a low p value as indicating that a particular null hypothesis is not true, Neyman says:
The trouble is that the premise "P is less than .01" does not imply that "the departures are not fortuitous" [that is, did not arise by chance, under the null hypothesis]. In fact, even if the inheritance of the characteristics considered conformed exactly with the assumed model, the probability of observing $\chi^2$ corresponding to the value of P less than 0.01 is positive and approximately equal to 0.01. Thus, the assertion "the departures are not fortuitous" cannot be deduced from "P is less than .01". Yet, this assertion is made, and is made in very definite terms...[O]ne may presume that the assertion "the departures are not fortuitous" is interpreted by Fisher as equivalent to the adoption of the hypothesis of differential viability. (Neyman, 1957, p. 12)
And on this point, Neyman is right. A low p value does not imply that the null hypothesis is false (at least, not by itself). Neyman takes this to mean that Fisher was making a decision to reject the null hypothesis, rather than engaging in any sort of post-data "reasoning".
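Neyman's arithmetic here is easy to check by simulation. The sketch below is my own illustration, not a reconstruction of the data Fisher analysed: it assumes a hypothetical inheritance model with three classes in a 1:2:1 ratio and 200 offspring per experiment, generates data that conform exactly to that model, and counts how often the $\chi^2$ goodness-of-fit test nonetheless yields P less than 0.01. With three classes the test has 2 degrees of freedom, for which the $\chi^2$ tail probability is simply $e^{-x/2}$.

```python
import math
import random


def chi2_pvalue_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


def rate_of_small_p_under_null(n_experiments=5000, n_obs=200,
                               probs=(0.25, 0.5, 0.25), alpha=0.01, seed=1):
    """Fraction of simulated experiments with p < alpha when the model is true.

    The three-class 1:2:1 model and the sample size are assumptions made
    for illustration. The p-value formula is only valid for three classes
    (2 degrees of freedom).
    """
    rng = random.Random(seed)
    categories = list(range(len(probs)))
    expected = [n_obs * p for p in probs]
    small_p = 0
    for _ in range(n_experiments):
        counts = [0] * len(probs)
        for draw in rng.choices(categories, weights=probs, k=n_obs):
            counts[draw] += 1
        stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
        if chi2_pvalue_df2(stat) < alpha:
            small_p += 1
    return small_p / n_experiments


rate = rate_of_small_p_under_null()
print(f"Fraction of null experiments with P < 0.01: {rate:.4f}")
```

The fraction should come out close to 0.01, as Neyman says: small p values do occur when the model is exactly true, so "the departures are not fortuitous" cannot be deduced from the p value alone.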

The main problem is that it is difficult to see how Neyman's philosophy is applicable to science, which, as Fisher pointed out, is primarily designed to increase knowledge incrementally and is concerned with graded evidence and beliefs. I wanted to understand how Neyman would interpret the results of an already-performed experiment. How could he avoid post-data evaluations of evidence?

Neyman's weather modification work


In a series of articles starting in the 1960s, Neyman was involved with the analysis of meteorological data from the Whitetop project, which was designed to evaluate the efficacy of cloud seeding to increase rainfall. In cloud seeding, particles (such as silver iodide) are scattered into clouds in the hope that water or ice will condense around them. In theory, this should increase precipitation, because it is meant to mimic the natural processes underlying precipitation.

The Whitetop project was one of the first large-scale, randomized experiments in cloud seeding. It was designed to test whether silver iodide dropped from a plane in summer months could increase the probability of rain, or, given that rain occurred, how much rain actually fell. Silver iodide was dropped on random days within an area about 60 miles in radius around West Plains, Missouri in five summer seasons. This area at the time was suffering from low rainfall that threatened agriculture in the area.

There are a few relevant facts that we can lay out before we start:
  • The field of weather modification at the time (and perhaps still) was susceptible to widespread confirmation bias and what we would today call questionable research practices, including reliance on p-hacking to explain away negative results in a primary outcome (Atlas, 1977).
  • In light of the previous point, it is perhaps not surprising that the efficacy of cloud seeding is still disputed. According to the National Research Council's Committee on the Status of and Future Directions in U.S. Weather Modification Research and Operations (2003): "The Committee concludes that there still is no convincing scientific proof of the efficacy of intentional weather modification efforts. In some instances there are strong indications of induced changes, but this evidence has not been subjected to tests of significance and reproducibility."
  • There was (is?) no known mechanism by which seeding could affect rainfall upwind at distances on the order of 100 miles (Braham, 1979). 
Neyman's team of researchers at the University of California, Berkeley was not part of the original Whitetop project team. In a series of papers they analysed the Whitetop data with an interest in determining the long-distance, medium-term (1 day) effects of cloud seeding. In the next post, I will examine how Neyman presents data analyses in several papers, with a focus on how he and his team write about data analysis and statistical inference. The resulting analysis will be similar to how Neyman (1957) assessed Fisher's language, but in reverse: we will see that Neyman strongly favored evidential language and did not use decisions, error rates, or power to interpret the data.

Go to part 2 >>>


Bibliography for these posts

Atlas, D. (1977). The Paradox of Hail Suppression. Science, 195(4274), 139–145.

Braham, R. R. (1979). Field Experimentation in Weather Modification. Journal of the American Statistical Association, 74(365), 57–68.

Committee on the Status of and Future Directions in U.S. Weather Modification Research and Operations, National Research Council. (2003). Critical issues in weather modification research. National Academies Press.

Fisher, R. A. (1955). Statistical Methods and Scientific Induction. Journal of the Royal Statistical Society. Series B (Methodological), 17, 69–78.

Lovasich, J. L., Neyman, J., Scott, E. L., & Wells, M. A. (1971). Hypothetical Explanations of the Negative Apparent Effects of Cloud Seeding in the Whitetop Experiment. Proceedings of the National Academy of Sciences of the United States of America, 68(11), 2643–2646.

Lovasich, J. L., Neyman, J., Scott, E. L., & Smith, J. A. (1969). Wind directions aloft and effects of seeding on precipitation in the Whitetop experiment. Proceedings of the National Academy of Sciences, 64(3), 810–817.

Lovasich, J. L., Neyman, J., Scott, E. L., & Wells, M. A. (1971). Further Studies of the Whitetop Cloud-Seeding Experiment. Proceedings of the National Academy of Sciences, 68(1), 147–151.

Neyman, J. (1957). “Inductive Behavior” as a Basic Concept of Philosophy of Science. Review of the International Statistical Institute, 25, 7–22.

Neyman, J. (1977). A statistician’s view of weather modification technology (A Review). Proceedings of the National Academy of Sciences of the United States of America, 74(11), 4714–4721.

Neyman, J., Scott, E. L., & Smith, J. A. (1969). Whitetop Experiment (response to Battan). Science, 165(3893), 618.

Neyman, J., Scott, E. L., & Wells, M. A. (1969). Statistics in Meteorology. Review of the International Statistical Institute, 37(2), 119–148.

Neyman, J., Scott, E. L., & Smith, J. A. (1969). Areal Spread of the Effect of Cloud Seeding at the Whitetop Experiment. Science, 163(3874), 1445–1449.
