Friday, July 29, 2016
One of the common tropes one hears from advocates of confidence intervals is that they are superior to, or should be preferred over, p values. In our paper "The Fallacy of Placing Confidence in Confidence Intervals", we outlined a number of interpretation problems in confidence interval theory. We did this from a mostly Bayesian perspective, but the second section contained an example showing why, from a frequentist perspective, confidence intervals can fail. Many people missed this, however, because they assumed the paper was purely Bayesian advocacy. The purpose of this blog post is to expand on the frequentist example that many people missed; one doesn't have to be a Bayesian to see that confidence intervals can be less interpretable than the p values they are supposed to replace. Andrew Gelman briefly made this point previously, but I want to expand on it so that it comes across more clearly.
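To make the flavor of the problem concrete, here is a minimal sketch of a classic example of this kind (a uniform-location problem; not necessarily the example used in the paper, and the numbers are made up). The interval from the smaller to the larger of two observations is a perfectly valid 50% confidence interval, yet after the data are in, some intervals certainly contain the parameter and others almost certainly do not:

```r
# Two observations from Uniform(theta - 0.5, theta + 0.5); the interval from
# min(y) to max(y) covers theta whenever one observation falls on each side,
# which happens with probability exactly 0.5 -- a valid 50% CI.
set.seed(2016)
theta <- 10                                  # arbitrary "true" value for the demo
y1 <- runif(1e5, theta - 0.5, theta + 0.5)
y2 <- runif(1e5, theta - 0.5, theta + 0.5)

covered <- pmin(y1, y2) < theta & theta < pmax(y1, y2)
width   <- abs(y1 - y2)

mean(covered)                 # about 0.50: the advertised confidence
mean(covered[width > 0.5])    # exactly 1: wide intervals are certain to cover
mean(covered[width < 0.1])    # about 0.05: narrow intervals rarely cover
```

The pre-data 50% coverage is true, but it says almost nothing about what any particular observed interval warrants.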
Tuesday, May 3, 2016
Numerical pitfalls in computing variance
One of the most common tasks in statistical computing is computation of sample variance. This would seem to be straightforward; there are a number of algebraically equivalent ways of representing the sum of squares \(S\), such as
\[
S = \sum_{k=1}^n ( x_k - \bar{x})^2
\]
or
\[
S = \sum_{k=1}^n x_k^2 - \frac{1}{n}\left(\sum_{k=1}^n x_k\right)^2
\]
and the sample variance is simply \(S/(n-1)\).
What is straightforward algebraically, however, is sometimes not so straightforward in the floating-point arithmetic used by computers. Computers cannot represent numbers to infinite precision, and arithmetic operations can affect the precision of floating-point numbers in unexpected ways.
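As a quick illustration, here is a minimal sketch in R (with made-up data) of how the one-pass formula above can lose essentially all of its precision when the data have a large mean relative to their spread:

```r
# Data with a large offset and small variance: exactly the situation where the
# one-pass sum-of-squares formula suffers catastrophic cancellation.
set.seed(1)
x <- 1e8 + rnorm(1000)
n <- length(x)

S_two_pass <- sum((x - mean(x))^2)        # subtract the mean first, then square
S_one_pass <- sum(x^2) - sum(x)^2 / n     # the "algebraically equivalent" one-pass form

c(two_pass = S_two_pass / (n - 1),
  one_pass = S_one_pass / (n - 1),        # can be wildly wrong, even negative
  builtin  = var(x))                      # R's var() is numerically stable
```

The two-pass version and var() agree with the true variance (about 1); the one-pass version does not, because sum(x^2) and sum(x)^2/n are both near 1e19 and their difference is comparable to the rounding error in either term.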
Sunday, April 3, 2016
How to train undergraduate psychologists to be post hoc BS generators
Teaching undergraduate psychology is difficult for a variety of reasons. Students come in with preconceived notions about what psychological research is and are sometimes disappointed by the mismatch between their preconceptions and reality. Much of what psychologists do is highly specialized and requires skills that are difficult to teach, and psychologists-in-training can't contribute much to research until they have years of experience. The assignments we ask undergraduates to complete are meant to train their critical thinking skills and prepare them for a more substantive contribution to research. Sometimes, however, they do exactly the opposite: assignments can reward post hoc BS generation rather than actual critical thinking.
Wednesday, March 30, 2016
How to check Likert scale summaries for plausibility
Suppose you are reading a paper that uses Likert scale responses. The paper reports the mean, standard deviation, and number of responses. If we are -- for some reason -- suspicious of a paper, we might ask, "Are these summary statistics possible for this number of responses, for this Likert scale?" Someone asked me this recently, so I wrote some simple code to help check. In this blog post, I outline how the code works.
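Here is a minimal sketch of such a check (a toy function I'm calling check_likert, written for this illustration rather than taken from the post's actual code). It tests necessary conditions only, so passing it doesn't prove the summaries are genuine; failing it shows they are impossible:

```r
# Given n responses on a 1..K Likert scale, the reported mean must correspond
# to an integer total, and the reported SD must lie between the smallest and
# largest SDs achievable for that total.
check_likert <- function(mean_rep, sd_rep, n, K, digits = 2) {
  tol <- 0.5 * 10^(-digits)                  # rounding slack in the reported values
  total <- mean_rep * n
  if (abs(total - round(total)) > tol * n) return(FALSE)  # GRIM-style mean check
  total <- round(total)
  if (total < n || total > n * K) return(FALSE)

  # Least-spread responses consistent with the total: values as equal as possible
  f <- total %/% n
  r <- total - n * f
  x_min <- c(rep(f + 1, r), rep(f, n - r))

  # Most-spread responses: pile everything at 1 and K, plus at most one middle value
  a   <- (total - n) %/% (K - 1)             # responses at K
  rem <- (total - n) - a * (K - 1)
  x_max <- c(rep(K, a), if (rem > 0) 1 + rem, rep(1, n - a - (rem > 0)))

  sd_rep >= sd(x_min) - tol && sd_rep <= sd(x_max) + tol
}

check_likert(4.61, 0.20, n = 20, K = 5)   # FALSE: no set of 20 responses has mean 4.61
check_likert(4.60, 1.10, n = 20, K = 5)   # TRUE: passes the necessary-condition checks
```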
Saturday, January 9, 2016
Asymmetric funnel plots without publication bias
In my last post about standardized effect sizes, I showed how averaging across trials before computing standardized effect sizes such as partial \(\eta^2\) and Cohen's d can produce arbitrary estimates of those quantities. This has drastic implications for meta-analysis, but also for the interpretations of these effect sizes. In this post, I use the same facts to show how one can obtain asymmetric funnel plots — commonly taken to indicate publication bias — without any publication bias at all. You should read the previous post if you haven't already.
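As a taste of the argument, here is a minimal sketch (with made-up parameters, and not necessarily the simulation used in the post) in which no study is ever censored, yet the funnel plot comes out asymmetric. The assumption doing the work is a resource trade-off: studies with fewer participants run more trials per participant, which, by the logic of the standardized effect size post, inflates their effect sizes. The helper one_study is my own illustrative function:

```r
# Made-up world: every study estimates the same trial-level effect and nothing
# is censored, but labs trade off participants against trials under a fixed
# budget. More trials per participant -> less trial noise in the averages ->
# larger standardized effect sizes (see the post on standardized effect sizes below).
set.seed(456)
one_study <- function(n_subj, n_trials, mu = 0.4,
                      sigma_subj = 0.5, sigma_trial = 4) {
  # participant means after averaging n_trials trials each
  x  <- rnorm(n_subj, mu, sqrt(sigma_subj^2 + sigma_trial^2 / n_trials))
  d  <- mean(x) / sd(x)                          # one-sample Cohen's d
  se <- sqrt(1 / n_subj + d^2 / (2 * n_subj))    # approximate SE of d
  c(d = d, se = se)
}

n_subj   <- sample(10:100, 200, replace = TRUE)  # 200 hypothetical studies
n_trials <- round(2000 / n_subj)                 # fixed budget: fewer participants, more trials

res <- t(mapply(one_study, n_subj, n_trials))
plot(res[, "d"], res[, "se"], ylim = rev(range(res[, "se"])),
     xlab = "Cohen's d (from averaged data)", ylab = "Standard error of d")
```

Small studies (large standard errors) sit to the right of large studies, which is the signature usually attributed to publication bias.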
Thursday, January 7, 2016
Averaging can produce misleading standardized effect sizes
Recently, there have been many calls for a focus on effect sizes in psychological research. In this post, I discuss how naively using standardized effect sizes with averaged data can be misleading. This is particularly problematic for meta-analysis, where differences in number of trials across studies could lead to very misleading results.
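A minimal sketch of the basic phenomenon, with made-up numbers and an illustrative helper (d_from_averages): the trial-level effect is held constant, and only the number of trials averaged per participant changes, yet the one-sample Cohen's d computed on the participant means grows substantially:

```r
# Toy numbers: a fixed trial-level effect (mu) with trial noise (sigma_trial)
# and between-participant variability (sigma_subj). Only the number of trials
# averaged per participant changes across the three conditions.
d_from_averages <- function(n_subj, n_trials, mu = 0.2,
                            sigma_subj = 1, sigma_trial = 2) {
  subj_means <- rnorm(n_subj, mu, sqrt(sigma_subj^2 + sigma_trial^2 / n_trials))
  mean(subj_means) / sd(subj_means)      # one-sample Cohen's d on the averages
}

set.seed(123)
sapply(c(1, 10, 100), function(m)
  mean(replicate(2000, d_from_averages(n_subj = 30, n_trials = m))))
# roughly 0.09, 0.17, 0.20: d grows with the number of trials even though the
# trial-level effect mu never changes
```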
Thursday, December 3, 2015
Confidence intervals: What they are and are not
Over at the Psychonomic Society Featured Content blog, there are several new articles outlining some of our work on confidence intervals published previously in Psychonomic Bulletin & Review. In a three-part series, Stephan Lewandowsky and Alexander Etz lay out our case for why confidence intervals are not what people think they are. I've written enough about confidence intervals lately, so I'll just link you to their articles.
Wednesday, December 2, 2015
Reviewers and open science: why PRO?
As of yesterday, our paper outlining the PRO Initiative for open science was accepted for publication in the journal Royal Society Open Science. It marks the end of many tweaks to the basic idea, and hopefully the beginning of a new era in peer reviewing: the empowered peer reviewer. The basic idea behind the PRO Initiative is that the peer relationship is fundamental in science, and it is this relationship that should drive cultural change. Open science is necessary, possible, and overdue. As reviewers, we can make it happen.
Thursday, November 19, 2015
Habits and open data: Helping students develop a theory of scientific mind
This post is related to my open science talk with Candice Morey at Psychonomics 2015 in Chicago; also read Candice's new post on the pragmatics: "A visit from the Ghost of Research Past". In this post, we suggest three ideas that can be implemented in a lab setting to improve scientific practices, and encourage habits that make openness easier. These ideas are designed to be minimally effortful for the adviser, but to have a big impact on practice:
* Data partners: young scientists have a partner in another lab, with whom they swap data. The goal is to see if their data documentation is good enough that their partner can reproduce their main analysis with minimal interaction.
* Five year plan: When a project is part-way through, students must give a brief report detailing what they have done to ensure that the data and analyses will be comprehensible to members of the lab in five years' time, after they have left.
* Submission check: At first submission of an article based on the project, advisors should discuss with their advisees the pros and cons of opening their data and, if it will be open, how the data will be promoted online.
Thursday, November 12, 2015
Neyman does science, part 2
In part one of this series, we discussed the different philosophical viewpoints of Neyman and Fisher on the purposes of statistics. Neyman had a behavioral, decision-based view: the purpose of statistical inference is to select one of several possible decisions, enumerated before the data have been collected. To Fisher, and to Bayesians, the purpose of statistical inference is related to the quantification of evidence and rational belief. I agree with Fisher on this issue, and I was curious how Neyman -- with his pre-data inferential philosophy -- would actually tackle a problem with real data. In this second part of the series, we examine Neyman's team's analysis of the data from the Whitetop weather modification experiment in the 1960s.
Tuesday, November 10, 2015
Neyman does science, part 1
On reading Neyman's statistical and scientific philosophy (e.g., Neyman, 1957), one of the things that strikes a scientist is its extreme rejection of post-data reasoning. Neyman adopts the view that once data are obtained, statistical inference is not about reasoning but rather about the automatic adoption of one of several decisions. Given the importance of post-data reasoning to scientists -- which can be confirmed by reading any scientific manuscript -- I wondered how Neyman would think and write about an actual, applied problem. This series of blog posts explores Neyman's work on the analysis of weather modification experiments. The (perhaps unsurprising) take-home message from this series of posts is this: not even Neyman applied Neyman's philosophy when he was confronted with real data.
Monday, April 20, 2015
The fallacy of placing confidence in confidence intervals (version 2)
My coauthors and I have submitted a new draft of our paper "The fallacy of placing confidence in confidence intervals". This paper is substantially modified from its previous incarnation. Here is the main argument:
"[C]onfidence intervals may not be used as suggested by modern proponents because this usage is not justified by confidence interval theory. If used in the way CI proponents suggest, some CIs will provide severely misleading inferences for the given data; other CIs will not. Because such considerations are outside of CI theory, developers of CIs do not test them, and it is therefore often not known whether a given CI yields a reasonable inference or not. For this reason, we believe that appeal to CI theory is redundant in the best cases, when inferences can be justified outside CI theory, and unwise in the worst cases, when they cannot."
The document, source code, and all supplementary material are available here on GitHub.
Friday, April 17, 2015
Guidelines for reporting confidence intervals
I'm working on a manuscript on confidence intervals, and I thought I'd share a draft section on the reporting of confidence intervals. The paper has several demonstrations of how CIs may, or may not, offer quality inferences, and how they can differ markedly from credible intervals, even ones with so-called "non-informative" priors.
Friday, April 10, 2015
All about that "bias, bias, bias" (it's no trouble)
At some point, everyone who fiddles around with Bayes factors with point nulls notices something that, at first blush, seems strange: small effect sizes seem “biased” toward the null hypothesis. In null hypothesis significance testing, power simply increases as the sample size increases. With Bayes factors, there is a non-monotonicity: for a small effect size, increasing the sample size will at first slightly increase the degree to which the data favor the null, and only later does the small effect size become evidence for the alternative. I recall puzzling over this with Jeff Rouder years ago when we were drafting our 2009 paper on Bayesian t tests.
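One way to see the non-monotonicity, sketched here with the BayesFactor package (assumed installed) and constructed data: hold the observed standardized effect size fixed at a small value and increase the sample size. The default JZS Bayes factor first favors the null more and more strongly, then swings around to favor the alternative. The helper bf_for_n is just for this illustration:

```r
# Requires the BayesFactor package (an assumption: install.packages("BayesFactor")).
library(BayesFactor)

bf_for_n <- function(n, d = 0.1) {
  z <- rnorm(n)
  x <- (z - mean(z)) / sd(z) + d   # constructed data: sample mean exactly d, SD exactly 1
  extractBF(ttestBF(x))$bf         # default JZS Bayes factor, alternative over null
}

ns <- c(10, 50, 100, 500, 1000, 5000, 10000)
data.frame(n = ns, BF10 = sapply(ns, bf_for_n))
# BF10 first drifts downward (increasing evidence for the null), then shoots upward
```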
Thursday, April 9, 2015
My favorite Neyman passage: on confidence intervals
I've been doing a lot of reading on confidence interval theory. Some of the reading is more interesting than others. There is one passage from Neyman's (1952) book "Lectures and Conferences on Mathematical Statistics and Probability" (available here) that stands above the rest in terms of clarity, style, and humor. I had not read this before the last draft of our confidence interval paper, but for those of you who have read it, you'll recognize that this is the style I was going for. Maybe you have to be Jerzy Neyman to get away with it.
Neyman gets bonus points for the footnote suggesting the "eminent", "elderly" boss is so obtuse (a reference to Fisher?) and that the young frequentists should be "remind[ed] of the glory" of being burned at the stake. This is just absolutely fantastic writing. I hope you enjoy it as much as I did.
Sunday, March 29, 2015
The TES Challenge to Greg Francis
This post is a follow-up to my previous post, “Statistical alchemy and the 'test for excess significance'”. In the comments on that post, Greg Francis objected to my points about the Test for Excess Significance. I laid out a challenge in which I would use simulation to demonstrate these points. Greg Francis agreed to the details; this post is about the results of the simulations (with links to the code, etc.).
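For readers who haven't seen it, here is a minimal sketch of the logic behind the Test for Excess Significance (Ioannidis & Trikalinos, 2007). The toy function tes_sketch is my own illustration, not Francis's implementation or the challenge code: it computes each study's power under an assumed true effect size and asks how surprising the observed number of significant results would be:

```r
# power.t.test() is from base R's stats package; two-sample, two-sided by default.
tes_sketch <- function(d_true, n_per_group, n_sig, alpha = 0.05, reps = 1e5) {
  pow <- sapply(n_per_group, function(n)
    power.t.test(n = n, delta = d_true, sd = 1, sig.level = alpha)$power)
  expected <- sum(pow)
  # Tail probability of seeing at least n_sig significant results among
  # independent studies with these powers (Poisson-binomial tail, by simulation)
  sims <- replicate(reps, sum(rbinom(length(pow), 1, pow)))
  c(expected_significant = expected, p_excess = mean(sims >= n_sig))
}

# Five studies with 20 participants per group, all five reported as significant,
# assuming the true effect is d = 0.4:
tes_sketch(d_true = 0.4, n_per_group = rep(20, 5), n_sig = 5)
```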
Saturday, March 28, 2015
Two things to stop saying about null hypotheses
There is a currently fashionable way of describing Bayes factors that resonates with experimental psychologists. I hear it often, particularly as a way to describe a particular use of Bayes factors. For example, one might say, “I needed to prove the null, so I used a Bayes factor,” or “Bayes factors are great because with them, you can prove the null.” I understand the motivation behind this sort of language but please: stop saying one can “prove the null” with Bayes factors.
I also often hear other people say “but the null is never true.” I'd like to explain why we should avoid saying both of these things.
Monday, March 23, 2015
Statistical alchemy and the "test for excess significance"
[This post is based largely on my 2013 article for Journal of Mathematical Psychology; see the other articles in that special issue as well for more critiques.]
When I tell people that my primary area of research is statistical methods, one of the reactions I often encounter from people untrained in statistics is that “you can prove anything with statistics.” Of course, this rankles, first because it isn't true (unless you use a very strange definition of prove) and second because I've spent years learning the limitations of statistics, and there are many limitations. These limitations exist, however, in the context of enormous successes. In the sciences, the field of statistics rightly has a place of honor.
This success is evidenced by the great number of scientific arguments that are supported by statistical methods. Not all statistical arguments are created equal, of course. But the respect with which statistics is viewed has the unfortunate downside that a statistical argument can apparently turn a leaden hunch into a golden “truth”. This post is about such statistical alchemy.
Monday, March 9, 2015
The frequentist case against the significance test, part 2
The significance test is perhaps the most used statistical procedure in the world, though it has never been without its detractors. This is the second of two posts exploring Neyman's frequentist arguments against the significance test; if you have not read Part 1, you should do so before continuing (“The frequentist case against the significance test, part 1”).
The frequentist case against the significance test, part 1
It is unfortunate that today, we tend to think about statistical theory in terms of Bayesianism vs frequentism. Modern practice is a blend of Fisher's and Neyman's ideas, with the characteristics of the blend driven by convenience rather than principle. Significance tests are lumped in as a “frequentist” technique by Bayesians in an unfortunate rhetorical shorthand.
In recent years, the significance test has been critiqued from several angles, but these critiques are usually offered on Bayesian or pragmatic grounds. In this two-part post, I will outline the frequentist case developed by Jerzy Neyman against the null hypothesis significance test.