Thursday, December 3, 2015

Confidence intervals: What they are and are not

Over at the Psychonomic Society Featured Content blog, there are several new articles outlining some of our work on confidence intervals published previously in Psychonomic Bulletin & Review. In a three-part series, Steve Lewandowsky and Alexander Etz lay out our case for why confidence intervals are not what people think they are. I've written enough about confidence intervals lately, so I'll just link you to their articles.

Wednesday, December 2, 2015

Reviewers and open science: why PRO?

As of yesterday, our paper outlining the PRO Initiative for open science was accepted for publication in the journal Royal Society Open Science. It marks the end of many tweaks to the basic idea, and hopefully the beginning of a new era in peer reviewing: the empowered peer reviewer. The basic idea behind the PRO Initiative is that the peer relationship is fundamental in science, and it is this relationship that should drive cultural change. Open science is necessary, possible, and overdue. As reviewers, we can make it happen.

Thursday, November 19, 2015

Habits and open data: Helping students develop a theory of scientific mind

This post is related to my open science talk with Candice Morey at Psychonomics 2015 in Chicago; also read Candice's new post on the pragmatics: "A visit from the Ghost of Research Past". In this post, we suggest three ideas that can be implemented in a lab setting to improve scientific practices, and encourage habits that make openness easier. These ideas are designed to be minimally effortful for the adviser, but to have a big impact on practice:

* Data partners: Young scientists are paired with a partner in another lab, with whom they swap data. The goal is to see whether their data documentation is good enough that their partner can reproduce their main analysis with minimal interaction.
* Five-year plan: When a project is part-way through, students must give a brief report detailing what they have done to ensure that the data and analyses will be comprehensible to members of the lab in five years' time, after the students have left.
* Submission check: At first submission of an article based on the project, advisors should discuss with their advisees the pros and cons of opening the data and, if the data are to be open, how they will be promoted online.

Thursday, November 12, 2015

Neyman does science, part 2

In part one of this series, we discussed the different philosophical viewpoints of Neyman and Fisher on the purposes of statistics. Neyman had a behavioral, decision-based view: the purpose of statistical inference is to select one of several possible decisions, enumerated before the data have been collected. To Fisher, and to Bayesians, the purpose of statistical inference is related to the quantification of evidence and rational belief. I agree with Fisher on this issue, and I was curious how Neyman -- with his pre-data inferential philosophy -- would actually tackle a problem with real data. In this second part of the series, we examine Neyman's team's analysis of the data from the Whitetop weather modification experiment in the 1960s.

Tuesday, November 10, 2015

Neyman does science, part 1

On reading Neyman's statistical and scientific philosophy (e.g., Neyman, 1957), one of the things that strikes a scientist is its extreme rejection of post-data reasoning. Neyman adopts the view that once data are obtained, statistical inference is not about reasoning, but rather about the automatic adoption of one of several decisions. Given the importance of post-data reasoning to scientists -- which can be confirmed by reading any scientific manuscript -- I wondered how Neyman would think and write about an actual, applied problem. This series of blog posts explores Neyman's work on the analysis of weather modification experiments. The (perhaps unsurprising) take-home message from this series of posts is this: not even Neyman applied Neyman's philosophy, when he was confronted with real data.

Thursday, September 24, 2015

BayesFactor version 0.9.12-2 released to CRAN

I've released BayesFactor 0.9.12-2 to CRAN; it should be available on all platforms now. The changes include:

  • Added feature allowing fine-tuning of priors on a per-effect basis: see new argument rscaleEffects of lmBF, anovaBF, and generalTestBF
  • Fixed bug that disallowed logical indexing of probability objects
  • Fixed minor typos in documentation
  • Fixed bug causing regression Bayes factors to fail for very small R^2
  • Fixed bug disallowing expansion of dot (.) in generalTestBF model specifications
  • Fixed bug preventing cancelling of all analyses with interrupt
  • Restricted contingency prior to values >=1
  • All BFmodel objects have an additional "analysis" slot giving details of the analysis

Wednesday, September 9, 2015

Please help: BayesFactor testimonials

I'm compiling a portfolio about the BayesFactor software, and I would love to have short comments (a few sentences to a paragraph) from people who have found the software useful. If you have used the software and you wouldn't mind sending me a short blurb about your experience, I'd love to hear from you! Please send your BayesFactor testimonial to me by email. Thanks in advance!

Monday, August 10, 2015

On radical manuscript openness

One of my papers that has attracted a lot of attention lately is "The Fallacy of Placing Confidence in Confidence Intervals," in which we describe some of the fallacies held by the proponents and users of confidence intervals. This paper has been discussed on Twitter and reddit, on blogs (e.g., here and here), and via email with people who found the paper in various places. A person unknown to me has used the article as the basis for edits to the Wikipedia article on confidence intervals. I have been told that several papers currently under review cite it. Perhaps this is a small sign that traditional publishers should be worried: this paper has not been "officially" published yet.

Tuesday, May 26, 2015

Call for papers: Bayesian statistics, at Zeitschrift für Psychologie

I am guest editing a special topical issue of Zeitschrift für Psychologie on Bayesian statistics. The complete call, with details, can be found here: [pdf]. Briefly:
As Bayesian statistics become part of standard analysis in psychology, the Zeitschrift für Psychologie invites papers to a topical issue highlighting Bayesian methods. We invite papers on a broad range of topics, including the benefits and limitations of Bayesian approaches to statistical inference, practical benefits of Bayesian methodologies, interesting applications of Bayesian statistics in psychology, and papers related to statistical education of psychologists from a Bayesian perspective. In addition to suggestions for full original or review articles, shorter research notes and opinion papers are also welcome. 
We invite scholars from various areas of scholarship, including but not limited to psychology, statistics, philosophy, and mathematics, to submit their abstracts on potential papers.
Abstracts are due at the end of July. Critiques and articles about the history of Bayesian statistics are also welcome.

Sunday, May 10, 2015

Visualizing statistical distributions with javascript

For the past few years, I've been developing and using a library I created that allows me to easily generate visualizations of statistical distributions for teaching. One specifies a distribution along with a parametrization, and the library generates a table of all the specified distributions, with links to interactive plots that let anyone see how changing the parameters affects the distribution. In addition, clicking on the plot finds areas under the distribution, and users can switch between PDF and CDF views. I've now opened the code on github.

Monday, April 20, 2015

The fallacy of placing confidence in confidence intervals (version 2)

My coauthors and I have submitted a new draft of our paper "The fallacy of placing confidence in confidence intervals". This paper is substantially modified from its previous incarnation. Here is the main argument:
"[C]onfidence intervals may not be used as suggested by modern proponents because this usage is not justified by confidence interval theory. If used in the way CI proponents suggest, some CIs will provide severely misleading inferences for the given data; other CIs will not. Because such considerations are outside of CI theory, developers of CIs do not test them, and it is therefore often not known whether a given CI yields a reasonable inference or not. For this reason, we believe that appeal to CI theory is redundant in the best cases, when inferences can be justified outside CI theory, and unwise in the worst cases, when they cannot."
The document, source code, and all supplementary material are available here on github.

Friday, April 17, 2015

Guidelines for reporting confidence intervals

I'm working on a manuscript on confidence intervals, and I thought I'd share a draft section on the reporting of confidence intervals. The paper has several demonstrations of how CIs may, or may not, offer quality inferences, and how they can differ markedly from credible intervals, even ones with so-called "non-informative" priors.

Friday, April 10, 2015

All about that "bias, bias, bias" (it's no trouble)

At some point, everyone who fiddles around with Bayes factors with point nulls notices something that, at first blush, seems strange: small effect sizes seem “biased” toward the null hypothesis. In null hypothesis significance testing, the power to detect any fixed, nonzero effect size simply increases as the sample size grows. With Bayes factors, there is a non-monotonicity: for a small effect size, increasing the sample size at first slightly increases the degree to which the data favor the null, and only later does the small effect size become evidence for the alternative. I recall puzzling over this with Jeff Rouder years ago when drafting our 2009 paper on Bayesian t tests.
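The non-monotonicity is easy to see even in a crude sketch. The following illustration is my own, not from the post: it uses the BIC (unit-information) approximation to the one-sample t-test Bayes factor rather than the JZS Bayes factor from the 2009 paper, and the observed effect size and sample sizes are made up.

```python
import math

def bf01_bic(d, n):
    """Approximate Bayes factor in favor of the null for a one-sample
    t test, via the BIC (unit-information) approximation, given an
    observed effect size d and sample size n."""
    t_sq = n * d ** 2  # because the observed d = t / sqrt(n)
    return math.sqrt(n) * (1 + t_sq / (n - 1)) ** (-n / 2)

# Hold the observed effect size fixed at a small value: support for
# the null first grows with n, then reverses into evidence for the
# alternative (BF01 drops below 1).
for n in (10, 100, 400, 1000):
    print(n, round(bf01_bic(0.1, n), 3))
```

With d held at 0.1, the Bayes factor in favor of the null roughly doubles between n = 10 and n = 100, and only at much larger n does the same small effect size turn into evidence for the alternative.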

Thursday, April 9, 2015

Some thoughts on replication

In a recent blog post, Simine Vazire discusses the problem with the logic of requiring replicators to explain why they reach different conclusions from the original authors. She frames it, correctly, as asking people to over-interpret random noise. Vazire identifies the issue as a problem with our thinking: we under-estimate randomness. I'd like to explore other ways in which our biases interfere with clear thinking about replication, and perhaps suggest some ways we can clarify it.

I suggest two ways in which we fool ourselves in thinking about replication: the concept of "replication" is unnecessarily asymmetric, an example of overly linear thinking; and a lack of distinction in practice has caused a lack of distinction in theory.

My favorite Neyman passage: on confidence intervals

I've been doing a lot of reading on confidence interval theory. Some of the reading is more interesting than others. There is one passage from Neyman's (1952) book "Lectures and Conferences on Mathematical Statistics and Probability" (available here) that stands above the rest in terms of clarity, style, and humor. I had not read this before the last draft of our confidence interval paper, but for those of you who have read it, you'll recognize that this is the style I was going for. Maybe you have to be Jerzy Neyman to get away with it.

Neyman gets bonus points for the footnote suggesting the "eminent", "elderly" boss is so obtuse (a reference to Fisher?) and that the young frequentists should be "remind[ed] of the glory" of being burned at the stake. This is just absolutely fantastic writing. I hope you enjoy it as much as I did.

Sunday, March 29, 2015

The TES Challenge to Greg Francis

This post is a follow-up to my previous post, “Statistical alchemy and the 'test for excess significance'”. In the comments on that post, Greg Francis objected to my points about the Test for Excess Significance. I laid out a challenge in which I would use simulation to demonstrate these points. Greg Francis agreed to the details; this post is about the results of the simulations (with links to the code, etc.)

Saturday, March 28, 2015

Two things to stop saying about null hypotheses

There is a currently fashionable way of describing Bayes factors that resonates with experimental psychologists. I hear it often, particularly as a way to describe a particular use of Bayes factors. For example, one might say, “I needed to prove the null, so I used a Bayes factor,” or “Bayes factors are great because with them, you can prove the null.” I understand the motivation behind this sort of language but please: stop saying one can “prove the null” with Bayes factors.

I also often hear other people say “but the null is never true.” I'd like to explain why we should avoid saying both of these things.
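One way to see why "prove" overstates what a Bayes factor delivers is a small numerical sketch (my own, using the BIC approximation for a one-sample t test as an illustrative assumption): even data that land exactly on the null yield a Bayes factor in favor of the null of only about the square root of the sample size, so evidence for the null accumulates steadily but is finite at every n.

```python
import math

def bf01_at_null(n):
    # BIC (unit-information) approximation with observed t = 0:
    # sqrt(n) * (1 + 0)^(-n/2) reduces to sqrt(n).
    return math.sqrt(n)

# Evidence for the null grows without bound, but at any finite n it is
# graded evidence, never proof.
for n in (100, 10_000, 1_000_000):
    print(n, bf01_at_null(n))  # 10.0, 100.0, 1000.0
```

A million observations perfectly consistent with the null still only push this approximate Bayes factor to about 1000: strong evidence, but a quantity on a continuous scale, not a proof.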

Monday, March 23, 2015

BayesFactor updated to version 0.9.11-1

The BayesFactor package has been updated to version 0.9.11-1. The changes are:

  CHANGES IN BayesFactor VERSION 0.9.11-1

  * Fixed memory bug causing importance sampling to fail.

  CHANGES IN BayesFactor VERSION 0.9.11

  * Added support for prior/posterior odds and probabilities. See the new vignette for details.
  * Added approximation for t test in case of large t
  * Made some error messages clearer
  * Use callbacks at least once in all cases
  * Fix bug preventing continuous interactions from showing in regression Gibbs sampler
  * Removed unexported function oneWayAOV.Gibbs(), and related C functions, due to redundancy
  * gMap from model.matrix is now 0-indexed vector (for compatibility with C functions)
  * substantial changes to backend, to Rcpp and RcppEigen for speed
  * removed redundant struc argument from nWayAOV (use gMap instead)

Statistical alchemy and the "test for excess significance"

[This post is based largely on my 2013 article for Journal of Mathematical Psychology; see the other articles in that special issue as well for more critiques.]

When I tell people that my primary area of research is statistical methods, one of the reactions I often encounter from people untrained in statistics is that “you can prove anything with statistics.” Of course, this rankles, first because it isn't true (unless you use a very strange definition of prove) and second because I've spent years learning the limitations of statistics, and there are many limitations. These limitations exist, however, in the context of enormous successes. In the sciences, the field of statistics rightly has a place of honor.

This success is evidenced by the great number of scientific arguments that are supported by statistical methods. Not all statistical arguments are created equal, of course. But the respect with which statistics is viewed has the unfortunate downside that a statistical argument can apparently turn a leaden hunch into a golden “truth”. This post is about such statistical alchemy.

Monday, March 9, 2015

The frequentist case against the significance test, part 2

The significance test is perhaps the most used statistical procedure in the world, though it has never been without its detractors. This is the second of two posts exploring Neyman's frequentist arguments against the significance test; if you have not read Part 1, you should do so before continuing (“The frequentist case against the significance test, part 1”).

The frequentist case against the significance test, part 1

It is unfortunate that today, we tend to think about statistical theory in terms of Bayesianism vs frequentism. Modern practice is a blend of Fisher's and Neyman's ideas, with the characteristics of the blend driven by convenience rather than principle. Significance tests are lumped in as a “frequentist” technique by Bayesians in an unfortunate rhetorical shorthand.

In recent years, the significance test has been critiqued on several grounds, but often these critiques are offered from Bayesian or pragmatic grounds. In a two-part post, I will outline the frequentist case developed by Jerzy Neyman against the null hypothesis significance test.

Thursday, March 5, 2015

How to shoot yourself in the foot with various statistical philosophies

I've long been a fan of "How to shoot yourself in the foot" jokes. Having shot myself in the foot with different programming languages -- particularly with C -- I was thinking about how one might shoot oneself in the foot with various statistical approaches. So, here we go...

Monday, March 2, 2015

At the APS Observer: a profile of JASP

The APS Observer has just published a profile of JASP, a graphical user interface designed to make statistics easier. It includes Bayesian procedures by means of R and the BayesFactor package. From the article:
 JASP distinguishes itself from SPSS by being as simple, intuitive, and approachable as possible, and by making accessible some of the latest developments in Bayesian analyses. At time of writing, JASP version 0.6 implements the following analysis tools in both their classical and Bayesian manifestations:
  • Descriptive statistics
  • t tests
  • Independent samples ANOVA
  • Repeated measures ANOVA
  • Correlation
  • Linear regression
  • Contingency tables
Read more at the APS Observer.

Sunday, March 1, 2015

To Beware or To Embrace The Prior

In this guest post, Jeff Rouder reacts to two recent comments skeptical of Bayesian statistics, and describes the importance of the prior in Bayesian statistics. In short: the prior gives a Bayesian model the power to predict data, and prediction is what allows the evaluation of evidence. Far from being a liability, Bayesian priors are what make Bayesian statistics useful to science.

Tuesday, February 10, 2015

BayesFactorExtras: a sneak preview

Felix Schönbrodt and I have been working on an R package called BayesFactorExtras. This package is designed to work with the BayesFactor package, providing features beyond the core BayesFactor functionality. Currently in the package are:
  1. Sequential Bayes factor plots for visualization of how the Bayes factor changes as data come in: seqBFplot()
  2. Ability to embed R objects directly into HTML reports for reproducible, sharable science: createDownloadURI()
  3. Interactive BayesFactor objects in HTML reports; just print the object in a knitr document.
  4. Interactive MCMC objects in HTML reports; just print the object in a knitr document.
All of these are pretty neat, but I thought I'd give a sneak preview of #4. To see how it works, click here to play with the document on Rpubs!

I anticipate releasing this to CRAN soon.

Saturday, February 7, 2015

On making a Bayesian omelet

My colleagues Eric-Jan Wagenmakers and Jeff Rouder and I have a new manuscript in which we respond to Hoijtink, van Kooten, and Hulsker's in-press manuscript Why Bayesian Psychologists Should Change the Way They Use the Bayes Factor. They suggest a method for "calibrating" Bayes factors using error rates. We show that this method is fatally flawed, but along the way we also describe how we think about the subjective properties of the priors we use in our Bayes factors:

"...a particular researcher's subjective prior is of limited use in the context of a public scientific discussion. Statistical analysis is often used as part of an argument. Wielding a fully personal, subjective prior and concluding 'If you were me, you would believe this' might be useful in some contexts, but in others it is less useful. In the context of a scientific argument, it is much more useful to have priors that approximate what a reasonable, but somewhat-removed researcher would have in the situation. One could call this a 'consensus prior' approach. The need for broadly applicable arguments is not a unique property of statistics; it applies to all scientific arguments. We do not argue to convince ourselves; we should therefore make use of statistical arguments that are not pegged to our own beliefs...
It should now be obvious how we make our 'Bayesian omelet'; we break the eggs and cook the omelet for others in the hopes that it is something like what they would choose for themselves. With the right choice of ingredients, we think our Bayesian omelet can satisfy most people; others are free to make their own, and we would be happy to help them if we can. "

Our completely open, reproducible manuscript --- “Calibrated” Bayes factors should not be used: a reply to Hoijtink, van Kooten, and Hulsker --- along with a supplement and R code, is available on github (with DOI!).

Tuesday, February 3, 2015

BayesFactor version 0.9.10 released to CRAN

If you're running Solaris (yes, all zero of you) you'll have to wait for version 0.9.10-1, due to a small issue preventing the Solaris package from building on CRAN. The rest of you can pick up the updated version on CRAN today.

See below the fold for changes.

Friday, January 30, 2015

On verbal categories for the interpretation of Bayes factors

As Bayesian analysis is becoming more popular, adopters of Bayesian statistics have had to consider new issues that they did not before. What makes a “good” prior? How do I interpret a posterior? What Bayes factor is “big enough”? Although the theoretical arguments for the use of Bayesian statistics are very strong, new and unfamiliar ideas can cause uncertainty in new adopters. Compared to the cozy certainty of \(p<.05\), Bayesian statistics requires more care and attention. In theory, this is no problem at all. But as Yogi Berra said, "In theory there is no difference between theory and practice. In practice there is."

In this post, I discuss the use of verbal labels for magnitudes of Bayes factors. In short, I don't like them, and think they are unnecessary.

Sunday, January 18, 2015

Multiple Comparisons with BayesFactor, Part 2 - order restrictions

In my previous post, I described how to do multiple comparisons using the BayesFactor package. Part 1 concentrated on testing equality constraints among effects: for instance, that the effects of two factor levels are equal, while leaving the third free to be different. In this second part, I will describe how to test order restrictions on factor level effects. This post will be a little more involved than the previous one, because BayesFactor does not currently do order restrictions automatically.

Again, I will note that these methods are only meant to be used for pre-planned comparisons. They should not be used for post hoc comparisons.
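The workaround rests on a simple identity (the encompassing-prior idea): the Bayes factor for an order-restricted model against the unconstrained model is the posterior probability of the restriction divided by its prior probability. Here is a minimal Monte Carlo sketch of that identity; the posterior distributions below are made up for illustration, standing in for real MCMC output.

```python
import random

random.seed(1)

def monte_carlo_p_order(draws1, draws2):
    """Fraction of paired posterior draws satisfying theta1 < theta2."""
    return sum(a < b for a, b in zip(draws1, draws2)) / len(draws1)

# Hypothetical posterior samples for two condition effects; in a real
# analysis these might come from BayesFactor's posterior() sampler.
m = 100_000
theta1 = [random.gauss(0.2, 0.1) for _ in range(m)]
theta2 = [random.gauss(0.5, 0.1) for _ in range(m)]

post_p = monte_carlo_p_order(theta1, theta2)  # posterior P(theta1 < theta2)
prior_p = 0.5  # symmetric prior on the two effects implies P(theta1 < theta2) = 1/2

# Bayes factor: order-restricted model versus unconstrained model
bf_restriction = post_p / prior_p
print(round(bf_restriction, 2))
```

Note that with two effects the restricted model can beat the unconstrained model by at most a factor of 2, since the posterior probability of the ordering cannot exceed 1; sharper restrictions (more orderings ruled out a priori) allow larger Bayes factors.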

Saturday, January 17, 2015

Multiple Comparisons with BayesFactor, Part 1

One of the most frequently asked questions about the BayesFactor package is how to do multiple comparisons: that is, given that some effect exists across factor levels or means, how can we test whether two specific effects are unequal? In the next two posts, I'll explain how this can be done in two cases: in Part 1, I'll cover tests for equality, and in Part 2 I'll cover tests for specific order-restrictions.

Before we start, I will note that these methods are only meant to be used for pre-planned comparisons. They should not be used for post hoc comparisons.