Friday, July 29, 2016

Stop saying confidence intervals are "better" than p values

One of the common tropes one hears from advocates of confidence intervals is that they are superior, or should be preferred, to p values. In our paper "The Fallacy of Placing Confidence in Confidence Intervals", we outlined a number of interpretation problems in confidence interval theory. We did this from a mostly Bayesian perspective, but in the second section was an example that showed why, from a frequentist perspective, confidence intervals can fail. However, many people missed this because they assumed that the paper was all Bayesian advocacy. The purpose of this blog post is to expand on the frequentist example that many people missed; one doesn't have to be a Bayesian to see that confidence intervals can be less interpretable than the p values they are supposed to replace. Andrew Gelman briefly made this point previously, but I want to expand on it so that people (hopefully) more clearly understand the point.

Understanding the argument I'm going to lay out here is critical to understanding both p values and confidence intervals. As we'll see, fallacies about one or the other are what lead advocates of confidence intervals to falsely believe that CIs are "better".

p values and "surprise"

First, we must define a p value properly and understand its role in frequentist inference. The p value is the probability of obtaining a result at least as extreme as the one we observed, under some assumption about the true distribution of the data. A low p value is taken as indicating that the result observed was very extreme under the assumptions, and hence calls the assumptions into doubt. One might say that a low p value is "surprising" under the assumptions. I will not question this mode of inference here.

It is critical to keep in mind that a low p value can call an assumption into doubt, but a high p value does not "confirm" anything. This is consistent with falsificationist logic. We often see p values used in the context of null hypothesis significance testing (NHST), where a single p value is computed that indicates how extreme the data are under the assumption of a null hypothesis; however, we can compute p values for any hypothesis we like. As an example, suppose we are interested in whether reading comprehension scores are affected by caffeine. We apply three different doses to N=10 people in each group in a between-subjects design, and test their reading comprehension. For the sake of the example, we assume normality, homogeneity of variance, etc. We apply a one-way ANOVA to the reading comprehension scores and obtain an F statistic of F(2,27)=8.

If we were to assume that there was no relationship between the reading scores and caffeine dose, then the resulting p value for this F statistic is p=0.002. This indicates that we would only expect F statistics as extreme as this one 0.2% of the time, if there were no true relationship.

The curve shows the distribution of F(2,27) statistics when the null hypothesis is true. The area under the curve to the right of the observed F statistic is the p value.
This low p value would typically be regarded as strong evidence against the null hypothesis, because -- as the graph above shows -- an F statistic as extreme as the observed one would be quite rare, if indeed there were no relationship between reading scores and caffeine.
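As a rough check on the p value quoted above, the right tail of the null F distribution can be computed directly. This is a minimal sketch, not the calculation used for the post. It relies on the fact that when the numerator degrees of freedom equal 2 (three groups), the right-tail area of a central F distribution has the closed form (1 + 2F/df2)^(-df2/2). The function name is made up for illustration.

```python
def null_p_value(f_stat, df2):
    """Right-tailed p value for a central F(2, df2) statistic.

    Valid only when the numerator degrees of freedom are 2,
    which is the three-group design used in the example.
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


p = null_p_value(8.0, 27)
print(f"p = {p:.4f}")  # should round to roughly 0.002, as in the text
```

For designs with more than three groups this shortcut does not apply, and a general F distribution routine would be needed instead.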

So far, this is all first-year statistics (though it is often misunderstood). Although we typically see p values computed for a single hypothesis, there is nothing stopping us from computing them for multiple hypotheses. Suppose we are interested in the true size of the effect between reading scores and caffeine dosage. One statistic that quantifies this relationship is ω², the proportion of the total variance in the reading scores that is "accounted for" by caffeine (see Steiger, 2004 for details). We won't get into the details of how this is computed; we need only know that:

  • When ω²=0, there is no relationship between caffeine and reading scores. All variance is error; that is, knowing someone's reading score does not give any information about which dose group they were in.
  • When ω²=1, there is the strongest possible relationship between caffeine and reading scores. No variance is error; that is, by knowing someone's reading score one can know with certainty which dose group they were in.
  • As ω² gets larger, larger and larger F statistics are predicted.
We have computed the p value under the assumption that ω²=0, but what about all other ω² values? Try this Shiny app to find the predicted distribution of F statistics, and hence p values, for other values of ω². Try to find the value of ω² that would yield a p value of exactly 0.05; it should be about ω²=0.108.

A Shiny app for finding p values in a one-way ANOVA with three groups.

All values of ω² less than 0.108 yield p values of less than 0.05. If we designate p<0.05 as "surprising" p values, then F=8 would be surprising under the assumption of any value of ω² between 0 and 0.108.

Using the Shiny app, we can see that F=8 yields a right-tailed p value of about 0.05 when ω² is approximately 0.108.
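For readers without access to the Shiny app, the same right-tailed p values can be approximated in a few lines. This is a sketch under stated assumptions, not the app's code. It assumes the noncentrality parameter is N·ω²/(1 − ω²) with N=30 total observations, and it treats the noncentral F distribution as a Poisson mixture of central F distributions. The function name and the fixed number of series terms are illustrative choices.

```python
import math


def right_tail_p(f_stat, df2, omega_sq, n_total, terms=400):
    """P(F >= f_stat) in a three-group one-way ANOVA (numerator df = 2).

    The fixed number of terms is adequate for omega^2 up to about 0.9
    when n_total = 30; it is not a general-purpose routine.
    """
    # ASSUMPTION: noncentrality = N * omega^2 / (1 - omega^2)
    lam = n_total * omega_sq / (1.0 - omega_sq)
    mu = lam / 2.0                      # Poisson mean of the mixture
    x = 2.0 * f_stat / (2.0 * f_stat + df2)
    b = df2 / 2.0
    term = (1.0 - x) ** b               # first term of the central-F tail sum
    tail = term                         # right tail of central F(2 + 2j, df2) at j = 0
    weight = math.exp(-mu)              # Poisson probability of j = 0
    total = weight * tail
    for j in range(1, terms):
        term *= (b + j - 1.0) / j * x   # next term of the tail sum
        tail += term                    # right tail of central F(2 + 2j, df2)
        weight *= mu / j                # Poisson probability of j
        total += weight * tail
    return total


for omega_sq in (0.0, 0.05, 0.108):
    p = right_tail_p(8.0, 27, omega_sq, n_total=30)
    print(f"omega^2 = {omega_sq:.3f}: right-tailed p = {p:.4f}")
```

If the assumed link between ω² and the noncentrality parameter holds, the last line printed should be close to 0.05, and the first should match the null p value of about 0.002.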


Notice that the p values we've computed thus far are "right-tailed" p values; that is, "extreme" is defined as "too big". We can also ask whether the F statistic we've found is extreme in the other direction: that is, is it "too small"? A p value used to indicate whether the F value is too small is called a "left-tailed" p value. Using the Shiny app, one can work out the value of ω² such that F=8 would be "surprisingly" small at the p=0.05 level; that value is ω²=0.523. Under any true value of ω² greater than 0.523, F=8 would be surprisingly small.

Using the Shiny app, we can see that F=8 yields a left-tailed p value of about 0.05 when ω² is approximately 0.523.

  • If 0 ≤ ω² ≤ 0.108, the observed F statistic would be surprisingly large (that is, the right-tailed p ≤ 0.05)
  • If 0.523 ≤ ω² ≤ 1, the observed F statistic would be surprisingly small (that is, the left-tailed p ≤ 0.05)
  • If 0.108 < ω² < 0.523, the observed F statistic would not be surprisingly large or small.

Critically, we've used p values to make all of these statements. The p values tell us whether values would be "surprisingly extreme", under particular assumptions; p values allow us, under frequentist logic, to rule out true values of ω², but not to rule them in.

p values and confidence intervals

Many people are aware of the relationship between p values and confidence intervals. A typical X% (two-tailed) confidence interval contains all parameter values such that neither one-sided p value is less than (1-X/100)/2. That sounds complicated, but it isn't; for a 90% confidence interval, we just need all the values for which the observed data would not be "too surprising" (p<0.05, for either of the two one-sided tests).

We've already computed the 90% confidence interval for ω² in our example; for all values in [0.108, 0.523], the p value for both one-sided tests is p>0.05. From each of the two one-sided tests we get an error rate of 0.05, and hence the confidence coefficient is 100 × [1 - (0.05 + 0.05)] = 90%.
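The construction just described, keeping every ω² value that neither one-sided test rejects, can be sketched as a test inversion. This is an illustrative sketch, not the author's code. It assumes the noncentrality parameter is N·ω²/(1 − ω²) with N=30, it only handles numerator df = 2, and the function names and the search range [0, 0.9] are arbitrary choices. Because the right-tailed p value increases with ω², the lower bound is where it crosses 0.05 and the upper bound is where it crosses 0.95 (that is, where the left-tailed p value falls to 0.05).

```python
import math


def right_tail_p(f_stat, df2, omega_sq, n_total, terms=400):
    """P(F >= f_stat) for numerator df = 2, via a Poisson mixture."""
    # ASSUMPTION: noncentrality = N * omega^2 / (1 - omega^2)
    mu = n_total * omega_sq / (1.0 - omega_sq) / 2.0
    x = 2.0 * f_stat / (2.0 * f_stat + df2)
    b = df2 / 2.0
    term = (1.0 - x) ** b
    tail, weight = term, math.exp(-mu)
    total = weight * tail
    for j in range(1, terms):
        term *= (b + j - 1.0) / j * x
        tail += term
        weight *= mu / j
        total += weight * tail
    return total


def solve_increasing(func, target, lo, hi, iters=60):
    """Bisection: find x in [lo, hi] with func(x) = target, func increasing."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def p_of(omega_sq):
    return right_tail_p(8.0, 27, omega_sq, 30)


# Lower bound: right-tailed p = 0.05. Upper bound: left-tailed p = 0.05.
lower = solve_increasing(p_of, 0.05, 0.0, 0.9)
upper = solve_increasing(p_of, 0.95, 0.0, 0.9)
print(f"90% CI for omega^2: [{lower:.3f}, {upper:.3f}]")
```

Under these assumptions the interval should land close to the [0.108, 0.523] read off the Shiny app. Note that nothing in the code "rules in" any value: it only locates where the two significance tests stop rejecting.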

How can we interpret the confidence interval? Confidence interval advocates would have us believe that the interval [0.108, 0.523] gives "plausible" or "likely" values for the parameters, and that the width of this interval tells us the precision of our estimate. But remember how the CI was computed: using p values. We know that nonsignificant high p values do not rule in parameter values as plausible; rather, the values outside the interval have been ruled out, due to the fact that if those were the true values, the observed data would be surprising.

So rather than thinking of the CI as values that are "ruled in" as "plausible" or "likely" by the data, we should rather (from a frequentist perspective, at least) think of the confidence interval as values that have not yet been ruled out by a significance test.


Does this matter?

This distinction matters a great deal for understanding both p values and confidence intervals. In order to use p values in any way that approaches reasonability, we need to understand the "surprise" interpretation, and we need to realise that we can compute p values for many hypotheses, not just the null hypothesis. In order to interpret confidence intervals well, we need to understand the "fallacy of acceptance": Just because a value is in the CI, doesn't mean it is plausible; it only means that it has not yet been ruled out.

To see the real consequences of this fallacy, consider what we would infer if F(2,27)=0.001 (p=0.999). Any competent data analyst would notice that there is something wrong; the means are surprisingly similar. Under the null hypothesis, when all error is due to error within the groups, we expect the means to vary. This F statistic indicates that the means are so similar that even under the null hypothesis -- where the true means are exactly the same -- we would expect more similar observed means only one time in a thousand.

In fact, the F statistic is so small that under all values of ω², the left-tailed p value is at most 0.001. Why? Because ω² can't be any lower than 0, and this represents the null hypothesis. If we built a 90% confidence interval, it would be empty because there are no values of ω² that yield p>0.05. For all true values of ω², the observed data are "surprising". Now this presents no particular problem for an interpretation of confidence intervals that rests solely on their relationship with p values. But note that the very high p value tells us more than the confidence interval; the CI depends on the chosen confidence coefficient, and is simply empty. The p value and the F statistic have the information we want; they tell us that the means are much more similar than we would typically expect under any hypothesis. A competent data analyst would, at this point, check the procedure or data for problems. The entire model is suspect.
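The numbers in this example are easy to check at the ω²=0 endpoint. The sketch below is illustrative only and uses the closed form for the right tail of a central F distribution with 2 numerator degrees of freedom, (1 + 2F/df2)^(-df2/2). It does not compute p values for ω² > 0; as the text argues, larger values of ω² predict larger F statistics, so the left-tailed p value can only shrink from here.

```python
def null_right_tail(f_stat, df2):
    """Right-tailed p value for a central F(2, df2) statistic."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


right_p = null_right_tail(0.001, 27)   # the usual NHST p value
left_p = 1.0 - right_p                 # probability of an F this small or smaller
print(f"right-tailed p = {right_p:.4f}")
print(f"left-tailed p  = {left_p:.4f}")

# At a 0.05 threshold the left-tailed test already rejects omega^2 = 0,
# the most favourable value, so no value survives both one-sided tests.
print("omega^2 = 0 rejected by left-tailed test:", left_p < 0.05)
```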

But what does this mean for a confidence interval advocate who is invested in the (incorrect) interpretation of the CI in terms of "plausible values" or "precision"? Consider Steiger (2004), who suggests replacing a missing bound with "0" in the CI for ω². This is an awful suggestion. In the example above with F=0.001, this would imply that the confidence interval includes a single value, 0. But the observed data F=0.001 would be very surprising if ω²=0. Under frequentist logic, the value -- and all other values -- should be ruled out. Moreover, a CI of (0) is infinitesimally thin. Steiger admits that this obviously does not imply infinite precision, but neither Steiger nor any other CI advocate gives a formal reason why CIs must, in general, have an interpretation in terms of precision. When the interpretation obviously fails, this should make us doubt whether the interpretation was correct in the first place. The p value tells the story much better than the CI, without encouraging us to fall into fallacies of acceptance or precision.

Where to go from here?

It is often claimed that confidence intervals are more informative than p values. This assertion is based on a flawed interpretation of confidence intervals, which we call the "likelihood" or "plausibility" fallacy, and is related to Mayo's "fallacy of acceptance". A proper interpretation of confidence intervals, in terms of the underlying significance tests, avoids this fallacy and prevents bad interpretations of the CIs, in particular when the model is suspect. The entire concept of the "confidence interval" encourages the fallacy of acceptance, and it is probably best if CIs were abandoned altogether. If one does not want to be Bayesian, one option that is more useful than confidence intervals -- where all values are either rejected or not at a fixed level of significance -- is viewing curves of p values (for similar use of p value curves, see Mayo's work on "severity").
Curves of right- and left-tailed p values for the two F statistics mentioned in this post.
Consider the plot on the left above, which shows all right- and left-tailed p values for F=8. The horizontal line at p=0.05 allows us to find the 90% confidence interval. For any value of ω² such that either the blue or red line is lower than the horizontal line, the observed data would be "surprising". It is easy to see that for p=0.05, these values are [0.108, 0.523]. The plot easily shows the necessary information without encouraging the fallacy of acceptance.

Now, consider the plot on the right. For F=0.001, however, all values of ω² yield a left-tailed p value of less than 0.05, and hence F=0.001 would be "surprising". There are no values for which both the red and blue lines are above p=0.05. The plot does not encourage us to believe that ω² is small or 0; it also does not encourage any interpretation in terms of precision. Instead, it shows that all values are suspect.
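The two p value curves can be approximated as a small text table rather than a plot. This is a hedged sketch, not the code behind the figure. It assumes the noncentrality parameter is N·ω²/(1 − ω²) with N=30, handles only numerator df = 2, and evaluates a coarse grid of ω² values chosen for illustration.

```python
import math


def right_tail_p(f_stat, df2, omega_sq, n_total, terms=400):
    """P(F >= f_stat) for numerator df = 2, via a Poisson mixture."""
    # ASSUMPTION: noncentrality = N * omega^2 / (1 - omega^2)
    mu = n_total * omega_sq / (1.0 - omega_sq) / 2.0
    x = 2.0 * f_stat / (2.0 * f_stat + df2)
    b = df2 / 2.0
    term = (1.0 - x) ** b
    tail, weight = term, math.exp(-mu)
    total = weight * tail
    for j in range(1, terms):
        term *= (b + j - 1.0) / j * x
        tail += term
        weight *= mu / j
        total += weight * tail
    return total


grid = [i / 10 for i in range(9)]       # omega^2 = 0.0, 0.1, ..., 0.8
for f_stat in (8.0, 0.001):
    print(f"F = {f_stat}")
    for omega_sq in grid:
        right = right_tail_p(f_stat, 27, omega_sq, 30)
        left = 1.0 - right              # left-tailed p value
        flag = "surprising" if min(left, right) < 0.05 else ""
        print(f"  omega^2 = {omega_sq:.1f}  right p = {right:.4f}"
              f"  left p = {left:.4f}  {flag}")
```

For F=8 only the grid points between the two bounds should escape the "surprising" flag; for F=0.001 every row should be flagged, which is the point of the right-hand plot.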

The answer to fallacious interpretations of p values is not to move to confidence intervals; confidence intervals only encourage related fallacies, which one can find in any confidence interval advocacy paper. If we wish to rid people of fallacies involving p values, more p values are needed, not fewer. Confidence intervals are not "better" than p values. The only way to interpret CIs reasonably is in terms of p values, and considering entire p value curves enables us to jettison the reliance on an arbitrary confidence coefficient, and helps us avoid fallacies.

