A funnel plot is a commonly used meta-analytic tool for detecting bias in a subset of the scientific literature. The basic idea is that if a literature is unbiased, the average estimate of an effect should not depend on the sample size (or on some other measure of a study's "precision"). For a given sample size, estimates of the effect size should be spread around the true effect size, and this spread should decrease as the sample size grows.
Publication bias is often assumed to manifest itself as 1) a tendency for statistically significant results to be published, and 2) a tendency for researchers to publish effects consistent with their theoretical outlook; both will produce asymmetric funnel plots. Read this Neuroskeptic post about a paper by Shanks and colleagues for an example of how asymmetric funnel plots are used to argue for publication bias. Notice that the plots use a standardized effect size on the x axis.
A (not so) hypothetical paradigm
Now suppose this same paradigm is used across many labs, varying only in their sample sizes. Each lab reports the standard statistics: the mean difference in RTs across participants, its standard error, and the t statistic. A skeptic comes along, collects these statistics from all the papers, and computes the Hedges' g standardized effect size (a variation on the standardized difference score) from each t statistic. They produce the funnel plot shown below by plotting the sample size¹ (number of participants) against the standardized effect size.
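For a within-participant design like this one, the conversion from a reported paired t statistic to Hedges' g is straightforward. Here is a minimal sketch in Python (the function name is mine; the multiplier is the usual small-sample correction):

```python
import math

def hedges_g_from_t(t: float, n: int) -> float:
    """Hedges' g for a paired design, recovered from the reported t statistic.

    d = t / sqrt(n) is the standardized mean of the difference scores;
    the multiplier is the small-sample correction J ~ 1 - 3 / (4*df - 1),
    with df = n - 1.
    """
    d = t / math.sqrt(n)
    j = 1.0 - 3.0 / (4.0 * (n - 1) - 1.0)
    return d * j
```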
Why is the funnel plot asymmetric? In all studies, the total number of trials performed was approximately the same: 2000 trials. How these trials broke down across participants differed. Some studies had 100 trials per condition and 10 participants; others, 10 trials per condition and 100 participants. The standard deviation of the difference scores around their mean is a function of the number of trials performed per participant: when the number of trials per participant is high, the difference scores are less variable, so the standardized effect size is high, just as discussed in the previous blog post. But because the total amount of "effort" per study is conserved (all studies have the same total number of trials), the studies with more trials per participant necessarily have fewer participants, so the smallest studies show the largest standardized effects. The funnel plot therefore looks problematic, but the asymmetry is an artifact.
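To see the artifact concretely, here is a minimal simulation sketch in Python. Only the conserved 2000-trial budget and the trial/participant trade-off come from the setup above; the particular numbers (a 20 ms effect, 150 ms trial-level noise, the specific sample sizes) are assumptions of mine:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

delta = 20.0          # assumed true mean RT difference (ms), same in every study
sigma = 150.0         # assumed trial-level SD within a condition (ms)
tau = 5.0             # assumed between-participant SD of the true effect (ms)
total_trials = 2000   # every study spends the same trial budget

studies = []
for n in (10, 20, 25, 40, 50, 100):        # participants per study (assumed values)
    m = total_trials // (2 * n)            # trials per condition per participant
    for _ in range(200):                   # many labs running each design
        # Each participant's mean difference score: the true effect plus
        # trial-averaging noise with SD sqrt(2) * sigma / sqrt(m).
        diffs = rng.normal(delta, tau, n) \
            + rng.normal(0.0, np.sqrt(2.0) * sigma / np.sqrt(m), n)
        se = diffs.std(ddof=1) / np.sqrt(n)
        t = diffs.mean() / se
        g = (t / np.sqrt(n)) * (1 - 3 / (4 * (n - 1) - 1))  # as in hedges_g_from_t
        studies.append((n, g, diffs.mean(), se))

n_, g_, raw, se_ = map(np.array, zip(*studies))

# The skeptic's funnel plot: small studies show the largest standardized effects.
plt.scatter(g_, n_, s=5, alpha=0.3)
plt.xlabel("Hedges' g")
plt.ylabel("sample size (participants)")
plt.show()
```

The small-n studies are exactly the ones with many trials per participant, so they cluster at large values of g: the classic signature of publication bias, produced here with every result "published".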
One wonders if this Cross Validated query was related to this artifact.
Creating a funnel plot from the raw effect sizes removes the asymmetry, as does plotting the standard error rather than the sample size on the y axis (both are sketched below).
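Continuing the simulation sketch above (this fragment reuses its arrays, so it is not self-contained), the same studies plotted on the raw scale bear this out:

```python
# Continuing from the simulation above: the same studies on the raw scale.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].scatter(raw, n_, s=5, alpha=0.3)
axes[0].set_xlabel("raw mean difference (ms)")
axes[0].set_ylabel("sample size (participants)")

axes[1].scatter(raw, se_, s=5, alpha=0.3)
axes[1].invert_yaxis()   # funnel-plot convention: more precise studies at the top
axes[1].set_xlabel("raw mean difference (ms)")
axes[1].set_ylabel("standard error of the mean difference (ms)")

fig.tight_layout()
plt.show()
```

On the raw scale, every study is centered on the same 20 ms effect regardless of its design, so the plots are symmetric.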
I suspect there are other artifacts one could generate using standardized effect sizes in a meta-analysis². How can we keep from getting fooled? In some cases, the correction I mentioned in the previous post might be of use. And since a funnel plot is typically used to detect problematic bias in a literature rather than to estimate the effect size, the fact that there is no single "true" standardized effect size is not itself a problem.
For future research, data sharing and the reporting of multiple effect size measures will help. Modifications of Cohen's d and Hedges' g exist that reduce this problem (see "Computing d and g from studies that use pre-post scores or matched groups", for instance), but these modified statistics cannot be computed from the statistics that are typically reported. That we need statistics which are not typically reported in order to perform reasonable meta-analyses raises the question of whether current reporting practices really allow a cumulative science.
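For instance, the matched-groups variant discussed in that reference standardizes by a within-condition SD recovered from the difference-score SD. A sketch (function name mine), which makes the reporting problem explicit:

```python
import math

def d_matched_groups(mean_diff: float, sd_diff: float, r: float) -> float:
    """Standardized difference in the raw-score metric for matched groups /
    pre-post designs: the difference-score SD is converted to a
    within-condition SD via sqrt(2 * (1 - r)).

    Requires the correlation r between the paired measurements -- exactly
    the statistic that the typical report (mean, SE, t) omits.
    """
    s_within = sd_diff / math.sqrt(2.0 * (1.0 - r))
    return mean_diff / s_within
```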
² Sterne et al. (2011) note minor asymmetries caused by a correlation between an effect and its standard error, as can occur when estimating extreme proportions or similar parameters, but nothing as dramatic or as fundamental as shown here. Their asymmetries are mostly a problem for asymmetry tests, which can pick up minor asymmetries in larger samples.