The “Weisburd paradox” refers to the finding by Weisburd, Petrosino and Mason, who reviewed the literature on experimental studies in criminology and found that increasing the sample size did not lead to increased statistical power. While this paradox has perhaps not received much attention in the literature so far, the study was replicated last year by Nelson, Wooditch and Dario in the Journal of Experimental Criminology, confirming the phenomenon.
The empirical finding that a larger sample size does not increase power is based on calculating “achieved power”. This is supposed to shed light on what the present study can and cannot achieve (see e.g. here). “Achieved power” is calculated in the same way as conventional power calculations, but instead of using an assumed effect size, one uses the effect estimated in the same study.
Statistical power refers to the probability of correctly rejecting the null hypothesis, based on assumptions about the size of the effect (usually drawn from previous studies or other substantive reasons). By increasing the sample size, the standard error gets smaller, and this increases the probability of rejecting the null hypothesis if there is a true effect. Usually, power calculations are used to determine the necessary sample size, as there is no point in carrying out a study if one cannot detect anything anyway. So, one needs to ensure sufficient statistical power when planning a study.
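As a minimal illustration of such a planning calculation (a sketch using the normal approximation for a two-sided two-sample test; the effect size d = 0.5 and the sample sizes are made-up numbers, not taken from the studies discussed), power rises steadily with the per-group sample size for a fixed assumed effect:

```python
from math import sqrt
from scipy.stats import norm

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for an
    assumed standardized effect size d with n_per_group per arm."""
    z_crit = norm.ppf(1 - alpha / 2)
    # Expected value of the test statistic under the assumed effect:
    shift = abs(d) * sqrt(n_per_group / 2)
    return 1 - norm.cdf(z_crit - shift) + norm.cdf(-z_crit - shift)

# Power for an assumed effect of d = 0.5 at various sample sizes:
for n in (20, 64, 150):
    print(n, round(power_two_sample(0.5, n), 2))
```

The classic rule of thumb appears here: roughly 64 per group gives about 80% power for a medium-sized effect of d = 0.5, while 20 per group leaves power well below one half.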
But using the estimated effect size in the power calculations gives a slightly different interpretation. “Achieved power” would be the probability of rejecting the null hypothesis, based on the assumption that the population effect is exactly equal to the observed sample effect. I would say this is rarely a quantity of interest, since one has already either rejected or kept the null hypothesis… Without any reference to external information about true effect sizes, post-hoc power calculations bring nothing new to the table beyond what the point estimate and standard error already provide.
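One way to see this: “achieved power” is a deterministic transformation of the observed test statistic (and hence of the p-value), so it carries no extra information. In this sketch (a two-sided z-test approximation with hypothetical numbers), a result that is exactly borderline significant always has an “achieved power” of about 50%, regardless of how large the study was:

```python
from scipy.stats import norm

def achieved_power(z_obs, alpha=0.05):
    """'Achieved power': plug the observed z-statistic in where
    the assumed effect would normally go."""
    z_crit = norm.ppf(1 - alpha / 2)
    z = abs(z_obs)
    return 1 - norm.cdf(z_crit - z) + norm.cdf(-z_crit - z)

# A just-significant result (z ≈ 1.96, p ≈ 0.05) maps to achieved
# power of about one half, whatever the sample size:
print(round(achieved_power(1.96), 2))  # → 0.5
```

Larger observed z-statistics map monotonically to larger “achieved power”, which is exactly why the quantity adds nothing beyond the estimate and its standard error.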
A larger “achieved power” implies a larger estimated effect size, so let’s talk about that. The Weisburd paradox is that smaller studies tend to have larger estimated effects than larger studies. While Nelson et al. discuss several reasons why that might be, they did not put much weight on what I would consider the prime suspect: a lot of noise combined with the “significance filter” needed to get published. If a small study shows a significant effect, the point estimate needs to be large. If significant findings are easier to publish, then the published findings from small studies will be larger on average. (In addition, researchers have incentives to find significant effects to get published and might be tempted to do a bit of p-hacking – which makes things worse.) So, the Weisburd paradox might be explained by exaggerated effect sizes.
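The significance filter is easy to demonstrate with a small simulation (a sketch with made-up numbers: a true standardized effect of 0.2 and studies of 20 per group; nothing here is taken from the papers discussed). Conditioning on significance makes the surviving estimates from such small studies several times larger than the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)

true_d = 0.2          # small true standardized effect (hypothetical)
n = 20                # per-group sample size of each "small study"
se = np.sqrt(2 / n)   # standard error of the effect estimate
n_sims = 200_000

# Each simulated study yields one noisy effect estimate.
estimates = rng.normal(true_d, se, size=n_sims)
significant = np.abs(estimates) / se > 1.96

print(f"share significant:               {significant.mean():.2f}")
print(f"mean |estimate|, all studies:    {np.abs(estimates).mean():.2f}")
print(f"mean |estimate|, significant:    {np.abs(estimates[significant]).mean():.2f}")
```

With these numbers, only a small fraction of the studies come out significant, and those that do overestimate the true effect severalfold – the exaggeration the significance filter produces.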
But why care? First, I believe the danger is that such reasoning might mislead researchers into justifying studies that are too small, ending up chasing noise rather than making scientific progress. Second, researchers might give the impression that their findings are more reliable than they really are by showing that they have high post-hoc statistical power.
Just to be clear: I do not mind small studies as such, but I would like to see the findings from small studies replicated a few times before giving them much weight.
Mikko Aaltonen and I wrote a commentary on the paper by Nelson et al. and submitted it to the Journal of Experimental Criminology, pointing out such problems and arguing that the Weisburd paradox is not even a paradox. We were rejected. There are both good and bad reasons for this. One of the reviewers pointed out a number of points to be improved and corrected. The second reviewer was even grumpier than me and did not want to understand our points at all. When re-reading our commentary, I can see much to be improved, and I also see that we might be perceived as more confrontational than intended. (I also noticed a couple of other minor errors.) Maybe we should have put more work into it. You can read our manuscript here (no corrections made). We decided not to rewrite our commentary for a more general audience, so it will not appear elsewhere.
When writing this post, I did an internet search and found this paper by Andrew Gelman, prepared for the Journal of Quantitative Criminology. His commentary on the Weisburd paradox is clearly much better written than ours and more interesting for a broader audience. Less grumpy as well, but with many similar substantive points. I guess Gelman’s commentary should pretty much settle this issue. Kudos to Gelman. EDIT: but also to JQC for publishing it. An updated version of Gelman’s piece is here – apparently not(!) accepted for publication yet.
The post “About the Weisburd paradox” appeared on The Grumpy Criminologist on 2016-07-14, by Torbjørn.