About the Weisburd paradox

The “Weisburd paradox” refers to a finding by Weisburd, Petrosino, and Mason, who reviewed experimental studies in criminology and found that increasing the sample size did not lead to increased statistical power. While this paradox has perhaps not received much attention in the literature so far, the study was replicated last year by Nelson, Wooditch, and Dario in the Journal of Experimental Criminology, confirming the phenomenon.
The empirical finding that a larger sample size does not increase power is based on calculating “achieved power”, which is supposed to shed light on what the present study can and cannot achieve (see e.g. here). “Achieved power” is calculated in the same way as a conventional power calculation, but instead of using an assumed effect size, one plugs in the effect estimated in that same study.
Statistical power is the probability of correctly rejecting the null hypothesis, given assumptions about the size of the effect (usually based on previous studies or other substantive reasons). Increasing the sample size shrinks the standard error, which increases the probability of rejecting the null hypothesis if there is a true effect. Power calculations are usually used to determine the necessary sample size, as there is no point in carrying out a study that cannot detect anything anyway. So one needs to ensure sufficient statistical power when planning a study.
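To make the planning role of power concrete, here is a minimal sketch (my own illustration, not from any of the papers discussed) of an approximate power calculation for a two-sided two-sample z-test; the effect size of 0.3 and standard deviation of 1.0 are arbitrary assumptions:

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test
    for a true mean difference of `delta`."""
    se = sd * sqrt(2.0 / n_per_group)            # standard error of the mean difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # two-sided critical value
    # Probability that |estimate / se| exceeds the critical value,
    # given that the true standardized effect is delta / se.
    return STD_NORMAL.cdf(delta / se - z_crit) + STD_NORMAL.cdf(-delta / se - z_crit)

for n in (25, 100, 400):
    print(n, round(power_two_sample(delta=0.3, sd=1.0, n_per_group=n), 3))
```

Under these assumptions, 25 subjects per group gives power below 0.2, while 400 per group gives power close to 0.99 – which is why the calculation is done before data collection, to choose n.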
But using the estimated effect size in the power calculation gives a slightly different interpretation. “Achieved power” is the probability of rejecting the null hypothesis under the assumption that the population effect is exactly equal to the observed sample effect. I would say this is rarely a quantity of interest, since one has already either rejected or retained the null hypothesis. Without any reference to external information about true effect sizes, post-hoc power calculations bring nothing new to the table beyond what the point estimate and standard error already provide.
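To see why, note that under the usual normal approximation, “achieved power” is a deterministic function of the observed z-statistic – the point estimate divided by its standard error. A small sketch (my own illustration):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def achieved_power(z_obs, alpha=0.05):
    """'Achieved' (post-hoc) power: the usual power formula with the
    observed z-statistic plugged in as if it were the true effect."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(z_obs) - z_crit) + STD_NORMAL.cdf(-abs(z_obs) - z_crit)

# A result exactly at the significance boundary (z = 1.96, p = 0.05)
# always has "achieved power" of about 0.50, whatever the study.
print(round(achieved_power(1.96), 3))
```

Since the function is one-to-one in |z|, reporting “achieved power” is just reporting the z-statistic (or the p-value) on another scale.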
Larger “achieved power” implies a larger estimated effect size, so let’s talk about that. The Weisburd paradox is that smaller studies tend to have larger estimated effects than larger studies. While Nelson et al. discuss several reasons why that might be, they did not put much weight on what I would consider the prime suspect: a lot of noise combined with the “significance filter” of publication. For a small study to yield a significant effect, the point estimate must be large. If significant findings are easier to publish, then the published findings from small studies will be larger on average. (In addition, researchers have incentives to find significant effects in order to get published and might be tempted to do a bit of p-hacking, which makes things worse.) So the Weisburd paradox might be explained by exaggerated effect sizes.
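The significance-filter argument can be illustrated with a small simulation (my own sketch; the true effect of 0.1, the noise level, and the sample sizes are arbitrary assumptions). Among simulated studies, the average magnitude of the estimates that pass the significance filter greatly exaggerates the truth in small samples, and the exaggeration shrinks as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect, sd = 0.1, 1.0   # assumed true effect and outcome noise
z_crit = 1.96                # two-sided 5% significance threshold

def mean_significant_magnitude(n_per_group, n_sims=20_000):
    """Average |estimate| among simulated two-group studies that reach
    significance -- i.e. the estimates that pass the filter."""
    se = sd * np.sqrt(2.0 / n_per_group)
    estimates = rng.normal(true_effect, se, n_sims)  # sampling distribution of the estimate
    significant = np.abs(estimates / se) > z_crit    # the significance filter
    return np.abs(estimates[significant]).mean()

for n in (25, 100, 1000):
    print(n, round(mean_significant_magnitude(n), 3))
```

In this sketch, the significant estimates from the studies with 25 per group overstate the true effect of 0.1 several times over, while the large studies exaggerate only mildly – exactly the pattern of small studies showing larger effects.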
But why care? First, I believe the danger is that such reasoning might mislead researchers into justifying studies that are too small, ending up chasing noise rather than making scientific progress. Second, researchers might give the impression that their findings are more reliable than they really are by showing that they have high post-hoc statistical power.
Just to be clear: I do not mind small studies as such, but I would like to see the findings from small studies replicated a few times before giving them much weight.
Mikko Aaltonen and I wrote a commentary on the paper by Nelson et al. and submitted it to the Journal of Experimental Criminology, pointing out such problems and arguing that the Weisburd paradox is not even a paradox. We were rejected. There are both good and bad reasons for this. One of the reviewers pointed out a number of things to be improved and corrected. The second reviewer was even grumpier than me and did not want to engage with our points at all. Re-reading our commentary, I can see much to be improved, and I also see that we might come across as more confrontational than intended. (I also noticed a couple of other minor errors.) Maybe we should have put more work into it. You can read our manuscript here (no corrections made). We decided not to rewrite our commentary for a more general audience, so it will not appear elsewhere.
When writing this post, I did an internet search and found this paper by Andrew Gelman prepared for the Journal of Quantitative Criminology. His commentary on the Weisburd paradox is clearly much better written than ours and more interesting for a broader audience. Less grumpy as well, but it makes many similar substantive points. I guess Gelman’s commentary should pretty much settle the issue. Kudos to Gelman. EDIT: and also to JQC for publishing it. An updated version of Gelman’s piece is here – apparently not(!) accepted for publication yet.
The post About the Weisburd paradox appeared on The Grumpy Criminologist 2016-07-14 10:00:39 by Torbjørn.

Testing typological theories using GBTM?

As I mentioned in yesterday’s post, I think the debates about group-based trajectory modeling (GBTM) have some unresolved issues. For this reason, I submitted a commentary to the Journal of Research in Crime and Delinquency. I had two reasons for doing so. First, I think Nagin mischaracterized his critics, and I believe his essay was a willful attempt to dodge criticism by ignoring serious arguments. (Maybe I could have been less outspoken about that.) After all, he has not addressed the actual argument that I (and others) have put forward. I can only interpret this as an attempt to avoid discussing the substantive matter by keeping silent, and now subtly dismissing the whole thing. If Nagin finds it worthwhile to say that his critics have misunderstood, he should also bother to point out how. So far, he has done no such thing.

Second, I actually think there is a need to clarify whether GBTM can test for the presence of groups or not. If the advocates of GBTM had been clear about this, no clarification would have been needed. There is no doubt that Nagin and others have been clear that GBTM can – or maybe even should – be interpreted as an approximation to a continuous distribution. There is no disagreement on that point. But they have also given the impression that one can identify meaningful, real groups in the data by way of GBTM, without being clear on what this really means or under what conditions it can be done. A clarification is in order, since findings from GBTM analyses have clearly been interpreted in the literature as giving very strong evidence for a certain typological theory (see e.g. here). I have claimed that this empirical evidence is weak and largely based on overinterpretation of empirical studies using GBTM (see here and here). It would be helpful if Nagin could clarify the strength of this evidence.

So I wrote a commentary and submitted it to The Journal of Research in Crime and Delinquency. (See the full commentary here). According to the letter from the editor, it was rejected because:

Language at the top of page 2 in your comment underscores a fundamental misunderstanding and misreading of Nagin’s work.
(See the full rejection letter here).

Well, maybe I should have put things more politely, but I still believe my arguments are right. I can understand that there might be good editorial reasons for not having yet another debate about GBTM in the journal, but I am not impressed with the reason given. My fundamental misunderstanding is revealed (at the top of page 2) where I point out that Nagin himself is responsible for some of the confusion regarding the interpretation of the groups. I do so with clear references, so you can decide for yourself whether these are misreadings or not.

Even in his recent essay, Nagin presents one of the main motivations for using GBTM by first arguing that other methods are not capable of testing for the presence of groups, and then suggesting that GBTM can indeed solve this problem:

To test such taxonomical theories, researchers had commonly resorted to using assignment rules based on subjective categorization criteria to construct categories of developmental trajectories. While such assignment rules are generally reasonable, there are limitations and pitfalls attendant to their use. One is that the existence of distinct developmental trajectories must be assumed a priori. Thus, the analysis cannot test for their presence, a fundamental shortcoming. (…) The trajectories reported in Figure 2 provide an example of how GBTM models have been applied to empirically test predictions stemming from Moffitt’s (1993) taxonomic theory of antisocial behavior.
(My emphasis).

The quote might not say outright whether the groups from GBTM are to be interpreted as real in this setting, nor what can be concluded from such “tests”. But given the previous debates and misconceptions, this is hardly a clarification.

My point is simply this: it has been claimed that GBTM can be used to test for the presence of distinct groups, and more generally to test typological theories. (I have discussed this in more detail here and here.) However, it is hard to see how such typological theories can be tested using GBTM, and the advocates of the methodology explain this only vaguely. I think (but I am not entirely sure) that in this context “testing a theory” only means obtaining findings that are consistent with a given theory. That is a generous use of the term “test”. I prefer to reserve the word “test” for situations where something is ruled out – or when using methods that at least in principle could rule something out. In other words: if the findings are consistent with a theory but also consistent with one or several competing (or non-competing) theories, this is at best weak evidence for either theory. (This holds regardless of the methods used.) It is good if a theory is consistent with the empirical findings, but that is far from enough. I know of no published criminological study using GBTM that provides a test of typological theories in this stricter sense of the term. So far, it seems to me that the advocates of GBTM have not been clear on this issue. Some clarification would be in order.
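One way to see the problem is a small simulation (my own hypothetical sketch, not from any of the papers discussed): fit a GBTM-style model – homogeneous groups observed with sampling noise – to data in which individual differences are continuous and no groups exist at all. A BIC comparison still favors several “groups”, because the groups merely serve as an approximation to the continuous distribution:

```python
import numpy as np
from math import pi

rng = np.random.default_rng(1)

# No true groups: each person's latent level is drawn from a continuous
# normal distribution; averaging T repeated measures adds sampling noise.
n, T, sigma_noise = 500, 8, 1.0
latent = rng.normal(0.0, 1.0, n)                               # continuous heterogeneity
y_bar = latent + rng.normal(0.0, sigma_noise / np.sqrt(T), n)  # person-level means
se2 = sigma_noise**2 / T                                       # known sampling variance

def fit_point_mass_mixture(y, k, var, iters=200):
    """EM for a GBTM-style model: k homogeneous 'groups' (point masses
    on the latent level) observed with known sampling variance `var`.
    Returns the BIC (lower is better)."""
    means = np.quantile(y, (np.arange(k) + 0.5) / k)  # spread the initial group means
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        dens = weights * np.exp(-(y[:, None] - means) ** 2 / (2 * var)) / np.sqrt(2 * pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)   # E-step: group memberships
        weights = resp.mean(axis=0)                     # M-step: group proportions
        means = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)  # M-step: group means
    dens = weights * np.exp(-(y[:, None] - means) ** 2 / (2 * var)) / np.sqrt(2 * pi * var)
    loglik = np.log(dens.sum(axis=1)).sum()
    n_params = 2 * k - 1  # k means + (k - 1) free weights
    return -2 * loglik + n_params * np.log(len(y))

for k in (1, 2, 3, 4, 5):
    print(k, round(fit_point_mass_mixture(y_bar, k, se2), 1))
```

So extracting well-separated groups – even having fit statistics prefer several groups – is consistent both with a typological theory and with purely continuous heterogeneity, which is why such findings cannot by themselves test for the presence of groups.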

The post Testing typological theories using GBTM? appeared on The Grumpy Criminologist 2016-07-01 12:00:36 by Torbjørn.

Rejections

I sometimes feel the need to get things out of my system. Sometimes I submit comments or research notes to journals, pointing out mistakes in published research, but I have started to realize that the editors are not really interested. There might be both good and bad reasons for this. I will avoid speculating, but one reason is probably my style of writing, which it seems is perceived as harsh and grumpy, even though I am only trying to be clear and not wrap things up in niceties. (Not sure if this is a good or a bad reason for rejection, though.)

On this blog, I will post some of this material. However, I will not stick entirely to the theme of grumpiness (there is not enough material) and will also post on entirely unrelated topics.

The post Rejections appeared on The Grumpy Criminologist 2016-06-29 07:13:01 by Torbjørn.