Category: Statistical methods

Our paper on the paradox is out now

Our paper on the “Weisburd paradox” is now out in the Journal of Quantitative Criminology. Mikko and I had initially put our own attempt in the drawer, since it turned out that Gelman had written a much better working paper on the same topic. Some additions were required for publication, however, and Gelman invited us to help out in the final rounds. We’re grateful for the opportunity. The story is here, here, and here.

Best practise of group-based modelling

I had initially decided not to pick on group-based modelling any more, but here we go:

In his recent essay on group-based modelling in the Journal of Research in Crime and Delinquency (see earlier posts here and here), Nagin discusses two examples of the use of group-based modelling in developmental criminology. It is not clear whether these are mentioned because they are particularly good examples, since they are presented as “early examples”. Maybe they are of historical interest, or maybe they are mentioned because they are much cited. Since Nagin’s article is basically promoting GBTM, I assume they are mentioned because they are good examples of what can be achieved using this method. In any case, I would have liked to see examples where GBTM really made a difference.

The first example is the article by Nagin, Farrington and Moffitt, who, according to Nagin, made an important contribution using SPGM for the following reason:

…what was new from our analysis was the finding that low IQ also distinguished those following the low chronic trajectory from those following the adolescent limited and high chronic trajectories. This finding was made possible by the application of GBTM to identify the latent strata of offending trajectories present in the Cambridge data set.

The article by Nagin, Farrington and Moffitt shows that the relationship between IQ and delinquency varies across groups. (One could of course also say that the relationship is non-linear, but they stick to discussing groups.) That is fine, but the contribution lies mainly in how the groups are created, although some elaborate testing of equality of parameters is involved. If other ways of summarizing the criminal careers (either as a continuous measure or as groups) would not find anything similar, it remains unclear what is a methodological artefact and what is not. However, only one methodological approach was used, so it is hard to assess whether GBTM actually was the only way of discovering this kind of relationship. Maybe alternative methods (e.g. subjective classifications or a continuous index) would have found exactly the same thing in these data? That could very well be.

The second example mentioned by Nagin was also written by himself, together with Laub and Sampson. This is a very influential paper on the effect of marriage on crime, but it has a major flaw because of how GBTM is used. Together with Savolainen, Aase and Lyngstad, I recently wrote a review article on the marriage-crime literature, published in Crime and Justice. We commented on this paper as follows:

…they estimated group-based trajectories (Nagin and Land 1993) for the entire observational period from age 7 to 32 and then assigned each person to a group on the basis of posterior probabilities. In the second stage, they regressed the number of arrests in each 2-year interval from age 17 to 32 on changes in marital status and quality, controlling for group membership and other characteristics. In addition, they conducted separate regression analyses by trajectory group membership.
…
We are somewhat hesitant to endorse this conclusion for methodological reasons. Because the trajectories were estimated over the entire period—including the marital period—controlling for group membership implies controlling for post-marriage offending outcomes as well. We expect this aspect of the analytic strategy to bias the results, but further efforts are needed to assess the substantive implications of this methodological approach.

I think the final sentence of this quote is very mild. They were partly conditioning on the outcome variable, and that is bound to lead to trouble. Frankly, I do not know how to interpret these estimates. In this case, GBTM made a real difference, but for the worse. This was probably hard to see at the time, since GBTM had not yet been subject to much methodological scrutiny. It is easier to see now.
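To see why controlling for a group that is defined partly by the outcome causes trouble, here is a deliberately stylized simulation using only the Python standard library. It is not the Sampson, Laub and Nagin model: marriage is randomly assigned, offending is observed in just a pre-marriage and a marital period, and the “trajectory group” is simply a median split on total offending across both periods, standing in for groups estimated over the whole observation window. The helpers `simulate` and `diff_in_means` and all parameter values (a true marriage effect of -1, 20,000 people) are hypothetical.

```python
import random
import statistics


def simulate(n=20000, true_effect=-1.0, seed=7):
    """Stylized two-period data where marriage is randomly assigned (hypothetical setup)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        u = rng.gauss(0, 1)                  # stable individual propensity to offend
        married = rng.random() < 0.5         # randomized marriage, purely for illustration
        pre = u + rng.gauss(0, 1)            # offending before marriage could occur
        post = u + true_effect * married + rng.gauss(0, 1)  # offending in the marital period
        people.append((married, pre, post))
    return people


def diff_in_means(rows):
    """Mean post-period offending of the married minus that of the unmarried."""
    married = [post for m, _, post in rows if m]
    unmarried = [post for m, _, post in rows if not m]
    return statistics.mean(married) - statistics.mean(unmarried)


people = simulate()
naive = diff_in_means(people)

# Stand-in for a trajectory group estimated over the WHOLE period:
# high vs low total offending, which includes the post-marriage outcome itself.
cutoff = statistics.median(pre + post for _, pre, post in people)
high = [r for r in people if r[1] + r[2] > cutoff]
low = [r for r in people if r[1] + r[2] <= cutoff]
stratified = (len(high) * diff_in_means(high) + len(low) * diff_in_means(low)) / len(people)

print(round(naive, 2), round(stratified, 2))
```

Under these assumptions the simple comparison recovers the true effect of about -1, while averaging the within-group comparisons gives an estimate attenuated toward zero (around -0.7 with these settings), because membership in the high group is partly a consequence of the very post-marriage offending that marriage is supposed to reduce.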

In sum, although these two studies have other qualities, they are not real success stories for GBTM. My advice would be to come up with some really good examples. But perhaps the only real success story of GBTM is Nagin and Land’s 1993 article (for reasons given here).

P.S. Actually, I know of far better examples of the use of group-based modelling. Neither of them depends on GBTM, but it adds a nice touch. For example, Haviland et al. use GBTM to improve propensity score matching. Another example is my own work with Jukka Savolainen, where offending in the period before job entry is summarized using GBTM. In both studies, other techniques could have been used, but GBTM works very well. Other sound applications exist as well.

The post Best practise of group-based modelling appeared on The Grumpy Criminologist 2016-08-11 12:36:42 by Torbjørn.

About the Weisburd paradox

The “Weisburd paradox” refers to the finding by Weisburd, Petrosino and Mason, who reviewed the literature on experimental studies in criminology and found that larger sample sizes did not lead to greater statistical power. While this paradox has perhaps not attracted much attention in the literature so far, the study was replicated last year by Nelson, Wooditch and Dario in the Journal of Experimental Criminology, confirming the phenomenon.

The empirical finding that larger samples do not increase power is based on calculating “achieved power”. This is supposed to shed light on what the present study can and cannot achieve (see e.g. here). “Achieved power” is calculated in the same way as in conventional power calculations, but instead of an assumed effect size, one uses the effect size estimated in the same study.

Statistical power refers to the probability of correctly rejecting the null hypothesis, given assumptions about the size of the effect (usually based on previous studies or other substantive reasons). Increasing the sample size makes the standard error smaller, which increases the probability of rejecting the null hypothesis if there is a true effect. Power calculations are usually used to determine the necessary sample size, as there is no point in carrying out a study if one cannot detect anything anyway. So, one needs to ensure sufficient statistical power when planning a study.
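A minimal sketch of a conventional (prospective) power calculation, assuming a two-arm comparison of means, a normal approximation (a z-test rather than a t-test) and a hypothetical standardized effect size of 0.3. `power_two_sample` is an illustrative helper, not a function from any particular package.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size: ASSUMED true standardized mean difference (difference / outcome sd).
    n_per_group: number of subjects in each of the two arms.
    """
    se = (2 / n_per_group) ** 0.5               # SE of the difference, in standardized units
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    shift = effect_size / se                    # expected z-statistic if the assumed effect is true
    # P(|Z| > z_crit) when Z ~ N(shift, 1): upper tail plus lower tail
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


# Same assumed effect, growing sample size: the standard error shrinks and power rises.
for n in (20, 50, 100, 200, 400):
    print(n, round(power_two_sample(0.3, n), 2))
```

With these assumptions, power climbs from roughly 0.16 with 20 per group to about 0.99 with 400 per group. The effect size is fixed before seeing any data; only the sample size varies.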

But using the estimated effect size in the power calculations gives a slightly different interpretation. “Achieved power” would be the probability of rejecting the null hypothesis, under the assumption that the population effect is exactly equal to the observed sample effect. I would say this is rarely a quantity of interest, since one has already either rejected or kept the null hypothesis. Moreover, achieved power is a direct function of the observed test statistic, and hence of the p-value: a result that is just significant at the 5% level always has an achieved power of about 50%. Without any reference to external information about true effect sizes, post-hoc power calculations therefore bring nothing new to the table beyond what the point estimate and standard error already provide.
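To illustrate that achieved power adds nothing beyond the estimate and its standard error, here is a sketch under a normal approximation (two-sided z-test). `achieved_power` and `two_sided_p` are illustrative helpers, and the two studies are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def achieved_power(estimate, std_error, alpha=0.05):
    """Post-hoc ("achieved") power: treat the observed estimate as if it were the true effect."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_obs = estimate / std_error  # the only information about the data that enters
    return (1 - STD_NORMAL.cdf(z_crit - z_obs)) + STD_NORMAL.cdf(-z_crit - z_obs)


def two_sided_p(estimate, std_error):
    """Two-sided p-value from a z-test."""
    return 2 * (1 - STD_NORMAL.cdf(abs(estimate / std_error)))


# Two hypothetical studies: very different estimates and SEs, same z-statistic (2.0),
# hence the same p-value and the same "achieved power".
for est, se in [(0.60, 0.30), (0.06, 0.03)]:
    print(est, se, round(two_sided_p(est, se), 3), round(achieved_power(est, se), 3))
```

Both studies have a z-statistic of 2.0, so they get the same p-value and the same achieved power, even though one estimate is ten times the other. A result with p exactly equal to 0.05 always comes out with an achieved power of about 0.5, whatever the sample size.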

Larger “achieved power” implies a larger estimated effect size relative to its standard error, so let’s talk about that. The Weisburd paradox is that smaller studies tend to have larger estimated effects than larger studies. While Nelson et al. discuss several reasons why that might be, they did not put much weight on what I would consider the prime suspect: a lot of noise combined with the “significance filter” for publication. If a small study finds a significant effect, the point estimate must be large. If significant findings are easier to publish, then the published effects from small studies will be larger on average. (In addition, researchers have incentives to find significant effects to get published and might be tempted to do a bit of p-hacking, which makes things worse.) So, the Weisburd paradox might be explained by exaggerated effect sizes.
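A small simulation of the significance filter, assuming a hypothetical true standardized effect of 0.2, outcomes with standard deviation 1 in each arm, and a z-test with known standard error. Each study’s estimate is drawn directly from its sampling distribution, and only studies with a significant positive estimate are “published”. `published_estimates` is an illustrative helper.

```python
import random
import statistics


def published_estimates(true_effect, n_per_group, n_studies=5000, seed=1):
    """Simulate many two-arm studies; keep only those with a significant positive estimate.

    Outcomes have sd 1 in each arm, so the difference in means has standard error
    sqrt(2 / n_per_group); the SE is treated as known for simplicity.
    """
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5
    kept = []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)  # draw from the sampling distribution
        if estimate / se > 1.96:               # the "significance filter" (5% two-sided, positive side)
            kept.append(estimate)
    return kept


true_effect = 0.2
for n in (20, 50, 200, 1000):
    kept = published_estimates(true_effect, n)
    print(n, len(kept), round(statistics.mean(kept), 3))
```

With these settings, the average “published” estimate from the smallest studies is more than three times the true effect, while the largest studies land close to 0.2. The true effect is the same for every study size; the filter alone produces the pattern.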

But why care? First, I believe the danger is that such reasoning might be used to justify studies that are too small, so that researchers end up chasing noise rather than making scientific progress. Second, by reporting high post-hoc statistical power, researchers might give the impression that their findings are more reliable than they really are.

Just to be clear: I do not mind small studies as such, but I would like to see the findings from small studies replicated a few times before giving them much weight.

Mikko Aaltonen and I wrote a commentary on the paper by Nelson et al. and submitted it to the Journal of Experimental Criminology, pointing out such problems and arguing that the Weisburd paradox is not even a paradox. It was rejected. There are both good and bad reasons for this. One of the reviewers pointed out a number of points to be improved and corrected. The second reviewer was even grumpier than me and did not ~~want to~~ understand our points at all. When re-reading our commentary, I can see much to be improved, and I also see that we might be perceived as more confrontational than intended. (I also noticed a couple of other minor errors.) Maybe we should have put more work into it. You can read our manuscript here (no corrections made). We decided not to rewrite our commentary for a more general audience, so it will not appear elsewhere.

When writing this post, I did an internet search and found this paper by Andrew Gelman, prepared for the Journal of Quantitative Criminology. His commentary on the Weisburd paradox is clearly much better written than ours and more interesting for a broader audience. It is less grumpy as well, but makes many of the same substantive points. I guess Gelman’s commentary should pretty much settle this issue. Kudos to Gelman. EDIT: ~~, but also to JQC for publishing it.~~ An updated version of Gelman’s piece is here – apparently not(!) accepted for publication yet.

The post About the Weisburd paradox appeared on The Grumpy Criminologist 2016-07-14 10:00:39 by Torbjørn.

Criminological progress!

I recently came across this article by David Greenberg in the Journal of Developmental and Life Course Criminology. I saw an early draft some time ago, and I am glad to see it finally published! (It should have been published a long time ago, as the version I saw was pretty good, but I have no idea why it was not.) Greenberg shows how to use standard multilevel modeling with normally distributed random effects to test typological theories. The procedure is actually not very complicated: estimate a random effects model, use empirical Bayes to get point estimates of each person’s intercept and slope(s), and explore the distributions of those point estimates using e.g. histograms. And no: those empirical Bayes estimates do not have to be normally distributed! You need to decide for yourself (preferably up front) what it would take for these distributions to support your favourite typology, so it requires a bit of thinking. This can all be done in standard statistical software, and it only requires knowing a little bit about what you’re doing. It would be really nice to see previous publications using group-based models reanalyzed in this way.
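A minimal sketch of the procedure, assuming a balanced panel with ten waves, a hypothetical population built to contain 70% flat and 30% rising trajectories, and a simplified estimator: per-person OLS slopes, method-of-moments variance components, and the standard normal-normal shrinkage formula. This only approximates what Greenberg does; a real analysis would fit the full random intercept and slope model (including the intercept-slope covariance) by ML or REML in standard mixed-model software. The helpers `ols_slope` and `eb_slopes` are illustrative names.

```python
import random
import statistics


def ols_slope(times, ys):
    """OLS slope, residual sum of squares and Sxx for one person's trajectory."""
    t_bar = statistics.mean(times)
    y_bar = statistics.mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys)) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, ys))
    return slope, rss, sxx


def eb_slopes(panel, times):
    """Empirical Bayes slopes under a normal random-slope model (moment estimates)."""
    fits = [ols_slope(times, ys) for ys in panel]
    slopes = [slope for slope, _, _ in fits]
    sxx = fits[0][2]                                   # balanced panel: same Sxx for everyone
    sigma2 = sum(rss for _, rss, _ in fits) / (len(panel) * (len(times) - 2))  # pooled residual variance
    v = sigma2 / sxx                                   # sampling variance of each OLS slope
    mu = statistics.mean(slopes)
    tau2 = max(statistics.variance(slopes) - v, 0.0)   # between-person slope variance
    shrink = tau2 / (tau2 + v)                         # reliability of each OLS slope
    return [mu + shrink * (b - mu) for b in slopes]


# Hypothetical population: 70% flat and 30% rising trajectories, ten waves each.
rng = random.Random(42)
times = list(range(10))
panel = []
for _ in range(1000):
    slope = rng.gauss(1.0, 0.1) if rng.random() < 0.3 else rng.gauss(0.0, 0.1)
    intercept = rng.gauss(0.0, 1.0)
    panel.append([intercept + slope * t + rng.gauss(0.0, 1.0) for t in times])

eb = eb_slopes(panel, times)

# Crude text histogram of the empirical Bayes slopes (one '#' per 10 people).
for k in range(-5, 15):
    lo = k / 10
    count = sum(lo <= b < lo + 0.1 for b in eb)
    print(f"{lo:5.1f} {'#' * (count // 10)}")
```

Although the fitted model assumes normally distributed slopes, the histogram of the empirical Bayes slopes shows two clear clusters, near 0 and near 1, because shrinkage is mild with ten waves per person. With fewer waves or noisier data the shrinkage is stronger and the clusters blur, which is one reason to decide up front what pattern would count as support for a typology.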

The article also discusses a number of related modeling choices, and that discussion is highly informative. So far, I have only read the published version of the article very quickly, and I need to read it more carefully before I fully embrace all its arguments, but I might very well end up embracing them all.

I have noticed that it has been claimed in the literature that models assuming normally distributed random effects cannot test for the existence of subpopulations. Well, it is the other way around.

The post Criminological progress! appeared on The Grumpy Criminologist 2016-07-04 12:00:12 by Torbjørn.

Testing typological theories using GBTM?

As I mentioned in the post yesterday, I think the debates about group-based trajectory modeling have left some issues unresolved. For this reason, I submitted a commentary to the Journal of Research in Crime and Delinquency. I had two reasons for doing so. First, I think Nagin mischaracterized his critics, and I believe his essay was a willful attempt to avoid serious criticism by ignoring serious arguments. (Maybe I could have been less outspoken about that.) But after all, he has not addressed the actual arguments I (and others) have put forward. I can only interpret this as an attempt to avoid discussing the substance, first by keeping silent and now by subtly dismissing the whole thing. If Nagin finds it worthwhile to say that his critics have misunderstood, he should also bother to point out how. So far, he has done no such thing.

Second, I actually think there is a need to clarify whether GBTM can test for the presence of groups or not. If the advocates of GBTM had been clear about this, such a clarification would obviously not be needed. There is no doubt that Nagin and others have been clear that GBTM can (or maybe even should) be interpreted as an approximation to a continuous distribution. There is no disagreement on that point. But they have also given the impression that one can identify meaningful real groups in the data by way of GBTM, without being clear on what this really means or under what conditions it can be done. A clarification is in order, since findings from GBTM analyses have clearly been interpreted in the literature as very strong evidence for a certain typological theory (see e.g. here). I have argued that this empirical evidence is weak and largely based on overinterpretation of empirical studies using GBTM (see here and here). It would be helpful if Nagin could clarify the strength of this evidence.
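The point that statistical “groups” need not correspond to real groups can be illustrated with a toy version of the argument made in Bauer and Curran’s simulation work. This is a univariate, cross-sectional sketch, not GBTM: the data come from a single skewed (lognormal) population with no subgroups, and a two-component normal mixture fitted by EM is compared with a single normal distribution using BIC. The helpers `fit_one`, `fit_two`, `mix_terms` and `bic`, and all parameter values, are hypothetical choices for illustration.

```python
import math
import random
import statistics


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def mix_terms(x, w, m1, v1, m2, v2):
    """Log mixture density at x and posterior probability of component 1 (log-sum-exp)."""
    a = math.log(w) + normal_logpdf(x, m1, v1)
    b = math.log(1 - w) + normal_logpdf(x, m2, v2)
    top = max(a, b)
    total = top + math.log(math.exp(a - top) + math.exp(b - top))
    return total, math.exp(a - total)


def fit_one(xs):
    """Single normal distribution: ML estimates, log-likelihood, 2 parameters."""
    mean, var = statistics.fmean(xs), statistics.pvariance(xs)
    return sum(normal_logpdf(x, mean, var) for x in xs), 2


def fit_two(xs, iters=200):
    """Two-component normal mixture fitted by EM, started from a median split; 5 parameters."""
    s = sorted(xs)
    lower, upper = s[: len(s) // 2], s[len(s) // 2 :]
    w = 0.5
    m1, m2 = statistics.fmean(lower), statistics.fmean(upper)
    v1, v2 = statistics.pvariance(lower), statistics.pvariance(upper)
    for _ in range(iters):
        r = [mix_terms(x, w, m1, v1, m2, v2)[1] for x in xs]  # E-step: responsibilities
        n1 = sum(r)
        n2 = len(xs) - n1
        w = n1 / len(xs)                                       # M-step: weighted estimates
        m1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        m2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        v1 = sum(ri * (x - m1) ** 2 for ri, x in zip(r, xs)) / n1
        v2 = sum((1 - ri) * (x - m2) ** 2 for ri, x in zip(r, xs)) / n2
    return sum(mix_terms(x, w, m1, v1, m2, v2)[0] for x in xs), 5


def bic(loglik, n_params, n):
    return -2 * loglik + n_params * math.log(n)


# ONE skewed population, no subgroups at all (hypothetical data).
rng = random.Random(3)
xs = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]
bic1 = bic(*fit_one(xs), len(xs))
bic2 = bic(*fit_two(xs), len(xs))
print(round(bic1, 1), round(bic2, 1))
```

In this setup BIC favours two “groups”, simply because one normal distribution cannot capture the skewness. The two components are an approximation to a continuous distribution, not evidence of two distinct types.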

So I wrote a commentary and submitted it to the Journal of Research in Crime and Delinquency. (See the full commentary here.) According to the letter from the editor, it was rejected because:

Language at the top of page 2 in your comment underscores a fundamental misunderstanding and misreading of Nagin’s work.
(See the full rejection letter here).

Well, maybe I should have put things more politely, but I still believe my arguments are right. I can understand that there might be good editorial reasons for not having another debate about GBTM in the journal, but I am not impressed with the reason given. My fundamental misunderstanding is revealed (at the top of page 2) where I point out that Nagin himself is responsible for some of the confusion regarding the interpretation of the groups. I do so with clear references, so you can decide for yourself whether these are misreadings or not.

Even in his recent essay, Nagin presents one of the main motivations for using GBTM by first arguing that other methods are not capable of testing for the presence of groups, and then suggesting that GBTM can indeed solve this problem:

To test such taxonomical theories, researchers had commonly resorted to using assignment rules based on subjective categorization criteria to construct categories of developmental trajectories. While such assignment rules are generally reasonable, there are limitations and pitfalls attendant to their use. One is that the existence of distinct developmental trajectories must be assumed a priori. Thus, the analysis cannot test for their presence, a fundamental shortcoming. (…) The trajectories reported in Figure 2 provide an example of how GBTM models have been applied to empirically test predictions stemming from Moffitt’s (1993) taxonomic theory of antisocial behavior.
(My emphasis).

The passage might not state outright whether the groups from GBTM can be interpreted as real in this setting, nor what can be concluded from such “tests”. But given the previous debates and misconceptions, this is hardly a clarification.

My point is simply this: it has been claimed that GBTM can be used to test for the presence of distinct groups, and more generally to test typological theories. (I have discussed this in more detail here and here.) However, it is hard to see how such typological theories can be tested using GBTM, and the advocates of the methodology explain this only very vaguely. I think (but I am not entirely sure) that in this context “testing a theory” only means obtaining findings that are consistent with a given theory. I think this is a generous use of the term “test”. I prefer to reserve the word “test” for situations where something is ruled out, or for methods that at least in principle would be able to rule something out. In other words: if the findings are consistent with a theory but also consistent with one or several competing (or non-competing) theories, this is at best weak evidence for either theory. (This holds regardless of the methods used.) It is good that a theory is consistent with the empirical findings, but that is far from enough. I know of no published criminological study using GBTM that provides a test of typological theories in the stricter sense of the term. So far, it seems to me that the advocates of GBTM have not been clear on this issue. Some clarification would be in order.
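One way to make the point about weak evidence precise is the odds form of Bayes’ theorem, where T1 and T2 denote two theories (for example, a typological theory and a theory of continuous variation) and D denotes the findings. This formalization is offered only as an illustration, not as a method used in the GBTM debate:

```latex
\frac{P(T_1 \mid D)}{P(T_2 \mid D)}
  = \underbrace{\frac{P(D \mid T_1)}{P(D \mid T_2)}}_{\text{likelihood ratio}}
    \times \frac{P(T_1)}{P(T_2)}
```

If the findings are about as likely under either theory, the likelihood ratio is close to 1 and the posterior odds barely move from the prior odds, however well each theory “fits” the data.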

The post Testing typological theories using GBTM? appeared on The Grumpy Criminologist 2016-07-01 12:00:36 by Torbjørn.

An update on group-based trajectory modeling in criminology

In a special issue of the Journal of Research in Crime and Delinquency on criminal career research, Daniel Nagin wrote an essay about the contribution of group-based trajectory modeling (GBTM). Appropriately, he also refers to the controversies about the applications of this methodology, where he contends that all earlier criticism is based on just a couple of misunderstandings. I am of course honored that my own critique is considered important enough to be mentioned (although as one of those who have misunderstood the point). I suppose that means I have made some sort of impact. It would have been nice, though, if my actual arguments had been met with ~~rational~~ clear arguments instead of just being dismissed. In fact, other advocates of GBTM have responded to my work, but without pointing out any mistakes on my part.

It is worth pointing out that Nagin also refers to another important critique of GBTM, by Daniel J. Bauer, which, it seems, Nagin considers to be based on the same misconceptions. (Other critics could have been mentioned as well.) To my knowledge, no one has pointed out mistakes in the arguments made by Bauer. On the contrary, Nagin and Odgers (p. 118) have previously acknowledged the importance of a simulation study by Bauer and Curran:

Their work serves as a useful a caution against the quixotic quest to identify the true number of groups in either GMM or GBTM analyses. Perhaps most importantly, this work reinforces the need to move away from interpretations of trajectory groups as literally distinct entities.

So, Nagin has previously agreed that the groups have been interpreted as distinct entities, at least by some, and we should move away from such interpretations. Yet, reading his recent essay, one gets the impression that those criticizing such interpretations have just misunderstood the point. This seems like a contradiction to me.

I do not mind the disagreement, but it would have moved the academic debate forward if those accusing others of being misguided had engaged with the actual arguments or pointed out errors in the premises, etc. I am still waiting for someone to point out the mistakes in my article on GBTM.

P.S. I have never seen any of the advocates of GBTM criticize any interpretation of GBTM.

P.P.S. Actually, Brame et al. did point out a mistake of mine, and I agree with them. I had written that Moffitt’s taxonomy was “spurred” by GBTM. That was clearly the wrong word, and I can only blame my bad English as a non-native speaker. I should have written that the popularity of the taxonomic approach was “fueled” by the development of GBTM. Not a major point, though.

The post An update on group-based trajectory modeling in criminology appeared on The Grumpy Criminologist 2016-06-30 09:06:25 by Torbjørn.