My input to the ASC publication committee

The ASC Publication Committee asked for input on policy and process for publication complaints. The background is of course the now retracted papers in Criminology for reasons detailed by Justin Pickett, and the statements published by the ASC, as well as the video from the forum on scientific integrity. I have previously commented upon it here and here.

I submitted the following to the ASC Publication Committee:

Dear ASC publications committee,
First of all, I am glad to see the ASC taking steps to improve its procedures, and I appreciate you giving everyone the opportunity to provide input.
 
One important issue in the recent debates is access to data and the reproducibility of results. Being able to re-analyse the original data is clearly crucial when there are allegations of research misconduct. At a more general level, when such difficulties arise, it also becomes clear that the data used in the publications were not sufficiently well documented in the first place. I think this needs to improve.
 
There are now strong moves towards what is often referred to as “open science”. Obviously, if data were made publicly available in the first place, it would be much easier for others to check the results. However, while making data openly available to all is in many respects desirable, it is also very often not possible with the kinds of sensitive data criminologists typically use. But much of the ethos of “open science” consists of general principles of science, and some minimum measures should be taken even without committing to any specific “open” framework. At the core is the documentation of research procedures, including data collection and data management. The focus should be on such documentation, and I would like to see some minimum standards of such reporting implemented for all studies.
 
Others have probably provided more thorough suggestions, but I think the following could be a good starting point. My suggestions are simple and should not require much additional effort from anyone (neither authors nor editors). I suggest that all published quantitative studies should include the following information:
a) Regardless of data sources, there should be a note detailing how others can get access to the same data. If special permission needs to be obtained, information on where to apply must be provided, as well as the main conditions for access. If data cannot be made available to others, then the reason for this must be stated. If data will be made available to others at some later point in time, then information on when and how should be included.
b) A statement of when the data were collected and by whom. If a survey company was hired, there should be a reference to the contract or other documentation.
c) If data have been obtained from an existing study (e.g. Add Health or NLSY), there should be a reference to when and how the data were handed over, including specifications of sub-samples (when relevant). Thus, others should be able to get access to the exact same data.
d) If data have been obtained from administrative records, there should be references to who handed over the data, including dates, permissions, etc.
e) Most studies require ethics approval. A reference to such approval should always be provided.
f) Reproducible code should be made available for all studies regardless of data availability. This code should at least cover the estimation procedures, but preferably also the entire workflow from raw data to end results. Whether the code is stored as supplementary files at the journal or in some repository is of no importance, as long as it is specified.
 
These suggestions are primarily relevant for quantitative studies, but some would apply to qualitative studies as well. One should also create similar guidelines appropriate for qualitative studies.
 
Please recognize that I expect all researchers to be able to provide this information with minimum effort. It is simply providing basic documentation. Indeed, if researchers cannot do so, then journals such as Criminology should not publish the article at all simply because the study is not well documented. I consider this to be a minimum requirement.
 
I would also like to see journals make conditional acceptance of articles based on pre-registration, but that would require a bit more work on the principles. I also consider pre-registration a kind of documentation of ideas and plans. I do not think it should be mandatory, only encouraged.
 
I think Criminology would benefit from this in at least three major ways: 1) increasing the quality of the work published, simply by making the studies more reproducible and better documented; 2) increasing the status of the journal by gaining an international reputation for being at the forefront of this development; and 3) increasing trust in the results published in the journal.

I should probably have added that if there are complaints regarding errors in the above documentation (which cannot be fixed within a couple of weeks or so), retraction should be considered based on that alone.

I could have referred to e.g. the Open Science Framework (which is great), and others have probably written more thoroughly on such issues. But I think such documentation is so basic that it is embarrassing it is not already a standard requirement.

Taking the concerns raised seriously – but no regrets about other statements?

Yesterday, the American Society of Criminology posted two statements regarding the Chronicle of Higher Education article of September 24. The first is by the ASC executive committee, stating support for how the editorial team is handling the matter and ensuring that the process follows the COPE framework. This is very good. Even though Criminology is not a member of COPE, their guidelines are very sensible and similar to Wiley’s guidelines. COPE has a flow chart that describes the process.

The second statement is from the co-editors of Criminology. It explains how the journal handles cases where concerns are raised about an article. The main approach is a comment-and-reply model, where critics submit their comment to the journal and the original author is offered the opportunity to reply. They also state that this is not appropriate in all instances, and that additional steps may be necessary, including retraction if the evidence is strong. This is all fine, and I agree.

The statement from the co-editors also details the timeline from when they received an anonymous email on May 29, 2019, up to today. They also emphasize that they issued a statement on July 26 notifying readers that an investigation was underway.

This is all good. I expect nothing less.

However, the statements are not really a direct comment on the article in the Chronicle, although they concern the same topic. I guess their main message is simply to assure readers that they are pursuing the case, as is clear from the following statement:

“Social media attention to Dr. Pickett’s online statement led to what we perceive as a rush to judgment against the authors and the journal, including the mischaracterization that we are not taking the issue seriously and are not committed to resolving it.  Nothing could be further from the truth. We have taken several steps aimed at obtaining a fair and transparent resolution.”

From my point of view, the editorial statement of July 26 was fine, and I trusted the journal to carry out an appropriate investigation as stated. I did think it was taking rather a long time, but I have no problem accepting that there might be good reasons for that.

I was alarmed and disappointed only when I read the article in The Chronicle. There were stories, speculations and rumors that are not the responsibility of the journal, whether true or false. Criminology is not to blame for any of that. However, the chief editor, David McDowall, was quoted in the article saying things that gave the impression that Criminology did not carry out an appropriate investigation. I believe it was precisely his statements that made people doubt whether Criminology took the issue seriously. I think there are three main points:

First, the chief editor was quoted questioning Pickett’s personal motives. It seemed like McDowall actively defended Stewart and tried to make Pickett look bad. Given that the journal’s investigation is not yet complete, it is highly inappropriate for the chief editor to make such statements.

Second, the chief editor was quoted as claiming that the journal has published “complete gibberish” before, referring to one specific instance. He even seems to be fine with that, as it appeared to be an argument against retracting the article in question. Let’s just hope he was misquoted.

Third, the chief editor was portrayed as “no fan of the move toward more scrutiny in the social sciences, which he sees as overly aggressive”. That was not a direct quote, but there is a direct quote where he refers to such scrutiny having a “blood-sport aspect to it” (which obviously does not sound positive). Scrutiny should be at the heart of social science, and so should reproducibility and accountability. While I do expect journals to handle such instances in a professional manner (no blood-sport), it is hard to accept that the chief editor is not in favor of such scrutiny.

My point here is that the statement from the co-editors does not clarify these three concerns arising from the quotes in the Chronicle. It would be good to know whether the chief editor was misquoted or cited out of context. Or maybe he was just sloppy and did not really mean those things, or even regretted that they came out that way. Whatever. Do he and the journal stand by these statements or not? I would have hoped that the statement from the co-editors would 1) apologize for prematurely questioning Pickett’s motives in public, and hopefully also state that this was never the intention, 2) assure readers that Criminology does not accept publishing “complete gibberish”, and will now also look into the other article mentioned by the chief editor to check whether that was actually the case, and 3) affirm that Criminology supports the move towards increased scrutiny in the social sciences.

In any case, the co-editors have been very clear that they are taking the issue seriously, and the ASC executive committee assures us that the process will follow the COPE guidelines. I trust that is happening.

Clearly, there are ways of improving research integrity and accountability without any aspects of blood-sport. Some improvements might even be easy. I might come back to that in a later post.

UPDATE: The chief editor just sent an email to all ASC members in which he clarifies that some of the words he used were regrettable and do not reflect what he really means, either about editorial policy or about the persons involved. That is good! It goes a long way towards answering my concerns in this blog post.

The former flagship journal Criminology

I’m so incredibly disappointed in the journal Criminology. It is meant to be the flagship journal in our field, but it is clearly not up to the task these days.

The journal is published by Wiley, so let’s start by reviewing the publishing house’s general policy on retractions here: https://authorservices.wiley.com/ethics-guidelines/retractions-and-expressions-of-concern.html. Just take a look at the first point:

“Wiley is committed to playing its part in maintaining the integrity of the scholarly record, therefore on occasion, it is necessary to retract articles. Articles may be retracted if:

– There is major scientific error which would invalidate the conclusions of the article, for example where there is clear evidence that findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error).”

What I know about the story has been public for a while. In July, Justin Pickett posted this paper on SocArXiv here: https://osf.io/preprints/socarxiv/9b2k3/ , explaining that an earlier paper has fundamental errors. Strikingly, the underlying survey had about 500 respondents, yet the article reports n = 1,184. While I can understand that errors can lead to duplicated records, I do not understand how that can happen without anyone noticing. Pickett details numerous other errors and asks for the article to be retracted. That seems like a perfectly reasonable request, and I fail to see how it could be declined. But it has been.

A story in The Chronicle Review (behind a paywall, but also available here) reveals astonishing statements from the chief editor, David McDowall, who even says he has not read Pickett’s letter thoroughly. Any editor receiving such a letter should be highly alarmed and should indeed consider all details very carefully. Apparently, the editorial team does little or nothing. Or at least: it fails to communicate that it is doing anything.

Two things in the chief editor’s quoted remarks are particularly disturbing.

First, McDowall seems to think a correction of errors has the goal of ruining other people’s careers. I have to say that Pickett’s letter seems to me to be sober and to the point. Pickett gave his co-author a more than fair chance to make the corrections himself before publishing his note. It looks like a last resort, not a blood sport at all. If the authors had just admitted the errors and agreed to retract, it would have been a regrettable mistake; now it is a scandal.

Second, a flagship journal should never publish “complete gibberish”! That some (or even many) articles turn out to be wrong, fail to replicate, or contain errors is not that surprising (although not desirable, of course), but “complete gibberish” should not occur. If it nevertheless happens, those articles should be retracted.

The unwillingness of the journal’s chief editor to take this matter seriously reveals a serious lack of concern with the truth. That should be unacceptable to Wiley as well as the American Society of Criminology.

I am just so very, very disappointed.

P.S. I do not have any solutions to the systemic problems here, but improvements should be easy. Criminology as a field has to improve in terms of making data available with full documentation and reproducible code. That would make errors detectable sooner.

A comment on Laub et al on marriage effects on crime

The just-published Oxford Handbook of Developmental and Life Course Criminology includes a chapter in which John Laub, Zachary Rowan and Robert Sampson give an update on the age-graded theory of informal social control, which has dominated the field of life-course criminology for the past couple of decades.

A key proposition of the theory is that life-course transitions can represent turning points in a criminal career, and marriage is the transition that has received the most attention in the empirical literature. A few years ago, I wrote a critical review together with my colleagues Jukka Savolainen, Kjersti Aase and Torkild Lyngstad. In their book chapter, Laub et al are clearly critical of our review. I am a bit flattered that they bothered to criticise us, but I do have a few comments.

First, Laub et al. correctly point out that we are unsatisfied with the existing studies when it comes to estimating causal effects, and they do not really contradict our claim. Nevertheless, they point out that we “do not offer a viable alternative to build on this existing knowledge” (p 302). That might be right, but I do not think that is our responsibility either. I think those who advocate for a theory also have the responsibility for providing convincing empirical evidence.

Second, I think we actually did suggest a viable alternative. Importantly, we doubt that a causal effect of marriage on crime can be estimated at all, since it is hard to see how there might be any plausible exogenous variation. (I do not rule that out completely, but I am still waiting to see such a study). Instead, we suggest checking the preconditions for the theory to be true. For example, one suggested mechanism is that the spouse opposes criminal behaviour and exercises social control. If so, a survey of spouses’ attitudes to offending and of how they react to their husbands’ behaviour would provide relevant empirical evidence on whether the premises of the theory hold. Providing any such evidence would make the theory more plausible. (If the spouses are favourable to crime and/or do not exercise any meaningful control over their husbands, then that mechanism is not supported. Otherwise, it is corroborated). So, a viable alternative would be to check the preconditions more carefully and empirically. It would still not provide an estimate of a causal effect, that is true, but it might be the best we can do.

Third, Laub, Rowan and Sampson state that “to rule out evidence that only comes from non-randomized experiments is to rule out most of criminology” (p 302). Now, that does not quite follow from our argument. Estimates of causal effects can only be provided if there is some kind of exogenous variation to be exploited. A causal theory can be corroborated in other ways, but that is not easy either. A careful descriptive study might provide evidence that is inconsistent with competing theories. Empirical findings that are equally consistent with a selection effect (or other competing theories) do not really count as a test of the theory.

Fourth, they refer to my joint work with Jukka Savolainen, where we show that change occurs prior to employment rather than as a response to it, which Torkild Lyngstad and I also showed for the transition to marriage. Laub et al point out that Laub and Sampson (2003) acknowledge that ‘turning points’ are part of a gradual process, and that turning points “are not a constant once set in motion, and they vary through time” (p 307). While this might sound reasonable, it also makes it a bit hard to understand what a turning point is. If changes in offending before marriage (or work) are consistent with the theory, then I am not sure it is possible to say when a turning point occurs. That makes it harder to test the theory empirically.

Fifth, Laub et al hint that since it is almost only studies using Norwegian register data that show the pattern of decline prior to a turning point, it might be something particular to the Norwegian setting. We actually suggested in our review that family formation patterns in the Nordic countries differ from those in the US (see our article, page 438). While the context might indeed be important, that is not the main reason why so few other studies have found the same pattern. Actually, we argue that our findings are consistent with previously published results. Earlier studies should be repeated using an approach similar to ours: just check the timing of change in offending relative to the transition in question. Until that is done, there is no basis for claiming that the Norwegian patterns are any different from other contexts. (They might be, but we do not know yet).

Sixth, Laub et al discuss the role of cohabitation and make a similar argument to the one we made in our review article: that cohabitation often serves as a ‘trial period’ or a ‘stepping stone’ towards marriage, and if it works out, the couple will often marry. But Laub et al’s discussion of the evidence focuses on whether the marriage effect translates to cohabiting couples, a discussion that does not take into account that marriage is an increasingly selective state, nor that it is becoming increasingly difficult to say when we should expect to see changes in offending.

In sum, I do appreciate Laub et al. making the effort to discuss specific arguments in our work. However, I am not quite convinced. I actually tend to think that a romantic partner, a good job and changes in life situation more generally might have an effect on crime. I find that reasonable, and I hope it is true. I am nevertheless not quite convinced by the empirical evidence, and I am hesitant to make claims about ‘turning points’. However, I do believe the empirical evidence can be improved by: 1) checking the timing of change, and 2) empirically investigating the specific preconditions for the mechanisms at work.

Moffitt reviews her own theory

Two days ago, Nature published a review article by Terrie Moffitt that “recaps the 25-year history of the developmental taxonomy of antisocial behaviour, concluding that it is standing the test of time…”

It should also be mentioned that the taxonomy has received quite a bit of criticism (which is not mentioned in Moffitt’s review), and I feel that much of this critique is also standing the test of time. It would have been a good thing if the 25th anniversary review had taken the time to clear up some misunderstandings and controversies, and to make some clarifications. However, Moffitt refrains from doing so, and I am not so sure the debate has moved forward. I have made some contributions to this debate, and I think my points are as relevant as ever. See here and here. It feels a bit wrong that they too stand the test of time. Importantly, so does the critique made by others.

In her recent review, she repeats what she also claimed in her 2006 review of the evidence: a very large and important part of the empirical evidence supporting her theory comes from studies using latent trajectory models. A key piece of evidence seems to be the identification of the hypothesized groups, as she states: “Since the advent of group-based trajectory modelling methods, the existence of trajectory groups fitting the LCP and AL taxonomy has now been confirmed by reviews of more than 100 longitudinal studies”. The method is a particular kind of latent class model for panel data. I would say this evidence is pretty weak. First of all, my discussion of trajectory models makes it clear that seemingly distinct groups can be detected in data where there are none. Since the further tests of hypotheses rely on the identification of groups, those tests do not provide reliable evidence either. The empirical evidence for the taxonomy is thus equally consistent with competing theories, and therefore at best very weak evidence for either. Others have made similar points as well.

In her new article, on page 4, she makes the claim that group-based trajectory methods are capable of detecting the hypothesized groups. The method does no such thing. It is a data reduction technique, which might be convenient for some purposes, but it does not detect distinct groups. It creates clusters, but these could equally well reflect an underlying continuous reality. Moreover, the existence of these groups is “confirmed” across studies only if one accepts pretty much any evidence of heterogeneity in individual trajectories as confirmation. As I pointed out in an article from 2009, the findings across studies are so divergent (beyond the presence of some kind of high-rate and low-rate groups) that it is hard to imagine any result from trajectory modelling that would not be taken as support for the taxonomy.
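To illustrate why identifying trajectory groups is weak evidence on its own, here is a minimal simulation sketch in Python. The numbers are invented, and it uses k-means clustering rather than the GBTM likelihood, so it illustrates the general point rather than re-analysing any particular study: trajectories are generated from a single, purely continuous distribution of individual propensities, yet a clustering of the trajectories still yields neatly separated "groups".

```python
# Illustrative sketch only: trajectories are simulated from a purely
# continuous distribution of individual propensities (no latent classes),
# and a clustering algorithm is then asked for four "groups". K-means is
# used here instead of the GBTM likelihood, since the point is general:
# a partitioning method will return groups whether or not they exist.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_people = 5_000
ages = np.arange(12, 31)

u = rng.normal(0.0, 1.0, size=n_people)          # continuous heterogeneity
age_curve = -0.02 * (ages - 18) ** 2             # common age-crime curve
rates = np.exp(u[:, None] + age_curve[None, :])  # individual offence rates
counts = rng.poisson(rates)                      # observed counts per age

groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(counts)
for g in range(4):
    traj = counts[groups == g].mean(axis=0)
    print(f"group {g}: n = {(groups == g).sum():4d}, peak mean count = {traj.max():.2f}")
# The printout shows separated low/medium/high "trajectory groups",
# even though the data-generating process contained no groups at all.
```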

In short: at best, the empirical evidence is consistent with the taxonomy. But this is largely uninformative as long as it is also consistent with pretty much all competing theories that acknowledge that different people behave differently. The bottom line is that there is no evidence of qualitative differences between the “groups” (at least, no such evidence is presented in Moffitt’s recent review). There might be quantitative differences, though.

The other risk factors she discusses, and their relation to the groups, could just as well be interpreted as differences in degree. However, on page 5, she dismisses the possibility that there might be quantitative rather than qualitative differences! (This is the closest she comes to clarifying whether she actually means literally distinct groups or not). Now, the evidence I have seen so far shows that there are indeed differences between the average scores in the two groups, but most theories of criminal behaviour would expect higher scores on all risk factors for the highest-offending persons. While it sounds great that she proposed hypotheses in her 1993 article that have later proved correct, these hypotheses are also very general and consistent with other perspectives.

The key point here is that the empirical evidence is consistent with the taxonomy – and pretty much all other theories. It seems that the theory has not been put to a strict test in these 25 years. In a previous post, I made the following argument which holds generally:

I think (but I am not entirely sure), that in this context “testing a theory” only means findings that are consistent with a given theory. I think this is a generous use of the term “test”. I prefer to reserve the word “test” for situations where something is ruled out – or when using methods that at least in principle would be able to rule something out. In other words: If the findings are consistent with a theory but also consistent with one or several competing (or non-competing) theories, this is at best weak evidence for either theory. (This holds regardless of methods used). It is good that a theory is consistent with the empirical findings, but that is far from enough.

Second, in 2009 I wrote a more theoretical paper assessing the arguments in the taxonomic theory. A major point was that no arguments are presented that there are distinct groups in the first place. However, one might argue that I have interpreted the theory too literally regarding the distinctness etc., so in that article I also explicitly discuss this possibility. In 2009, I argued that since there is clearly some confusion regarding this issue, it would have been reasonable if someone (preferably Moffitt, of course) clarified whether she really meant distinct groups or not. I am not aware of any such clarification to date. But, as mentioned, she now goes a long way towards dismissing the differences-in-degree interpretation (see her new article, page 5). I think the argument made by Sampson and Laub still holds: if LCP is just another term for high-rate, then the theory brings nothing new to the table. Indeed, all the mechanisms and risk factors discussed are relevant and sound, but they do not at all rely on a taxonomy as such.

In my view, the review should have concluded something like this: First, while much empirical evidence is consistent with the taxonomy, there is a lack of good evidence for the existence of groups. Second, there are still theoretical arguments that are unclear and need specification to allow for strict empirical tests. Nevertheless, the taxonomy has helped focus attention on some important risk factors and mechanisms. (Although these factors were also known in 1993, according to Moffitt). Whether the taxonomy itself is needed to do so is less clear. Important work remains to be done.

What I am saying in a somewhat elaborate way is that the standards for what counts as empirical evidence in support of a hypothesis are too low. So is the level of precision of the “theories”. I know it is hard, but we should be able to do better.

 

 

PS Moffitt also refers to one of my articles on her first page when stating that “Chronic offenders are a subset of the 30–40% of males convicted of non-traffic crimes in developed nations”. My article says nothing of the kind, but tries to estimate how many will be convicted during their lifetime. It is simply the wrong reference, but I would of course recommend reading it 🙂

PPS I take the opportunity to also point out that while Nagin has previously claimed that my critique is simply based on a fundamental misunderstanding of his argument (see my comment on Nagin here), I have always argued, regardless of his position, that my methodological arguments are important because of how others misunderstand the methods and the empirical results, as Moffitt has just demonstrated. Nagin also has a responsibility to clarify such prevalent misunderstandings.

 


Our paper on the paradox is out now

Our paper on the “Weisburd paradox” is now out in the Journal of Quantitative Criminology. Mikko and I had initially put our own attempt in the drawer, since it turned out that Gelman had written a much better working paper on the same thing. It then turned out that some additions were required for publication, and Gelman invited us to help out in the final rounds. We’re grateful for the opportunity. The story is here, here, and here.

 

Best practice of group-based modelling

I had initially decided not to pick on group-based modelling any more, but here we go:

In his recent essay on group-based modelling in the Journal of Research in Crime and Delinquency (see earlier posts here and here), Nagin discusses two examples of the use of group-based modelling in developmental criminology. It is not clear whether these are mentioned because they are particularly good examples, as they are presented as “early examples”. Maybe they are of historical interest, or maybe they are mentioned because they are much cited. Since Nagin’s article is basically promoting GBTM, I assume they are mentioned because they are good examples of what can be achieved using this method. In any case, I would have liked to see examples where GBTM really made a difference.

The first example is the article by Nagin, Farrington and Moffitt, who, according to Nagin, made an important contribution by using SPGM, for the following reason:

…what was new from our analysis was the finding that low IQ also distinguished those following the low chronic trajectory from those following the adolescent limited and high chronic trajectories. This finding was made possible by the application of GBTM to identify the latent strata of offending trajectories present in the Cambridge data set.

The article by Nagin, Farrington and Moffitt shows that the relationship between IQ and delinquency varies over groups. (One could of course also say that the relationship is non-linear, but they stick to discussing groups). Which is fine, but the contribution is mainly in how the groups are created, although some elaborate testing of equality of parameters is involved. If other ways of summarizing the criminal careers (as either continuous measures or groups) would not find anything similar, it remains unclear what is a methodological artefact and what is not. However, only one methodological approach was used, so it is hard to assess whether GBTM actually was the only way of discovering this kind of relationship. Maybe alternative methods (e.g. subjective classifications or a continuous index) would have found the exact same thing in these data? Could very well be.

The second example mentioned by Nagin is also written by himself, together with Laub and Sampson. This is a very influential paper on the influence of marriage on crime, but it has a major flaw because of how GBTM is used. I recently wrote a review article together with Savolainen, Aase and Lyngstad on the marriage-crime literature, published in Crime and Justice. We commented on this paper as follows:

…they estimated group-based trajectories (Nagin and Land 1993) for the entire observational period from age 7 to 32 and then assigned each person to a group on the basis of posterior probabilities. In the second stage, they regressed the number of arrests in each 2-year interval from age 17 to 32 on changes in marital status and quality, controlling for group membership and other characteristics. In addition, they conducted separate regression analyses by trajectory group membership.

We are somewhat hesitant to endorse this conclusion for methodological reasons. Because the trajectories were estimated over the entire period—including the marital period—controlling for group membership implies controlling for post-marriage offending outcomes as well. We expect this aspect of the analytic strategy to bias the results, but further efforts are needed to assess the substantive implications of this methodological approach.

I think the final sentence of this quote is very mild. They were partly conditioning on the outcome variable, and that is bound to lead to trouble. Frankly, I do not know how to interpret these estimates. In this case, GBTM made a real difference, but for the worse. This was probably hard to see at the time, since GBTM had not yet been subject to much methodological scrutiny. It is easier to see now.
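For readers who want to see the mechanics, here is a small, purely stylized simulation sketch (invented effect sizes and variable names, not a re-analysis of their data) of what can happen when the "trajectory group" control is itself a function of the post-marriage outcome:

```python
# Stylized simulation: the true effect of marriage on post-period offending
# is set to -1.0, and marriage is assigned at random, so a plain regression
# recovers it. A "trajectory group" defined from offending over the WHOLE
# period, including the post-marriage years, is then added as a control.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 20_000

u = rng.normal(size=n)                           # stable criminal propensity
married = rng.binomial(1, 0.5, size=n)           # randomly assigned here
y_pre = u + rng.normal(size=n)                   # offending before marriage
y_post = u - 1.0 * married + rng.normal(size=n)  # true marriage effect = -1.0

# Group membership based on the entire observation period (pre + post).
high_rate = ((y_pre + y_post) > np.median(y_pre + y_post)).astype(float)

X_naive = sm.add_constant(married.astype(float))
X_cond = sm.add_constant(np.column_stack([married.astype(float), high_rate]))

print("marriage effect, no group control:  ",
      round(sm.OLS(y_post, X_naive).fit().params[1], 2))   # about -1.0
print("marriage effect, controlling group: ",
      round(sm.OLS(y_post, X_cond).fit().params[1], 2))    # biased towards zero
```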

In sum, although these two studies have other qualities, they are not examples of real success stories for GBTM. My advice would be to come up with some really good examples. But perhaps the only real success story of GBTM is Nagin and Land’s 1993 article (for reasons given here).

 

P.S. Actually, I know of far better examples of using group-based modelling. Neither of them depends on GBTM, but it adds a nice touch. For example, Haviland et al use GBTM to improve propensity score matching. Another example is my own work with Jukka Savolainen, where offending in the period before job entry is summarized using GBTM. For both of these studies, other techniques could have been used, but GBTM works very well. There are other sound applications as well.


About the Weisburd paradox

The “Weisburd paradox” refers to the finding by Weisburd, Petrosino and Mason, who reviewed the literature on experimental studies in criminology and found that increasing the sample size did not lead to increased statistical power. While this paradox has perhaps not received great attention in the literature so far, the study was replicated last year by Nelson, Wooditch and Dario in the Journal of Experimental Criminology, confirming the phenomenon.
The empirical finding that larger sample size does not increase power is based on calculating “achieved power”. This is supposed to shed light on what the present study can and cannot achieve (see e.g. here). “Achieved power” is calculated in the same way as conventional power calculations, but instead of using the assumed effect size, one uses the estimated effect in the same study.
Statistical power refers to the probability of correctly rejecting the null hypothesis, based on assumptions about the size of the effect (usually drawn from previous studies or other substantive reasons). By increasing the sample size, the standard error gets smaller, and this increases the probability of rejecting the null hypothesis if there is a true effect. Usually, power calculations are used to determine the necessary sample size, as there is no point in carrying out a study if one cannot detect anything anyway. So, one needs to ensure sufficient statistical power when planning a study.
But using the estimated effect size in the power calculation gives a slightly different interpretation. “Achieved power” would be the probability of rejecting the null hypothesis, based on the assumption that the population effect is exactly equal to the observed sample effect. I would say this is rarely a quantity of interest, since one has already either rejected or kept the null hypothesis… Without any reference to external information about true effect sizes, post-hoc power calculations bring nothing new to the table beyond what the point estimate and standard error already provide.
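To make this concrete, here is a minimal sketch (normal approximation, two-sided test at alpha = 0.05, made-up numbers) showing that "achieved power" is just a deterministic transformation of the observed z-statistic, and hence of the p-value:

```python
# "Achieved" (post-hoc) power as a function of the estimate and its standard
# error only; it adds nothing beyond what those two numbers already tell us.
from scipy.stats import norm

def achieved_power(estimate, se, alpha=0.05):
    """Power of the same design if the true effect equalled the estimate."""
    z_crit = norm.ppf(1 - alpha / 2)
    z_obs = estimate / se
    return norm.cdf(z_obs - z_crit) + norm.cdf(-z_obs - z_crit)

print(achieved_power(0.5, 0.25))  # |z| = 2.0, p just under 0.05 -> power ~ 0.52
print(achieved_power(0.5, 0.10))  # |z| = 5.0, tiny p            -> power ~ 1.00
```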
Larger “achieved power” implies a larger estimated effect size, so let’s talk about that. The Weisburd paradox is that smaller studies tend to have larger estimated effects than larger studies. While Nelson et al discuss several reasons why that might be, they did not put much weight on what I would consider the prime suspect: a lot of noise combined with the “significance filter” for getting published. If there is a significant effect in a small study, the point estimate needs to be large. If significant findings are easier to publish, then the published findings from small studies will be larger on average. (In addition, researchers have incentives to find significant effects to get published and might be tempted to do a bit of p-hacking – which makes things worse). So, the Weisburd paradox might be explained by exaggerated effect sizes.
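A small simulation sketch of the significance filter (the true effect and the standard errors are invented for illustration):

```python
# Sketch of the significance filter: a modest true effect, many hypothetical
# studies, and only the significant ones counted as "published".
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
true_effect = 0.10
z_crit = norm.ppf(0.975)

for label, se in [("small studies (SE = 0.20)", 0.20),
                  ("large studies (SE = 0.05)", 0.05)]:
    estimates = rng.normal(true_effect, se, size=100_000)  # one per study
    significant = np.abs(estimates / se) > z_crit
    print(label,
          "| share significant:", round(significant.mean(), 2),
          "| mean significant estimate:", round(estimates[significant].mean(), 2))
# Few of the small studies reach significance, but those that do overstate
# the true effect of 0.10 several-fold; the large studies are significant
# far more often and their published estimates are much closer to the truth.
```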
But why care? First, I believe the danger is that such reasoning might mislead researchers into justifying studies that are too small, ending up chasing noise rather than making scientific progress. Second, researchers might give the impression that their findings are more reliable than they really are by showing that they have high post-hoc statistical power.
Just to be clear: I do not mind small studies as such, but I would like to see the findings from small studies replicated a few times before giving them much weight.
Mikko Aaltonen and I wrote a commentary on the paper by Nelson et al. and submitted it to the Journal of Experimental Criminology, pointing out such problems and arguing that the Weisburd paradox is not even a paradox. We were rejected. There are both good and bad reasons for this. One of the reviewers pointed out a number of points to be improved and corrected. The second reviewer was even grumpier than me and did not want to understand our points at all. When re-reading our commentary, I can see much to be improved, and I also see that we might be perceived as more confrontational than intended. (I also noticed a couple of other minor errors). Maybe we should have put more work into it. You can read our manuscript here (no corrections made). We decided not to rewrite our commentary for a more general audience, so it will not appear elsewhere.
When writing this post, I did an internet search and found this paper by Andrew Gelman, prepared for the Journal of Quantitative Criminology. His commentary on the Weisburd paradox is clearly much better written than ours and more interesting for a broader audience. Less grumpy as well, but with many similar substantive points. I guess Gelman’s commentary should pretty much settle this issue. Kudos to Gelman. EDIT: and also to JQC for publishing it. An updated version of Gelman’s piece is here – apparently not(!) accepted for publication yet.


Criminological progress!

I recently came across this article by David Greenberg in the Journal of Developmental and Life Course Criminology. I had previously seen an early draft, and I am glad to see it finally published! (It should have been published a long time ago, as the version I saw was pretty good, but I have no idea why it was not). Greenberg shows how to use standard multilevel modeling with normally distributed parameters to test typological theories. The procedure is actually not very complicated: estimate a random effects model, use empirical Bayes to get point estimates for each person’s intercept and slope(s), and explore the distributions of those point estimates using e.g. histograms. And no: those empirical Bayes estimates do not have to be normally distributed! You need to decide for yourself (preferably up front) what it takes for these distributions to support your favourite typology, so it requires a bit of thinking. This can all be done in standard statistical software, requiring only that you know a little bit about what you are doing. It would be really nice to see previous publications using group-based models reanalyzed in this way.
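Here is a minimal sketch of the procedure as I read it, using statsmodels in Python. The data are simulated just to make the example self-contained, and I use a linear mixed model for simplicity; a count model with normally distributed random effects would follow the same logic. All variable names and parameter values are invented:

```python
# Step 1: random effects model; Step 2: empirical Bayes point estimates per
# person; Step 3: inspect the distributions of those estimates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_people = 500
ages = np.arange(15, 26)
person = np.repeat(np.arange(n_people), len(ages))
age = np.tile(ages, n_people)

# Continuous (normal) individual intercepts and slopes, i.e. no latent groups.
a = rng.normal(2.0, 1.0, n_people)[person]
b = rng.normal(-0.10, 0.05, n_people)[person]
df = pd.DataFrame({
    "person_id": person,
    "age": age,
    "offending": a + b * (age - 15) + rng.normal(0, 0.5, len(age)),
})

# Random intercept and random age slope for each person.
result = smf.mixedlm("offending ~ age", data=df, groups=df["person_id"],
                     re_formula="~age").fit()

# Empirical Bayes (BLUP) estimates of each person's intercept and slope.
eb = pd.DataFrame(result.random_effects).T

# Inspect the distributions (histograms, density plots) and judge them
# against criteria for your typology decided up front.
print(eb.describe())
```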

The article also discusses a number of related modeling choices, which is highly informative. So far, I have only read the published version of the article very quickly, and I need to read it more carefully before I fully embrace all the arguments, but I might very well end up embracing it all.

I have noticed that it has been claimed in the literature that models assuming normally distributed random effects cannot test for the existence of subpopulations. Well, it is the other way around.


Testing typological theories using GBTM?

As I mentioned in yesterday’s post, I think the debates about group-based trajectory modeling have some unresolved issues. For this reason, I submitted a commentary to the Journal of Research in Crime and Delinquency. I had two reasons for doing so. First, I think Nagin mischaracterized his critics, and I believe his essay was a willful attempt to avoid serious criticism by ignoring serious arguments. (Maybe I could have been less outspoken about that). But after all, he has not addressed the actual argument I (and others) have put forward. I can only interpret this as an attempt to avoid discussing the substantive matter by keeping silent, and now subtly dismissing the whole thing. If Nagin finds it worthwhile to say that his critics have misunderstood, he should also bother to point out how. So far, he has done no such thing.

Second, I actually think there is a need to clarify whether GBTM can test for the presence of groups or not. If the advocates of GBTM had been clear about this, such a clarification would obviously not have been needed. There is no doubt that Nagin and others have been clear that GBTM can – or maybe even should – be interpreted as an approximation to a continuous distribution. There is no disagreement on that point. But they have also given the impression that one can identify meaningful, real groups in the data by way of GBTM. They have not been clear on what this really means or under what conditions it can be done. A clarification is in order, since it is clear in the literature that findings from GBTM analyses have been interpreted as giving very strong evidence for a certain typological theory (see e.g. here). I have claimed this empirical evidence is weak and largely based on overinterpretation of empirical studies using GBTM (see here and here). It would be helpful if Nagin could clarify the strength of this evidence.

So I wrote a commentary and submitted it to The Journal of Research in Crime and Delinquency. (See the full commentary here). According to the letter from the editor, it was rejected because:

Language at the top of page 2 in your comment underscores a fundamental misunderstanding and misreading of Nagin’s work.
(See the full rejection letter here).

Well, maybe I should have put things more politely, but I still believe my arguments are right. I can understand that there might be good editorial reasons for not having another debate about GBTM in the journal, but I am not impressed with the reason given. My fundamental misunderstanding is revealed (at the top of page 2) where I point out that Nagin himself is responsible for some of the confusion regarding the interpretation of the groups. I do so with clear references, so you can decide for yourself whether these are misreadings or not.

Even in his recent essay, Nagin presents one of the main motivations for using GBTM by first arguing that other methods are not capable of testing for the presence of groups, and then suggesting that GBTM can indeed solve this problem:

To test such taxonomical theories, researchers had commonly resorted to using assignment rules based on subjective categorization criteria to construct categories of developmental trajectories. While such assignment rules are generally reasonable, there are limitations and pitfalls attendant to their use. One is that the existence of distinct developmental trajectories must be assumed a priori. Thus, the analysis cannot test for their presence, a fundamental shortcoming. (…) The trajectories reported in Figure 2 provide an example of how GBTM models have been applied to empirically test predictions stemming from Moffitt’s (1993) taxonomic theory of antisocial behavior.
(My emphasis).

It might not say straight out whether the groups from GBTM are interpretable as real or not in this setting, nor what can be concluded from such “tests”. But given the previous debates and misconceptions, this is hardly a clarification.

My point is simply this: it has been claimed that GBTM can be used to test for the presence of distinct groups, and generally to test typological theories. (I have discussed this in more detail here and here). However, it is hard to see how such typological theories can be tested using GBTM. That is indeed very vaguely explained by the advocates of the methodology. I think (but I am not entirely sure), that in this context “testing a theory” only means findings that are consistent with a given theory. I think this is a generous use of the term “test”. I prefer to reserve the word “test” for situations where something is ruled out – or when using methods that at least in principle would be able to rule something out. In other words: If the findings are consistent with a theory but also consistent with one or several competing (or non-competing) theories, this is at best weak evidence for either theory. (This holds regardless of methods used). It is good that a theory is consistent with the empirical findings, but that is far from enough. I know of no published criminological study using GBTM that provides a test of typological theories in the stricter sense of the term. So far, it seems to me that the advocates of GBTM have not been clear on this issue. Some clarification would be in order.

