Two years ago, a paper by Swedish neuroscientist Anders Eklund and colleagues caused a media storm. The paper, Cluster Failure, reported that the most widely used methods for the analysis of fMRI data are flawed and produce a high rate of false positives.
As I said at the time, Cluster Failure wasn’t actually making especially new claims because Eklund et al. had been publishing quite similar results years earlier – but it wasn’t until Cluster Failure that they attracted widespread attention.
Perhaps one reason Cluster Failure went mainstream is that it was the first of Eklund et al.’s false positive papers to be published in a high-impact journal (PNAS). But another reason is that it contained an alarming statement, namely that “These results question the validity of some 40,000 fMRI studies.” This triggered many headlines implying that all of fMRI was suspect.
Tom Nichols, one of the Cluster Failure authors, later clarified that 40,000 referred to the total number of fMRI studies out there, and wasn’t meant to imply that all of those studies were invalid. He went on to estimate that about 10% of the 40,000 fMRI experiments were at high risk of false positives from the Cluster Failure problem, although another 33% suffer from a different problem (no multiple comparisons correction at all).
Now, Eklund et al. have released a bioRxiv preprint looking back on their much-discussed 2016 paper: Cluster Failure Revisited: Impact of First Level Design and Data Quality on Cluster False Positive Rates.
In the new article, Eklund et al. consider various technical critiques of their previous work, but conclude that they are unfounded. In response to concerns that the high false positive rates in Cluster Failure were a result of “idiosyncratic attributes of our first level designs”, they show that elevated false positive rates are also seen with modified designs (first level models with two regressors, and models with two intersubject-randomized regressors).
Eklund et al. also further explore how best to analyze fMRI data to avoid false positives. After trying several different approaches, they conclude that a combination of two things is required: non-parametric statistical thresholding, and noise reduction (using ICA FIX).
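To give a flavour of what "non-parametric thresholding" means, here is a minimal sketch in Python. It is not the authors' pipeline: it uses a simplified voxel-wise max-statistic sign-flipping test rather than the cluster-level inference discussed in Cluster Failure, it runs on simulated toy data, and every name in it (`t_stat`, `max_t_threshold`, `simulate_group`) is hypothetical. The idea is that instead of trusting a theoretical null distribution, you build one from the data by randomly flipping the sign of each subject's map and recording the largest statistic anywhere in the brain.

```python
import math
import random


def t_stat(values):
    """One-sample t statistic testing whether the mean differs from zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0.0:
        return 0.0
    return mean / math.sqrt(var / n)


def max_t_threshold(data, n_perm=1000, alpha=0.05, seed=0):
    """Estimate a family-wise-error-corrected t threshold by sign flipping.

    data: list of subjects, each a list of per-voxel contrast values.
    Assumption: under the null, each subject's map is symmetric around
    zero, so its sign can be flipped at random. For every random flip we
    keep the largest t value across all voxels.
    """
    rng = random.Random(seed)
    n_vox = len(data[0])
    max_stats = []
    for _ in range(n_perm):
        signs = [rng.choice((-1.0, 1.0)) for _ in data]
        max_stats.append(max(
            t_stat([s * subj[j] for s, subj in zip(signs, data)])
            for j in range(n_vox)
        ))
    # Take the value exceeded by roughly alpha of the null maxima.
    max_stats.sort(reverse=True)
    k = max(1, int(alpha * n_perm))
    return max_stats[k - 1]


def simulate_group(n_subjects=20, n_voxels=50, effect=3.0, seed=1):
    """Toy data: Gaussian noise everywhere, plus a true effect in voxel 0."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
            for _ in range(n_subjects)]
    for subj in data:
        subj[0] += effect
    return data


data = simulate_group()
threshold = max_t_threshold(data)
observed = [t_stat([subj[j] for subj in data]) for j in range(len(data[0]))]
significant = [j for j, t in enumerate(observed) if t > threshold]
print(f"corrected threshold: {threshold:.2f}; significant voxels: {significant}")
```

Because the threshold comes from the maximum across all voxels, it is stricter than an uncorrected per-voxel cutoff, which is the point: a voxel only counts as significant if it beats what noise alone produces anywhere in the map. Real tools apply the same logic to cluster sizes as well as to single voxels.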
Finally, Eklund et al. revisit the question of how many fMRI papers may be compromised by sub-optimal analysis. They reiterate that about 10% of fMRI papers used the multiple comparisons correction that Cluster Failure showed to be most problematic (a cluster-defining threshold of p<0.01), and say that any marginally significant results obtained by this method (p close to 0.05) should be “judged with great skepticism”. But studies that used no multiple comparisons correction at all are even more dubious, the authors say.
In my view, for all of the hyperbole it created, Cluster Failure was a great paper that highlighted an issue that shouldn’t be ignored. Yet I wonder whether new and different problems may be on the horizon.
New analysis tools for fMRI have emerged in the past few years, which – unlike earlier methods – aren’t based on mapping clusters of brain activity. Popular new methods include MVPA and network-based analyses. These approaches are exciting and impressive and they are not (as far as I know) subject to the problems identified in Cluster Failure. But then again, they are new enough that they might have undiscovered issues of their own. In 2026, will we be worrying about the implications of a bombshell paper called Network Failure…?