Pitfalls With the Unquestioning Use of Statistics

Shian Ming Tan, MMed

J Clin Psychiatry 2016;77(8):e1005


See reply by Mazereeuw et al and article by O’Regan et al



To the Editor: With regard to the recent article by O’Regan and colleagues,1 I share the authors’ concern that patients with Alzheimer’s disease are maintained on cholinesterase inhibitors for much longer periods than the conclusions of randomized controlled trials (RCTs) support and that clinical guidelines have been vague on this issue. While I welcome their initiative to summarize the effects of cholinesterase inhibitor discontinuation, I find their synthesis of the summary estimates flawed.

In any meta-analysis, heterogeneity of the effect sizes of the individual RCTs is assessed first, and the average effect size is then calculated using a model (fixed or random effects) chosen according to the extent of heterogeneity. This is precisely what O’Regan et al did, and at first glance, all is well and good. On closer scrutiny, however, the unquestioning use of the heterogeneity result, I² = 0%, raises concerns.

To better understand the issues with the authors’ erroneous interpretation of I², we need to revisit the evolution of the 2 most commonly used measures of heterogeneity: the Q statistic and I². The Q test was established in 1954 to evaluate heterogeneity. Its shortcoming is its poor power to detect heterogeneity when the meta-analysis includes few studies.2 The I² index was subsequently developed3 to overcome this issue. However, later evidence indicates that both perform similarly: just like the Q test, the I² index has low power with a small number of studies.4
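For readers who wish to see why the 2 measures behave alike, their conventional definitions are sketched below. The notation is an assumption introduced here and does not appear in the article under discussion: k is the number of studies, \hat{\theta}_i is the effect estimate of study i, and v_i is its within-study variance.

```latex
% Inverse-variance weights and the fixed effect pooled estimate
w_i = \frac{1}{v_i}, \qquad
\hat{\theta}_F = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}

% Cochran's Q, referred to a chi-square distribution with k - 1 degrees of freedom
Q = \sum_{i=1}^{k} w_i \left( \hat{\theta}_i - \hat{\theta}_F \right)^2

% I^2 is a rescaling of Q, truncated at zero
I^2 = \max\left( 0, \; \frac{Q - (k - 1)}{Q} \right) \times 100\%
```

Because I² is computed from Q and k alone, it inherits the sampling behavior of Q. With 3 studies, for instance, I² is reported as 0% whenever Q does not exceed 2.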

In this meta-analysis, the number of studies is 3 (for the Neuropsychiatric Inventory outcome) or 5 (for the Mini-Mental State Examination outcome). Hence, the I² estimate here is likely underpowered to detect heterogeneity. This conclusion makes intuitive sense if one scrutinizes the characteristics of the RCTs included in the meta-analysis. Indeed, the authors have astutely pointed out that "the lack of heterogeneity … is surprising considering the variation between study designs."1(p e1429) It is thus a pity that they did not follow through on their observation but instead based their choice of a fixed effects model on the finding of I² = 0%. This highlights the perils of the unquestioning use of statistics.
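The point about small numbers of studies can be illustrated with a minimal simulation sketch. Everything in it is hypothetical and chosen only for illustration: the function names, the equal within-study variance, and the choice of a between-study variance equal to the within-study variance (so that the true I² is 50%). None of the numbers come from the trials in the meta-analysis.

```python
import random


def i_squared(effects, variances):
    """Return (Q, I2 in percent) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def share_with_zero_i2(k, tau2, v, reps=5000, seed=1):
    """Fraction of simulated meta-analyses of k studies that report I2 = 0%."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    zero = 0
    for _ in range(reps):
        # Each study's true effect varies around 0 with variance tau2
        # (genuine heterogeneity); sampling error with variance v is added on top.
        effects = [
            rng.gauss(0.0, tau2 ** 0.5) + rng.gauss(0.0, v ** 0.5)
            for _ in range(k)
        ]
        _, i2 = i_squared(effects, [v] * k)
        if i2 == 0.0:
            zero += 1
    return zero / reps


# Hypothetical scenario: tau2 == v, so the true I2 is 50%.
for k in (3, 5, 30):
    print(k, share_with_zero_i2(k, tau2=0.04, v=0.04))
```

Under these assumptions, theory gives a share of about 39% for 3 studies and about 26% for 5 studies: a point estimate of I² = 0% would be quite common even though half of the total variation is real heterogeneity. With 30 studies, the share falls to a few percent at most.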

Given the current state of affairs in meta-analyses, whereby the median number of studies is 7 in Cochrane Reviews and 12 in published journals,5,6 the unsavory ingredients for an underpowered I² estimate are likely to feature prominently in the synthesis of summary estimates. The unfortunate example seen in this article should hence not be viewed in isolation.

Moving forward, what could be done for better reporting and interpretation of I² in meta-analyses? Suggestions include the reporting of I² with its 95% confidence intervals, the routine use of the random effects model4 regardless of the point estimate of I², and the use of sensitivity analyses based on a plausible spectrum of degrees of heterogeneity.
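The 3 suggestions can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the method of any particular software package: the confidence interval uses the test-based approximation described by Higgins and Thompson,3 with H truncated at 1; the random effects estimate uses the DerSimonian-Laird estimator, which is one of several possible choices; and the effect sizes, variances, function names, and the grid of between-study variances are all hypothetical.

```python
import math


def cochran_q(effects, variances):
    """Cochran's Q computed around the fixed effect pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i2_with_ci(q, k, z=1.96):
    """Return (I2, lower, upper) in percent; needs at least 3 studies."""
    if k < 3:
        raise ValueError("the test-based interval needs at least 3 studies")
    df = k - 1
    # H = sqrt(Q / df), truncated at 1 so that I2 cannot be negative.
    log_h = math.log(max(1.0, math.sqrt(q / df)))
    if q > k:
        se = 0.5 * (math.log(q) - math.log(df)) / (
            math.sqrt(2 * q) - math.sqrt(2 * k - 3)
        )
    else:
        se = math.sqrt((1.0 / (2 * (k - 2))) * (1.0 - 1.0 / (3 * (k - 2) ** 2)))

    def to_i2(h):
        h = max(1.0, h)
        return (h ** 2 - 1.0) / h ** 2 * 100.0

    return (
        to_i2(math.exp(log_h)),
        to_i2(math.exp(log_h - z * se)),
        to_i2(math.exp(log_h + z * se)),
    )


def pool(effects, variances, tau2=0.0):
    """Pooled estimate and standard error for a given between-study variance."""
    weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(effects, variances):
    """DerSimonian-Laird moment estimate of the between-study variance."""
    weights = [1.0 / v for v in variances]
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    q = cochran_q(effects, variances)
    return max(0.0, (q - (len(effects) - 1)) / c)


# Hypothetical effect sizes and variances for 3 studies (not real trial data).
effects = [0.10, 0.35, 0.22]
variances = [0.04, 0.05, 0.03]

q = cochran_q(effects, variances)
print("I2 with 95% CI:", i2_with_ci(q, len(effects)))

tau2_dl = dersimonian_laird_tau2(effects, variances)
print("random effects (DL):", pool(effects, variances, tau2_dl))

# Sensitivity analysis over an assumed range of between-study variances.
for tau2 in (0.0, 0.02, 0.05, 0.10):
    estimate, se = pool(effects, variances, tau2)
    print(tau2, estimate - 1.96 * se, estimate + 1.96 * se)
```

Under this approximation, whenever Q does not exceed its degrees of freedom (so that the point estimate is 0%), the interval depends only on the number of studies: it runs from 0% to roughly 90% for 3 studies and from 0% to roughly 79% for 5 studies. An I² of 0% from so few studies is therefore compatible with substantial heterogeneity, which is why the interval and the sensitivity analysis are worth reporting alongside the point estimate.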

While no statistical maneuver is perfect, it is pertinent that measures are accurately presented to highlight the existing vagaries of biostatistics. Otherwise, the undiscerning use of statistics may turn into a mere number-crunching exercise that could ultimately misinform.

References

1. O’Regan J, Lanctôt KL, Mazereeuw G, et al. Cholinesterase inhibitor discontinuation in patients with Alzheimer’s disease: a meta-analysis of randomized controlled trials. J Clin Psychiatry. 2015;76(11):e1424-e1431. doi:10.4088/JCP.14r09237

2. Alexander RA, Scozzaro MJ, Borodkin LJ. Statistical and empirical examination of the chi-square test for homogeneity of correlations in meta-analysis. Psychol Bull. 1989;106(2):329-331. doi:10.1037/0033-2909.106.2.329

3. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539-1558. doi:10.1002/sim.1186

4. Huedo-Medina TB, Sánchez-Meca J, Marín-Martínez F, et al. Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychol Methods. 2006;11(2):193-206. doi:10.1037/1082-989X.11.2.193

5. Moher D, Tetzlaff J, Tricco AC, et al. Epidemiology and reporting characteristics of systematic reviews. PLoS Med. 2007;4(3):e78. doi:10.1371/journal.pmed.0040078

6. Tricco AC, Tetzlaff J, Pham B, et al. Non-Cochrane vs Cochrane reviews were twice as likely to have positive conclusion statements: cross-sectional study. J Clin Epidemiol. 2009;62(4):380-386, e1. doi:10.1016/j.jclinepi.2008.08.008

Shian Ming Tan, MMed

Department of Psychiatry, Singapore General Hospital, Singapore

Potential conflicts of interest: None.

Funding/support: None.

dx.doi.org/10.4088/JCP.16lr10666