This work may not be copied, distributed, displayed, published, reproduced, transmitted, modified, posted, sold, licensed, or used for commercial purposes. By downloading this file, you are agreeing to the publisher’s Terms & Conditions.
Revisiting the Abnormal Involuntary Movement Scale: Proceedings From the Tardive Dyskinesia Assessment Workshop
Objective: To provide a historical overview of the Abnormal Involuntary Movement Scale (AIMS) in clinical trials of tardive dyskinesia (TD), with current recommendations for analyzing and interpreting AIMS data.
Participants: Seven psychiatrists and 1 neurologist were selected by the workshop sponsor based on each individual’s clinical expertise and research experience.
Evidence: Using PubMed entries from January 1970 to August 2017, participants selected studies that used the AIMS to evaluate TD treatments. The selections were intended to be representative rather than prescriptive or exhaustive, and no specific recommendations for TD treatment are implied.
Consensus Process: The Working Group met in October 2016 to discuss the AIMS as an assessment tool, outline the challenges of translating clinical trial results into everyday clinical practice, and propose different methods for reporting AIMS data in clinically relevant terms. Recommendations for selecting TD studies for review, analyzing and interpreting AIMS data, and synthesizing discussions among the participants were initiated during the onsite workshop and continued remotely throughout development of this report. Disagreements were resolved via group e-mails and teleconferences. Consensus was based on final approval of this report by all workshop participants.
Conclusions: For both research and clinical practice, the AIMS is a valid measure for assessing TD and the effects of treatment, but alternative analyses of AIMS data (eg, effect size, minimal clinically important difference, response analyses, category shifts) may provide broader evidence of clinical effectiveness. No single analysis of AIMS data can be considered the standard of clinical efficacy; multiple analytic approaches are recommended.
J Clin Psychiatry 2018;79(3):17cs11959
To cite: Kane JM, Correll CU, Nierenberg AA, et al. Revisiting the Abnormal Involuntary Movement Scale: proceedings from the Tardive Dyskinesia Assessment Workshop. J Clin Psychiatry. 2018;79(3):17cs11959.
To share: https://doi.org/10.4088/JCP.17cs11959
© Copyright 2018 Physicians Postgraduate Press, Inc.
aDepartment of Psychiatry, The Zucker Hillside Hospital, Glen Oaks, New York
bDepartment of Psychiatry, The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York
cCenter for Psychiatric Neuroscience, The Feinstein Institute for Medical Research, Manhasset, New York
dDepartment of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
eDepartment of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
fDepartment of Psychiatry, University Hospitals Cleveland Medical Center, Cleveland, Ohio
gMembers of the Tardive Dyskinesia Assessment Working Group are listed at the end of the article.
*Corresponding author: John M. Kane, MD, The Zucker Hillside Hospital, Department of Psychiatry, The Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Department of Psychiatry, 75-59 263rd St, Glen Oaks, NY 11004.
Tardive dyskinesia (TD) is a chronic disorder characterized by involuntary stereotyped, choreic, athetoid, and/or dystonic movements in 1 or more areas of the body, including the orofacial region (eg, tongue thrusting, lip smacking and/or pursing, grimacing), extremities (eg, stereotypic piano-playing movements, flexion/extension of the ankles or toes), and torso (eg, choreoathetoid movements, pelvic rocking).1,2 This disorder can result from exposure to dopamine receptor blocking agents (DRBAs) such as antipsychotics and drugs used to treat gastrointestinal disorders (eg, metoclopramide).2 Given the difficulty in treating TD, prevention, close monitoring, and earliest possible diagnosis are critical in optimizing patient outcomes.
The term tardive dyskinesia was coined in 1964 by Faurbye et al3 in an article that described patients who had developed chronic involuntary movements several months after starting antipsychotic treatment. In the following decades, the link between antipsychotics and TD became widely accepted, with attempts to find effective treatments for TD beginning in the early 1970s.4 With development of the newer second-generation (atypical) antipsychotics, it was hoped that the risk for medication-induced TD would diminish.4,5 However, as shown in several recent studies,6-8 TD continues to be a problem in patients who require any type of antipsychotic treatment. In a 2017 meta-analysis of antipsychotic clinical trials conducted by Carbon et al,7 mean probable TD prevalences of 30.0% and 20.7% were found in patients exposed to first- and second-generation antipsychotics, respectively. However, in the subgroup of patients with no lifetime exposure to first-generation antipsychotics, the TD prevalence with second-generation antipsychotics was 7.2%. A contributing factor to the ongoing problem of TD may be the expanding use of atypical antipsychotics in psychiatric indications beyond schizophrenia, including bipolar disorder and refractory major depressive disorder.
A report from the 1992 American Psychiatric Association (APA) TD Task Force cited antipsychotic discontinuation as "the most logical ‘treatment’" for managing TD, with dose reduction suggested if discontinuation is unfeasible.9 The 2010 APA Practice Guideline for the Treatment of Patients with Schizophrenia recommends switching to an antipsychotic with a lower risk for TD,10 although no antipsychotic medication is completely risk-free. However, these approaches may not be viable options for patients who require long-term antipsychotic therapy and are psychiatrically stable on their current treatment regimen. Moreover, there is extremely limited information on whether or how often TD resolves and on how long it takes to do so after discontinuation or dose reduction of DRBAs.11
Two reversible and selective vesicular monoamine transporter 2 inhibitors, valbenazine and deutetrabenazine, are now approved by the US Food and Drug Administration (FDA) for the treatment of TD.12,13 A number of other potential treatments have been tried, but as reported by the American Academy of Neurology, the evidence for many of these drugs was insufficient or limited.14 In the historical absence of approved TD medications, some medications, such as tetrabenazine, were used off-label based on promising data from open-label or single-center trials.15
The need for effective treatment of TD is underscored by the negative impact of this disorder, which stigmatizes patients and contributes to social isolation.16 In some cases, TD can also be physically debilitating and have a serious negative impact on daily functioning and quality of life.2 Even "mild" forms of TD can be highly distressing, especially when noticeable abnormal movements lead to negative consequences, such as loss of vocational opportunities or isolation from family and friends.16 Although possibly confounded by the severity of the underlying psychiatric illness, which can also affect outcomes, the presence of TD in patients with schizophrenia has been associated with increased mortality, poorer treatment outcomes, lower productivity, and reduced quality of life.17-19
Given the ongoing risk of TD in patients requiring antipsychotic medications or other DRBAs and the negative impact of TD on quality of life, the availability of novel treatments and the current resurgence of interest in TD are encouraging and potentially transformative for individuals affected by TD. However, available treatments do not eliminate the need for careful assessment of involuntary movements and preventative efforts. Monitoring and recognition of TD are critical skills for clinicians prescribing DRBAs. As part of this effort, use of the Abnormal Involuntary Movement Scale (AIMS) in TD studies and the challenges of translating AIMS study results into clinical practice need to be addressed.
- The Abnormal Involuntary Movement Scale (AIMS) can be used to measure the severity of abnormal movements in tardive dyskinesia (TD), but diagnosis requires an assessment of medication history and a clinical evaluation of symptoms. Development of standardized guidelines for the screening, diagnosis, and treatment of patients with TD is warranted.
- Many TD clinical trials have used the AIMS as an efficacy outcome, but the studies vary widely in design and conduct. Interpreting trial results and applying them to clinical practice can be challenging. Presenting AIMS data through different types of analyses, such as minimal clinically important difference or response rates, may provide a broader and more clinically relevant perspective on study results.
- Ongoing education and AIMS training may be necessary for improving the diagnosis and treatment of TD in clinical settings.
The Tardive Dyskinesia Assessment Workshop was convened on October 13, 2016, in New York, New York, to discuss the application and interpretation of the AIMS as an assessment tool for TD. Workshop participants (ie, the Working Group) were invited by the sponsor, Neurocrine Biosciences, Inc., based on their clinical expertise and research experience. The Working Group included 7 psychiatrists with interests in psychopharmacology and drug safety (J.M.K. [chair], C.U.C., A.A.N., S.N.C., M.S., J. P. McEvoy, MD; A. J. Cutler, MD) and 1 neurologist specializing in movement disorders (M. A. Stacy, MD).
The contents of this report represent proceedings from the workshop and from subsequent communications, which were conducted via teleconferences, group e-mails, and shared comments on manuscript drafts that were distributed to all workshop participants for feedback. Disagreements were resolved via e-mail or teleconference as needed. Consensus was based on the final approval of this report by all workshop participants. The goals of the report were to (1) provide clinicians with a historical overview of how the AIMS has been used to evaluate TD in clinical studies, (2) outline some of the challenges of translating clinical trial results into clinical practice, (3) discuss various approaches to analyzing and interpreting AIMS data, and (4) provide consensus statements on these areas. The discussion for each of these goals is organized into the 4 main sections below.
THE ABNORMAL INVOLUNTARY MOVEMENT SCALE
Structure and Scoring
The original AIMS, which was developed by the National Institute of Mental Health for research purposes, includes a total of 12 items.20 The first 7 items are used to measure the severity of abnormal movements in the orofacial region (4 items: facial muscles, lips, jaw, tongue), upper extremities (1 item), lower extremities (1 item), and trunk (1 item). Items 8-12 relate to the clinician's global judgment of severity (item 8), incapacitation due to the abnormal movements (item 9), patient awareness (item 10), and dental status (items 11 and 12). A later version of the AIMS contains 14 items, including 2 additional items for edentulousness and the disappearance of abnormal movements during sleep (Table 1).
Directions for scoring are limited in the original AIMS, and a simple rating scale is provided for scoring items 1-7: 0 = none; 1 = minimal, may be extreme normal; 2 = mild; 3 = moderate; 4 = severe. Because of the simplicity of this scale, it is generally agreed that the AIMS can be easily administered in both research and clinical settings. However, it has also been noted that the lack of detailed instruction and descriptors could be challenging for inexperienced raters and therefore contribute to high interrater variability.22,23 To mitigate these potential problems and make the scale more specific to TD, detailed instructions have been developed that include quality (eg, choreic or athetoid), frequency, amplitude, and location of abnormal movements as factors in scoring.22,23
Supplementary instructions for administering the AIMS have been developed and published by several research groups. These instructions include the removal of shoes and socks for examination, not subtracting 1 point for abnormal movements that occur only with activation maneuvers, methods for examining and scoring specific body areas, and methods for scoring item 8 (global judgment of severity) (Table 1). Another development, one that is central to understanding clinical trial results, is the calculation of a total score. The original AIMS makes no mention of a total score, but it has become convention to sum the individual scores from items 1-7. Thus, although there is a generally accepted range for the AIMS total score (0 to 28), the range itself is not linear because each constituent item is scored separately. In other words, an AIMS total score of 7 could represent a score of 1 (minimal) on each item or a very different combination of item scores, for example, a score of 3 (moderate) on 1 item, a score of 4 (severe) on another item, and a score of 0 (none) on the remaining items. The AIMS total score may be a useful index for measuring the overall effects of treatment in a TD clinical trial, but, as discussed later, its applicability in clinical practice may be limited.
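To make the non-uniqueness of the total score concrete, the following Python sketch sums items 1-7 for 2 hypothetical item profiles that both total 7. The function name `aims_total` and the item labels are illustrative assumptions, not part of any published AIMS software.

```python
# Hypothetical helper illustrating the conventional AIMS total score:
# the sum of items 1-7, each rated 0 (none) to 4 (severe), range 0-28.
ITEM_NAMES = (
    "facial muscles", "lips", "jaw", "tongue",
    "upper extremities", "lower extremities", "trunk",
)


def aims_total(items):
    """Return the sum of AIMS items 1-7 after basic range checks."""
    if len(items) != len(ITEM_NAMES):
        raise ValueError("expected scores for AIMS items 1-7")
    if any(score not in range(5) for score in items):
        raise ValueError("each item is scored 0 (none) to 4 (severe)")
    return sum(items)


# Two very different item profiles that share a total score of 7
uniform = [1, 1, 1, 1, 1, 1, 1]  # minimal movements in every body area
focal = [0, 3, 0, 4, 0, 0, 0]    # moderate lips, severe tongue, none elsewhere

print(aims_total(uniform), aims_total(focal))  # prints: 7 7
worst = ITEM_NAMES[focal.index(max(focal))]
print(f"worst item in focal profile: {worst} = {max(focal)}")
```

The identical totals hide the fact that the second profile includes a severe movement, which is why item-level reporting is recommended later in this report.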
Instructions and refinements for the global and distress measures (items 8-10) and for dental pathology (items 11-12) have not been standardized. In addition, items 9 and 10 have not correlated with the anatomic measures of TD severity (items 1-7) and may not be reliable given the lack of awareness and insight observed in some patients.26,27 Given the critical importance of the subjective and social impact of TD, further research to develop reliable instruments for measuring this impact is necessary, as indicated in the recommendations at the end of this report. The following discussion refers only to the AIMS items related to anatomic severity (items 1-7).
AIMS in TD Studies
The original AIMS is strictly an instrument for measuring the anatomic distribution and severity of abnormal movements and does not provide criteria for diagnosing TD. The diagnosis of TD continues to be based on the patient’s clinical presentation, evaluation to rule out other diagnostic possibilities, and medication history, as summarized in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5).28 However, standardized diagnostic criteria for TD have been developed for research purposes, with the most notable examples being the Schooler-Kane24 and Glazer-Morgenstern-Doucette25 criteria (Table 1). These criteria, based on a priori thresholds of severity, are primarily used to estimate the prevalence or incidence of TD in general or clinical populations and to qualify and monitor patients entered into a clinical study. As noted by Schooler and Kane, a more definitive clinical diagnosis of TD requires a history of treatment with antipsychotics (or other DRBAs) and persistence of abnormal movements after discontinuing antipsychotic treatment.24 If the AIMS is included as part of an overall clinical diagnosis of TD, its utility in institutional settings may differ from its use in a more general psychiatric population.
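As an illustration only, the AIMS severity component of the Schooler-Kane criteria (at least 1 body area rated moderate [3] or at least 2 areas rated mild [2]) can be expressed as a simple check on items 1-7. This is a sketch under stated assumptions: it treats each of items 1-7 as a body area, and it ignores the medication-history and differential-diagnosis requirements, which no AIMS score can capture.

```python
def meets_schooler_kane_severity(items):
    """Hypothetical check of the AIMS severity threshold only:
    at least 1 item rated >= 3 (moderate) or at least 2 items rated >= 2 (mild).
    Medication history and exclusion of other causes are NOT modeled."""
    if len(items) != 7:
        raise ValueError("expected scores for AIMS items 1-7")
    moderate_or_worse = sum(1 for s in items if s >= 3)
    mild_or_worse = sum(1 for s in items if s >= 2)
    return moderate_or_worse >= 1 or mild_or_worse >= 2


print(meets_schooler_kane_severity([0, 0, 0, 2, 0, 0, 0]))  # False: single mild item
print(meets_schooler_kane_severity([0, 2, 0, 2, 0, 0, 0]))  # True: two mild items
print(meets_schooler_kane_severity([0, 0, 0, 3, 0, 0, 0]))  # True: one moderate item
```

The first call mirrors a patient with mild tongue movements only, who would fall below this research threshold even if the movements were clinically distressing.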
In clinical trials, the AIMS is often used as a safety assessment to monitor the emergence of TD in subjects who are receiving an antipsychotic medication for schizophrenia or another psychiatric disorder. The AIMS is also used as an efficacy measure in clinical trials that focus on improvements in TD. These studies generally rely on the AIMS total score (sum of items 1-7) as an overall index of TD severity. In placebo-controlled trials that include statistical testing, results such as a mean change from baseline in the AIMS total score can help clinicians decide whether to try a new treatment in practice.
However, there are differences across clinical trials that need to be considered when interpreting AIMS results. Such differences are summarized in a sample of TD studies that were selected to illustrate a range of study designs, rating methods for the AIMS, and different types of AIMS results (Table 2). This selection was based on results from a PubMed search that included a simple search string ("Abnormal Involuntary Movement Scale" AND "tardive") and a single set of search dates (from January 1, 1970, to August 31, 2017). The selected studies were intended to be representative rather than prescriptive or exhaustive, and no specific recommendations for TD treatment are implied.
One factor to consider when interpreting AIMS results is the rating method. For example, as in the single-center trial of Ginkgo biloba by Zhang et al,33 each subject may be assessed by the same investigator throughout treatment. Although this investigator may have been blinded to treatment, he or she would have known how long the patient had been receiving treatment and could have been more alert to and/or expectant of improvements as the study progressed. In addition, because of different training backgrounds or personal experiences, investigators within a study site might not have applied the same approach for rating the severity of TD movements. Therefore, variability among investigators could be high unless interrater reliability was specifically tested and confirmed. This potential for individual bias and less-than-optimal interrater reliability may be minimized in studies that use central (offsite) video raters. For example, in the multicenter trials of valbenazine and deutetrabenazine,29-32 each subject’s AIMS examination was video recorded in a standardized manner at all study visits. Central raters who viewed these videos were blinded to treatment. Moreover, they did not know which study visit they were viewing because the sequence of the videos was scrambled, which may have minimized the potential for inflated scores at baseline and overly reduced scores at the end of the study. Finally, the valbenazine studies included well-defined anchors for scoring each AIMS item, and all AIMS scoring required a consensus between 2 central raters who watched the videos together. These types of controls are expected to become the new standard for evaluating TD therapies.
Applying the AIMS to Clinical Practice
The AIMS can be used in both research and clinical settings and administered by any health care professional with appropriate training.39,40 Similar to its use in clinical trials, the AIMS examination and rating scores can be used in clinical practice to document the emergence of medication-induced TD and monitor changes in TD severity over time. To this end, formal guidelines have been developed that propose administration of the AIMS at regular intervals in patients receiving antipsychotics in clinical settings (eg, every 3-12 months depending on risk factors).41 However, these recommendations may need further review, because signs of TD could emerge and go undetected between scheduled assessments. A more conservative approach in clinical practice may be to teach all patients and their caregivers about regular self-examination and to question and briefly examine patients for abnormal movements at every clinic visit.
Clinical trials often use mean changes in the AIMS total score to evaluate whether a medication has demonstrable unwanted effects on the incidence of abnormal involuntary movements. However, this approach can be problematic because newly emergent TD in a few patients may be obscured when averaged with the many patients whose AIMS total score does not change from baseline. In addition, different mathematical approaches (eg, arithmetic mean, geometric mean, median) and analyses (eg, analysis of covariance, mixed-effects model for repeated measures, nonlinear machine learning algorithms) could be used to present score changes. Therefore, categorical, case-based reporting is important and should always accompany mean score reporting. Additionally, the severity of TD cases needs to be reported; this information was absent from most of the TD prevalence studies in patients receiving antipsychotics that were included in a recent meta-analysis.7
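The following hypothetical example, a sketch rather than a recommended analysis, shows how a mean change can look reassuring while a categorical count reveals an emergent case. The case definition used here (the Schooler-Kane severity threshold applied to items 1-7) and all data values are assumptions chosen for illustration.

```python
from statistics import mean


def meets_severity_threshold(items):
    """Assumed case definition: the AIMS severity component of the
    Schooler-Kane criteria (>= 1 item rated 3+, or >= 2 items rated 2+)."""
    return sum(s >= 3 for s in items) >= 1 or sum(s >= 2 for s in items) >= 2


# Hypothetical safety data for 10 patients (AIMS items 1-7)
baseline = [[1, 1, 0, 1, 0, 0, 0] for _ in range(9)] + [[0, 0, 0, 0, 0, 0, 0]]
endpoint = [[1, 0, 0, 1, 0, 0, 0] for _ in range(9)] + [[0, 2, 0, 2, 0, 0, 0]]

# Mean change in total score across all patients
changes = [sum(e) - sum(b) for b, e in zip(baseline, endpoint)]

# Categorical count: patients who cross the case threshold during the study
new_cases = sum(
    1
    for b, e in zip(baseline, endpoint)
    if not meets_severity_threshold(b) and meets_severity_threshold(e)
)

print(f"mean change in AIMS total: {mean(changes):+.1f}")             # -0.5
print(f"new cases meeting threshold: {new_cases} of {len(changes)}")  # 1 of 10
```

The group mean suggests slight improvement, yet 1 patient in 10 developed movements that meet a research case definition.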
Similarly, clinical trials often use mean changes in the AIMS total score to evaluate whether a medication has demonstrable efficacy for TD improvement. When interpreting such results, it is important to remember that the total score is not a linear scale, but rather, the sum of 7 individual item scores. Each AIMS item may have face validity for rating the severity of a particular abnormal movement (ie, from 0 = none to 4 = severe), but the total score does not have ideal clinimetric properties for rating overall severity as discussed above. Clinical researchers should be encouraged to provide reports that include individual item scores in addition to the AIMS total score. This procedure will facilitate the generalizability of research findings, which clinicians can appreciate and compare with their own office-based assessments.
In randomized, controlled clinical trials, the AIMS may best be scored by blinded central raters viewing standardized video recordings or during 2-way live examinations, following protocol-defined rating procedures (eg, descriptive anchors for each severity level). In contrast, AIMS scoring in clinical settings usually involves face-to-face interactions between the clinician and patient, both of whom are aware of what treatment has been prescribed and how long the patient has been treated. In a community mental health care setting or group practice, several clinicians may be responsible for the same patient, and differences in their individual experiences could affect how TD is assessed. For example, a clinician with limited AIMS training who has seen only a few cases of TD may have a different concept of what constitutes a "severe" abnormal movement than an AIMS-trained clinician who has seen hundreds of cases over many years. At this time, no single approach can be recommended for improving interrater reliability within clinical settings; each institution or practice needs to develop its own protocols for screening, diagnosis, and monitoring. However, renewed education and training in use of the AIMS are necessary to ensure reliability of ratings. Such methods could include instructional videos, in-house training by an experienced clinician, and/or participation in continuing medical education activities. Moreover, as shown in the work by Lane et al,23 interrater reliability may be improved by implementing specific scoring criteria for AIMS items 1-7.
As an instrument that measures the frequency, amplitude, distribution, and/or persistence of abnormal movements, the AIMS can be administered to any patient regardless of psychiatric diagnosis. From a clinical and patient or caregiver perspective, however, the patient’s diagnosis (eg, schizophrenia or mood disorder), level of functioning, and psychiatric stability may be important factors in determining the overall functional significance of TD in terms of impact on quality of life. For example, an individual with stable bipolar disorder and high psychosocial functioning could have a rating of 2 (mild) in a single AIMS item, such as the tongue. In a clinical trial in which efficacy is being evaluated and averaged within a group of participants (rather than in an individual patient), this rating would equal a total score of 2, which might be considered a "low" overall score and would not even meet research diagnostic criteria (eg, Schooler-Kane) for inclusion in the study. In a clinical setting, however, the same individual may complain of having a fairly disabling tongue dyskinesia with considerable disruption to social and work activities; the practicing physician would have to make the diagnosis of TD and could consider this a significant or even serious case. Another individual with unstable schizophrenia and minimal social interaction could have the same rating (ie, score of 2 on the AIMS tongue item only), but in this case, the TD may not be considered functionally significant by the physician because the movements may be overshadowed by other urgent psychosocial needs.
In addition to the natural variability of TD, which can fluctuate during the day and over time,42 TD varies widely from patient to patient, in terms of both clinical presentation and psychosocial impact. Consequently, applying clinical study results to this heterogeneous population can be challenging. Establishing a statistically significant change in the mean AIMS total score in a clinical trial is a crucial initial step for demonstrating the efficacy of a TD treatment. However, as discussed in the following section, additional types of AIMS analyses are both possible and necessary for ascertaining the potential benefits of various TD medications.
Types of AIMS Analyses
The Working Group discussed the different methods that could be used to analyze AIMS total and item scores in a clinically meaningful way. Results of the discussion are presented below, along with a general caveat that the clinical relevance of any specific analysis may be driven by what the individual clinician wants to know and the specific therapeutic needs of the patient and the patient’s caregiver. From an epistemic standpoint, it should be noted that application of these analyses to certain types of clinical trial data may be inappropriate. For example, "clinical relevance" is not germane to a proof-of-concept trial that was only designed to detect a possible drug effect. Therefore, application of the analyses described below may need to be limited to data from larger and well-controlled studies that were specifically designed to establish efficacy.
Treatment Effect Sizes
Treatment effect sizes provide a way to standardize mean score changes by incorporating placebo effects, sample sizes, and standard deviations. Such standardization allows the results of 1 assessment (eg, AIMS) to be quantitatively compared with the results of another assessment (eg, Unified Dyskinesia Rating Scale). For the AIMS, treatment effect sizes could be estimated for the total score and/or individual item scores, with the interpretation of effect sizes initially following general conventions. For example, a Cohen d = 0.5 may indicate a moderate or medium treatment effect. However, as Jacob Cohen himself cautioned, "the terms ‘small,’ ‘medium,’ and ‘large’ are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation."43 Therefore, more research in the field is needed to better understand what constitutes a clinically meaningful treatment effect size for TD beyond a mathematical or statistical metric.
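For readers who want to see the arithmetic, the sketch below computes Cohen d for 2 independent groups from change-from-baseline scores, using the pooled standard deviation. The data are invented, and other estimators (eg, Hedges g, or effect sizes derived from a mixed-effects model) may be preferred in practice; the function is illustrative, not the method used in any trial cited here.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(active_changes, placebo_changes):
    """Cohen d for 2 independent groups, using the pooled SD.

    Inputs are changes from baseline in AIMS total score (negative = improvement),
    so the active mean is subtracted from the placebo mean: positive d favors drug.
    """
    n1, n2 = len(active_changes), len(placebo_changes)
    s1, s2 = stdev(active_changes), stdev(placebo_changes)  # sample SDs
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(placebo_changes) - mean(active_changes)) / pooled_sd


# Hypothetical change-from-baseline data (not from any trial)
active = [-4, -3, -2, -5, -1, -3]   # mean -3
placebo = [-1, 0, -2, 1, -1, -3]    # mean -1

d = cohens_d(active, placebo)
print(f"Cohen d = {d:.2f}")  # Cohen d = 1.41
```

As the Cohen quotation above warns, whether a value like this is "large" for TD is a clinical question that the arithmetic alone cannot answer.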
Minimal Clinically Important Difference
The minimal clinically important difference (MCID) is the mean score change for an assessment of interest (eg, AIMS) in subjects who experienced a defined level of clinical benefit. An MCID is unlikely to influence clinical decisions unless it represents a minimal level of acceptable improvement. To that end, MCIDs are often based on a clinician- or patient-rated anchor scale (eg, the Clinical Global Impression of Change-Tardive Dyskinesia [CGI-TD] or Patient Global Impression of Change [PGIC]), the standard deviation or standard error of the mean for the assessment of interest, and/or expert consensus (eg, Delphi method).
The Working Group agreed that no MCID for the AIMS has been established in TD. As a test case, data were pooled from placebo-controlled trials of valbenazine and analyzed using a minimal global response (CGI-TD rating of "minimally improved" or better, score ≤ 3) and a more robust response (CGI-TD rating of "much improved" or better, score ≤ 2) as anchors. Among subjects who met either of these CGI-TD criteria (regardless of treatment), the mean changes from baseline in AIMS total score were −2.2 and −3.4, respectively. These results suggest that in adults with TD, the MCID for AIMS total score may be 2 or 3 points.44 A similar approach could be taken in which CGI-TD responses are correlated with the percent change from baseline in AIMS total score. Further MCID analyses from current TD trials are warranted, and a manuscript from the Working Group is currently in development.
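The anchor-based approach can be sketched as follows: pool subjects regardless of treatment, select those whose CGI-TD rating meets the anchor, and average their change in AIMS total score. The ratings and changes below are hypothetical and do not reproduce the pooled valbenazine results; the function name is an assumption for illustration.

```python
from statistics import mean


def anchor_based_mcid(subjects, cgi_cutoff):
    """Mean change in AIMS total score among subjects whose CGI-TD rating
    is at or below `cgi_cutoff` (lower rating = more improved),
    pooled regardless of treatment arm."""
    changes = [aims_change for cgi_td, aims_change in subjects if cgi_td <= cgi_cutoff]
    if not changes:
        raise ValueError("no subjects met the anchor criterion")
    return mean(changes)


# Hypothetical (CGI-TD rating, change in AIMS total score) pairs
subjects = [(1, -6), (2, -4), (2, -3), (3, -2), (3, -1), (4, 0), (4, 1), (5, 1)]

minimal = anchor_based_mcid(subjects, 3)  # "minimally improved" or better
robust = anchor_based_mcid(subjects, 2)   # "much improved" or better
print(f"{minimal:.1f} {robust:.1f}")      # -3.2 -4.3
```

The same selection could instead be paired with percent change in AIMS total score, as suggested above.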
Response Analyses
Response analyses are valuable for identifying the percentage of individual subjects in a clinical trial who achieved a specific threshold of improvement, although the utility of a threshold depends on its application. For example, a low threshold (≥ 10% or ≥ 20% improvement) may be sufficient in a proof-of-concept study to establish a possible treatment effect, but it may be insufficient for establishing a clinically meaningful response or for making treatment decisions.
TD studies have historically defined AIMS response as a ≥ 30% decrease (improvement) from baseline in total score,33,45,46 but a more rigorous definition of response (≥ 50% decrease from baseline) has been used in recent clinical trials.29,32 Both benchmarks may be meaningful to clinicians—one because of its historical context and the other because of its stringency. Correlating the percent change in AIMS total score with global anchors of response (eg, CGI-TD response, as described above) would provide additional information about which levels of AIMS response are most clinically meaningful. Because individual patients may have different treatment goals, and percent improvement from baseline is highly influenced by where patients start out, presenting a full range of response criteria (eg, ≥ 10% to ≥ 90%) could also be informative for clinicians. Since patients and caregivers often inquire about the "odds of getting better," clinical trial reports could also include odds ratios and numbers needed to treat (NNTs) for response analyses.
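A minimal sketch of a full-range response analysis, with odds ratios and NNTs at each threshold, is shown below. It assumes hypothetical baseline and endpoint AIMS totals (baselines must be greater than 0) and computes unadjusted point estimates only; a trial report would typically use a prespecified model and confidence intervals.

```python
def pct_improvement(baseline_total, endpoint_total):
    """Percent decrease from baseline (positive = improvement)."""
    return 100.0 * (baseline_total - endpoint_total) / baseline_total


def responders(pairs, threshold):
    """Number of subjects with >= `threshold` % improvement in AIMS total."""
    return sum(1 for b, e in pairs if pct_improvement(b, e) >= threshold)


def odds_ratio(r1, n1, r0, n0):
    """Unadjusted odds ratio (drug vs placebo); None if any 2x2 cell is 0."""
    a, b, c, d = r1, n1 - r1, r0, n0 - r0
    if 0 in (a, b, c, d):
        return None  # a continuity correction is a common alternative
    return (a / b) / (c / d)


def nnt(r1, n1, r0, n0):
    """Number needed to treat = 1 / absolute difference in response rates."""
    diff = r1 / n1 - r0 / n0
    return 1 / diff if diff > 0 else None


def fmt(x):
    return "n/a" if x is None else f"{x:.1f}"


# Hypothetical (baseline, endpoint) AIMS totals; baselines must be > 0
active = [(10, 4), (12, 6), (8, 5), (14, 7), (9, 9), (11, 3), (10, 8), (12, 5)]
placebo = [(10, 9), (12, 10), (8, 8), (14, 7), (9, 7), (11, 10), (10, 6), (12, 12)]

n1, n0 = len(active), len(placebo)
for threshold in range(10, 100, 10):  # full range, >= 10% to >= 90%
    r1, r0 = responders(active, threshold), responders(placebo, threshold)
    print(f">={threshold}%: {r1}/{n1} vs {r0}/{n0}, "
          f"OR {fmt(odds_ratio(r1, n1, r0, n0))}, NNT {fmt(nnt(r1, n1, r0, n0))}")
```

Reporting every threshold lets clinicians read off the row that matches a given patient's treatment goal rather than relying on a single cutoff.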
Percent Change From Baseline
The percent change from baseline in the AIMS total score (or in individual AIMS item scores) is another informative way to present the magnitude of improvement. In contrast to response analyses, which do not include subjects who failed to meet a specific threshold (eg, a subject with 49% improvement cannot be counted in the ≥ 50% response group), percent change captures the experiences of all subjects in a clinical trial and is an intuitively understandable analysis for clinicians, patients, and caregivers. However, as previously mentioned, this type of analysis is highly influenced by baseline severity.
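The dependence on baseline severity is easy to see numerically. In this sketch (hypothetical values; the function name is illustrative), the same 3-point reduction in AIMS total score corresponds to very different percent changes.

```python
def percent_change(baseline_total, endpoint_total):
    """Percent change from baseline in AIMS total score (negative = improvement)."""
    if baseline_total == 0:
        raise ValueError("percent change is undefined for a baseline of 0")
    return 100.0 * (endpoint_total - baseline_total) / baseline_total


# The same 3-point reduction, starting from different baseline severities
print(percent_change(6, 3))    # -50.0
print(percent_change(15, 12))  # -20.0
```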
Complete Response or Symptomatic Remission
Clinicians and patients may be particularly interested in the likelihood of substantially reducing, or even eradicating, the signs and symptoms of TD. Response analyses, whether defined as a ≥ 30%, ≥ 50%, or other reduction in AIMS total score, can show how many subjects in a clinical trial met an overall threshold of TD improvement, but they cannot provide adequate information about symptom resolution, or where patients "end up." For example, a subject with an AIMS total score of 12 (eg, score = 3 in 4 different items) could have a 50% reduction in total score after treatment, but the subject could still be experiencing moderate symptoms in some areas of the body (eg, score = 3 in 2 items and score = 0 in 2 items). In contrast, a complete response analysis—possibly defined as a score of 0 or 1 on all 7 movement-related items of the AIMS—could show how many subjects had no symptoms or minimal severity in each region of the body after treatment. As with the response analyses, odds ratios and NNTs for complete response would help to make results more accessible to clinicians and patients. Complete response may not be an appropriate analysis for all clinical trials or applicable to all types of patients, but it could provide a useful data point for assessing treatment options and for informing the clinical decision-making process. It should be noted, however, that a "complete response" measured by AIMS ratings refers to a diminution in the objective severity of observable abnormal movements, but the AIMS alone cannot distinguish complete suppression or masking of symptoms from true reversal or remission of TD itself. Such recovery may require that the patient no longer meets the clinical criteria for TD.
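The contrast between a ≥ 50% response and a complete response can be sketched with the example above. The complete-response definition used here (0 or 1 on all 7 movement items) is the possible definition proposed in this report, not an established standard; subject B is a hypothetical addition.

```python
def is_50pct_responder(baseline, endpoint):
    """>= 50% decrease from baseline in AIMS total score (items 1-7)."""
    b, e = sum(baseline), sum(endpoint)
    return b > 0 and (b - e) / b >= 0.5


def is_complete_responder(endpoint):
    """Assumed definition: a score of 0 or 1 on all 7 movement items."""
    return all(score <= 1 for score in endpoint)


# Subject A: the example in the text (total 12 -> 6, still moderate in 2 items)
a_base, a_end = [3, 3, 3, 3, 0, 0, 0], [3, 3, 0, 0, 0, 0, 0]
# Subject B (hypothetical): smaller percent drop, only minimal residual movements
b_base, b_end = [2, 1, 1, 2, 1, 0, 0], [1, 1, 0, 1, 1, 0, 0]

for name, base, end in [("A", a_base, a_end), ("B", b_base, b_end)]:
    print(name, is_50pct_responder(base, end), is_complete_responder(end))
# A True False
# B False True
```

The 2 criteria classify these subjects in opposite ways, which is why the report suggests presenting both.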
Percent improvements express change relative to baseline without indicating how severe symptoms were at baseline. Category shifts, in contrast, incorporate baseline severity directly into the analysis. As such, shift analyses may be particularly useful when addressing a heterogeneous disorder such as TD. A shift could be defined as a 2-point reduction from baseline in any AIMS item with a baseline score ≥ 2, as was done in a recent study of deutetrabenazine.47 Another approach would be to analyze categorical shifts based on an "average" item score, calculated as the AIMS total score at baseline divided by the number of AIMS items with a score ≥ 1. The shift could then be defined as an average item score of ≥ 3 at baseline and ≤ 2 after treatment. A final category shift analysis could target syndromal remission (ie, a shift below the syndromal definition of TD). Under the Schooler-Kane criteria,24 this would translate to no item scored ≥ 3 and no more than a single item scored 2.
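The three shift definitions can be written as simple predicates over the 7 movement item scores (each rated 0-4). This sketch rests on several stated assumptions. "2-point reduction" is read as a drop of at least 2 points. The post-treatment average item score is computed the same way as at baseline. Only the movement-severity part of the Schooler-Kane criteria (at least moderate in one body area or at least mild in two) is encoded; their duration and exclusion requirements are not. All function names are hypothetical.

```python
# Illustrative sketch of three category-shift definitions; scores are hypothetical.
from typing import Sequence


def item_shift(baseline: Sequence[int], follow_up: Sequence[int]) -> bool:
    """Shift if any item with a baseline score >= 2 falls by at least 2 points."""
    return any(b >= 2 and b - f >= 2 for b, f in zip(baseline, follow_up))


def average_item_score(scores: Sequence[int]) -> float:
    """Total score divided by the number of items scored >= 1 (0.0 if none)."""
    rated = [s for s in scores if s >= 1]
    return sum(rated) / len(rated) if rated else 0.0


def average_score_shift(baseline: Sequence[int], follow_up: Sequence[int]) -> bool:
    """Shift from an average item score >= 3 at baseline to <= 2 after treatment."""
    return average_item_score(baseline) >= 3 and average_item_score(follow_up) <= 2


def meets_schooler_kane_severity(scores: Sequence[int]) -> bool:
    """Severity component only: any item >= 3, or at least 2 items scored >= 2."""
    return any(s >= 3 for s in scores) or sum(1 for s in scores if s >= 2) >= 2


def syndromal_remission(follow_up: Sequence[int]) -> bool:
    """Below the syndromal threshold: no item >= 3 and at most one item scored 2."""
    return not meets_schooler_kane_severity(follow_up)


if __name__ == "__main__":
    baseline = [4, 3, 3, 0, 0, 0, 0]   # average item score 10/3, about 3.3
    follow_up = [2, 1, 1, 0, 0, 0, 0]  # average item score 4/3, about 1.3
    print(item_shift(baseline, follow_up),
          average_score_shift(baseline, follow_up),
          syndromal_remission(follow_up))  # True True True
```

A follow-up profile such as [2, 2, 0, 0, 0, 0, 0] would still meet the syndromal threshold, because 2 items are rated mild.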
Functional Remission and Recovery
All of the approaches described above are based on a symptomatic assessment of TD using the AIMS. However, as stated in previous sections of this report, patients do experience varying levels of impairment in subjective well-being, quality of life, and functioning due to TD. Scales that measure such impairment are currently lacking but urgently needed. Such measures must complement AIMS scores before complete recovery from TD can be established. Moreover, results on functional measures should meet a minimum threshold of improvement before concluding that a patient has achieved both symptomatic and functional remission from TD and before recovery can be diagnosed. The Working Group recommends that symptomatic and functional remission be sustained for a minimum of 3 months.
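The recommended 3-month duration could be operationalized as a run of consecutive assessments that all meet both symptomatic and functional criteria. No validated functional scale yet exists, so the scale, its threshold, and the visit data below are all hypothetical. The sketch also approximates 3 months as 90 days. It represents one possible reading of the recommendation and is not an established definition.

```python
# Illustrative sketch: the functional scale, its threshold, and all visits are hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import Sequence


@dataclass(frozen=True)
class Visit:
    day: date
    symptomatic_remission: bool  # eg, an AIMS-based criterion such as syndromal remission
    functional_score: float      # hypothetical functional scale; higher = better


def sustained_remission(visits: Sequence[Visit], functional_threshold: float,
                        min_days: int = 90) -> bool:
    """True if consecutive visits meeting both criteria span at least min_days."""
    run_start = None
    for visit in sorted(visits, key=lambda v: v.day):
        in_remission = (visit.symptomatic_remission
                        and visit.functional_score >= functional_threshold)
        if not in_remission:
            run_start = None  # any lapse restarts the clock
            continue
        if run_start is None:
            run_start = visit.day
        if (visit.day - run_start).days >= min_days:
            return True
    return False


if __name__ == "__main__":
    visits = [
        Visit(date(2023, 1, 1), True, 80.0),
        Visit(date(2023, 2, 1), True, 85.0),
        Visit(date(2023, 4, 15), True, 82.0),
    ]
    print(sustained_remission(visits, functional_threshold=75.0))
    print(sustained_remission(visits, functional_threshold=83.0))
```

With the lower threshold, every visit qualifies and the run spans 104 days. With the higher threshold, the first and last visits fall short, so no qualifying run reaches 90 days.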
SUMMARY AND FUTURE DIRECTIONS
The Working Group agreed that no single analysis can be considered the most clinically relevant method for analyzing AIMS data; therefore, it is important that different types of analyses be conducted and presented, with clear statements as to what was conducted a priori versus post hoc. This multifaceted analytic approach would help achieve the following: (1) confirm the robustness of data from a clinical trial, (2) confirm efficacy across patient subgroups within a clinical trial, and (3) demonstrate that AIMS results from different or replicate clinical trials are consistent and reinforcing. More discussion is needed between researchers and clinicians to ascertain if there are mutually acceptable standards of clinical relevance that could be used to interpret AIMS results across different clinical trials. With the recent publication of large and well-controlled clinical trials that showed substantial and statistically significant AIMS improvements with valbenazine and deutetrabenazine,29-32 one line of inquiry would be to assess whether research diagnostic criteria such as Schooler-Kane24 are adequately sensitive. In addition, TD researchers and clinicians may need to start thinking about optimal ways to adopt new technologies, such as wearable sensors, computerized video-based ratings, and smartphone-based monitoring of abnormal movements and other symptoms.48-51
The Working Group also agreed that more research is needed to better understand the relationship between the AIMS and other assessment tools, including patient-reported measures, caregiver/informant measures, functional scales, and quality of life questionnaires. Future research should also include the use of a broader array of descriptive statistics from the AIMS in epidemiologic studies of TD, particularly in at-risk populations (eg, elderly, women) and in different populations of patients requiring antipsychotic treatment (eg, schizophrenia, schizoaffective disorder, bipolar disorder, refractory major depressive disorder).
- Diagnosis of TD is based on medication history and presentation of symptoms. The AIMS alone is not a diagnostic tool, but it can be included in a comprehensive assessment of the patient.
- The AIMS can be used to assess or monitor TD in clinical trials and clinical practice, including the emergence of treatment-related abnormal movements. Clinical trials should implement a method that ensures interrater reliability (eg, consensus of blinded central video raters). Clinical practices are encouraged to implement a standard protocol for TD screening, diagnosis, and assessment that includes a minimal requirement for AIMS training (eg, completion of an instructional video).
- The AIMS can be administered by any trained health care provider to any patient regardless of psychiatric diagnosis.
- Although AIMS scores provide reliable information about the severity of abnormal movements, both in research and in practice, they may not be sufficient to indicate the clinical severity and functional impact of TD. Factors such as the patient’s psychiatric status, insight into mental and physical illness, subjective well-being, quality of life, and social or occupational burden need to be considered as part of a comprehensive clinical assessment. Such evaluation may require the use of additional scales. However, the patient’s awareness of TD and its impact should not alter the motor examination for AIMS items 1-7.
- The AIMS total score is not a linear scale (eg, a total score of 6 may reflect moderate movements in 2 body regions or minimal movements in 6), and caution is warranted when drawing conclusions about overall TD severity from the total score alone.
- Many studies have used the AIMS to evaluate the effects of treatment on TD. However, TD studies vary widely in design and conduct, and any interpretation of results would benefit from understanding how the AIMS was administered and scored.
- In contemporary TD clinical trials, efficacy is generally based on a mean change from baseline in the AIMS total score, with significance testing versus placebo. Other types of AIMS analyses are possible and warranted to broaden the scope of clinically meaningful study results, although no single type of analysis can be considered a definitive measure of clinical significance. It is recommended that clinical trials provide different types of analyses so that each clinician can find the types of results that are most applicable to his or her practice (eg, treatment effect size, percent score improvement, MCID, response analyses, shift analyses).
- Professional or regulatory agencies should consider convening a task force to develop standardized guidelines for measuring and reporting efficacy across a broad array of AIMS-based analyses in clinical trials. Such guidelines should also address new instruments for measuring insight, awareness, and the psychosocial and vocational stigma and impact of TD.
- Re-education programs targeting psychiatrists in clinical practice should be developed and disseminated to enhance and standardize examination, diagnosis, and treatment guidelines based on the AIMS for patients with TD. Ongoing efforts are being made within the research community to increase awareness of TD and establish standards that can guide screening/diagnosis and treatment. Psychiatric clinics and offices are encouraged to begin identifying training requirements that are appropriate and easy to implement.
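The caution about interpreting the AIMS total score can be illustrated with hypothetical item profiles. The profiles share a total score of 6 but differ in how movements are distributed, in peak item severity, and in whether they reach the movement-severity threshold of the Schooler-Kane criteria. Only that severity component is checked here. The profiles are invented for illustration.

```python
# Illustrative sketch: hypothetical item profiles, all with the same AIMS total score.
from typing import Dict, List


def meets_schooler_kane_severity(items: List[int]) -> bool:
    """Severity component only: any item >= 3, or at least 2 items scored >= 2."""
    return any(s >= 3 for s in items) or sum(1 for s in items if s >= 2) >= 2


profiles: Dict[str, List[int]] = {
    "focal, moderate": [3, 3, 0, 0, 0, 0, 0],   # moderate movements in 2 regions
    "spread, mild": [2, 2, 2, 0, 0, 0, 0],      # mild movements in 3 regions
    "diffuse, minimal": [1, 1, 1, 1, 1, 1, 0],  # minimal movements in 6 regions
}

if __name__ == "__main__":
    for label, items in profiles.items():
        print(f"{label}: total={sum(items)}, max item={max(items)}, "
              f"meets severity threshold={meets_schooler_kane_severity(items)}")
```

All three profiles total 6. The first 2 meet the syndromal severity threshold, whereas the third does not. A single total score therefore cannot reveal which of these clinical pictures it represents.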
Submitted: October 4, 2017; accepted March 16, 2018.
Published online: May 8, 2018.
Tardive Dyskinesia Assessment Working Group: Stanley N. Caroff (Perelman School of Medicine, University of Pennsylvania), Christoph U. Correll (Zucker Hillside Hospital; Hofstra Northwell School of Medicine; Feinstein Institute for Medical Research), Andrew J. Cutler (Meridien Research), John M. Kane (Zucker Hillside Hospital; Hofstra Northwell School of Medicine), Joseph P. McEvoy (Medical College of Georgia, Augusta University), Andrew A. Nierenberg (Massachusetts General Hospital), Martha Sajatovic (University Hospitals Cleveland Medical Center), and Mark Stacy (Brody School of Medicine, East Carolina University).
Potential conflicts of interest: Dr Kane has been a consultant and/or advisor to and/or has received honoraria from Alkermes, Allergan, Bristol-Myers Squibb, IntraCellular Therapies, Janssen, Lundbeck, Minerva, Neurocrine, Otsuka, Pierre Fabre, Reviva, Sunovion, Takeda, and Teva and is a shareholder of LB Pharma, MedAvante, and The Vanguard Research Group. Dr Correll has been a consultant and/or advisor to or has received honoraria from Alkermes, Allergan, Bristol-Myers Squibb, Gerson Lehrman Group, IntraCellular Therapies, Janssen/J&J, LB Pharma, Lundbeck, MedAvante, Medscape, Neurocrine, Otsuka, Pfizer, Sunovion, Takeda, and Teva; provided expert testimony for Bristol-Myers Squibb, Janssen, and Otsuka; served on a Data Safety Monitoring Board for Lundbeck and Pfizer; and received grant support from Takeda. Dr Nierenberg has received research grants from Takeda/Lundbeck, PamLab, GlaxoSmithKline, NeuroRX Pharma, Marriott Foundation, National Institutes of Health, Brain & Behavior Research Foundation, Janssen, and the Patient-Centered Outcomes Research Institute and has been a consultant and/or attended advisory board meetings for Takeda/Lundbeck, PamLab, Alkermes, PAREXEL, Sunovion, Naurex, Neurocrine, Hoffmann-La Roche/Genentech, Eli Lilly, Pfizer, SLACK, and Physicians Postgraduate Press. Dr Caroff has been a consultant and/or attended advisory board meetings for Teva and Neurocrine Biosciences and received grant support from Neurocrine Biosciences.
Dr Sajatovic has received research grants from Merck, Alkermes, Janssen, Reuter Foundation, Woodruff Foundation, Reinberger Foundation, National Institutes of Health, and Centers for Disease Control and Prevention; been a consultant to Bracket, Prophase, Otsuka, Sunovion, Neurocrine, Supernus, and Health Analytics; received royalties from Springer Press, Johns Hopkins University Press, Oxford Press, and UpToDate; and been compensated for CME activities by the American Physician Institute, MCM Education, CMEology, and Potomac Center for Medical Education.
Funding/support: The Tardive Dyskinesia Assessment Workshop was supported by Neurocrine Biosciences, Inc., San Diego, California. All workshop members (authors and nonauthors) received an honorarium for onsite participation. However, no remuneration was provided to participants for their contributions as an author or reviewer of this publication.
Role of sponsor: The sponsor was involved in planning and organizing the Tardive Dyskinesia Assessment Workshop. The authors are responsible for the content, final approval, and decision to submit this manuscript for publication.
Acknowledgments: Clinical trial data for valbenazine were provided by Neurocrine Biosciences, Inc., San Diego, California. Medical reviews of the manuscript were provided by Christopher O’Brien, MD; Grace Liang, MD; Scott Siegert, PharmD; and Khodayar Farahmand, PharmD, all of whom are full-time employees of Neurocrine Biosciences, Inc. Writing and editorial assistance was provided by Mildred Bahn at Prescott Medical Communications Group, Chicago, Illinois, with support from Neurocrine Biosciences, Inc.
6. Woods SW, Morgenstern H, Saksa JR, et al. Incidence of tardive dyskinesia with atypical versus conventional antipsychotic medications: a prospective cohort study. J Clin Psychiatry. 2010;71(4):463-474.
8. Bakker PR, de Groot IW, van Os J, et al. Long-stay psychiatric patients: a prospective study revealing persistent antipsychotic-induced movement disorder. PLoS One. 2011;6(10):e25588.
11. Zutshi D, Cloud LJ, Factor SA. Tardive syndromes are rarely reversible after discontinuing dopamine receptor blocking agents: experience from a university-based movement disorder clinic. Tremor Other Hyperkinet Mov (N Y). 2014;4:266.
14. Bhidayasiri R, Fahn S, Weiner WJ, et al. Evidence-based guideline: treatment of tardive syndromes: report of the Guideline Development Subcommittee of the American Academy of Neurology. Neurology. 2013;81(5):463-469.
17. Browne S, Roe M, Lane A, et al. Quality of life in schizophrenia: relationship to sociodemographic factors, symptomatology and tardive dyskinesia. Acta Psychiatr Scand. 1996;94(2):118-124.
18. Ascher-Svanum H, Zhu B, Faries D, et al. Tardive dyskinesia and the 3-year course of schizophrenia: results from a large, prospective, naturalistic study. J Clin Psychiatry. 2008;69(10):1580-1588.
22. Munetz MR, Benjamin S. How to examine patients using the Abnormal Involuntary Movement Scale. Hosp Community Psychiatry. 1988;39(11):1172-1177.
25. Glazer WM, Morgenstern H, Doucette JT. Predicting the long-term risk of tardive dyskinesia in outpatients maintained on neuroleptic medications. J Clin Psychiatry. 1993;54(4):133-139.
27. Arango C, Adami H, Sherr JD, et al. Relationship of awareness of dyskinesia in schizophrenia to insight into mental illness. Am J Psychiatry. 1999;156(7):1097-1099.
29. Hauser RA, Factor SA, Marder SR, et al. KINECT 3: a phase 3 randomized, double-blind, placebo-controlled trial of valbenazine (NBI-98854) for tardive dyskinesia. Am J Psychiatry. 2017;174(5):476-484.
31. Anderson KE, Stamler D, Davis MD, et al. Deutetrabenazine for treatment of involuntary movements in patients with tardive dyskinesia (AIM-TD): a double-blind, randomised, placebo-controlled, phase 3 trial. Lancet Psychiatry. 2017;4(8):595-604.
32. O’Brien CF, Jimenez R, Hauser RA, et al. NBI-98854, a selective monoamine transport inhibitor for the treatment of tardive dyskinesia: a randomized, double-blind, placebo-controlled study. Mov Disord. 2015;30(12):1681-1687.
33. Zhang WF, Tan YL, Zhang XY, et al. Extract of ginkgo biloba treatment for tardive dyskinesia in schizophrenia: a randomized, double-blind, placebo-controlled trial. J Clin Psychiatry. 2011;72(5):615-621.
37. Ondo WG, Hanna PA, Jankovic J. Tetrabenazine treatment for tardive dyskinesia: assessment by randomized videotape protocol. Am J Psychiatry. 1999;156(8):1279-1281.
39. Menzies V, Farrell SP. Schizophrenia, tardive dyskinesia, and the Abnormal Involuntary Movement Scale (AIMS). J Am Psychiatr Nurses Assoc. 2002;8(2):51-56.
40. Bark N, Florida D, Gera N, et al. Evaluation of the routine clinical use of the Brief Psychiatric Rating Scale (BPRS) and the Abnormal Involuntary Movement Scale (AIMS). J Psychiatr Pract. 2011;17(4):300-303.
45. Hayashi T, Yokota N, Takahashi T, et al. Benefits of trazodone and mianserin for patients with late-life chronic schizophrenia and tardive dyskinesia: an add-on, double-blind, placebo-controlled study. Int Clin Psychopharmacol. 1997;12(4):199-205.
47. Fernandez HH, Factor SA, Jimenez-Shahed J, et al. Deutetrabenazine treatment effect in each component of the total Abnormal Involuntary Movement Scale (AIMS) (abstract #916). Mov Disord. 2016;31(suppl 2):S294.
49. LeMoyne R, Tomycz N, Mastroianni T, McCandless C, Cozza M, Peduto D. Implementation of a smartphone wireless accelerometer platform for establishing deep brain stimulation treatment efficacy of essential tremor with machine learning. Conf Proc IEEE Eng Med Biol Soc. 2015;2015:6772-6775.