Drug labels (prescribing information or package inserts) describe what a particular medicine is supposed to do, who should or should not take it, how to use it, and specific safety concerns. The US Food and Drug Administration (FDA) publishes regulations governing the content and format of this information to provide recommendations for applicants developing labeling for new drugs and revising labeling for already approved drugs. A major aspect of drug information is its safety concerns, expressed as Adverse Drug Reactions (ADRs). This evaluation focuses on extracting ADRs from prescription drug labels.
FDA guidelines for applicants define ADRs as follows:
Adverse Event: refers to any untoward medical event associated with the use of a drug in humans, whether or not considered drug-related.
Adverse Reaction: an undesirable effect reasonably associated with the use of a drug, that may occur as part of the pharmacological action of the drug or may be unpredictable in its occurrence. This definition does not include all adverse events observed during use of a drug, only those for which there is some basis to believe there is a causal relationship between the drug and the occurrence of the adverse event. Adverse reactions may include signs and symptoms, changes in laboratory parameters, and changes in other measures of critical body function, such as vital signs and ECG.
Serious Adverse Reaction: refers to any reaction occurring at any dose that results in any of the following outcomes: death, a life-threatening adverse experience, inpatient hospitalization or prolongation of existing hospitalization, a persistent or significant disability or incapacity, or a congenital anomaly or birth defect.
The FDA is highly interested in automatic extraction of ADRs from drug labels for many purposes. Two possible applications enabled by this task are: (1) comparing the ADRs present in labels from different manufacturers for the same drug, and (2) performing post-marketing safety analysis (pharmacovigilance) by identifying new ADRs not currently present in the labels.
Specifically for the purposes of post-marketing safety analysis, the FDA relies on spontaneous adverse event reports submitted to the FDA Adverse Event Reporting System (FAERS). To detect previously unknown ADRs, the current approach to FAERS case report review requires manually reading the text of a drug label to determine whether a reported ADR is already noted in the label. To improve the efficiency of this process, the extraction of ADRs from drug labels needs to be automated.
The results of this track will inform future FDA efforts at automating important safety processes, and could potentially lead to future FDA collaboration with interested researchers in this area.
The purpose of this TAC track is to test various natural language processing (NLP) approaches for their information extraction (IE) performance on adverse reactions found in structured product labels (SPLs, or simply "labels"). A large set of labels will be provided to participants, of which 101 will be annotated with adverse reactions. Additionally, the training labels will be accompanied by the MedDRA Preferred Terms (PT) and Lowest Level Terms (LLT) of the ADRs in the drug labels. This corresponds to the primary goal of the task: to identify the known ADRs in an SPL in the form of MedDRA concepts. Participants will be evaluated by their performance on a held-out set of labeled SPLs.
The participants will be provided with over one thousand drug labels as text documents in an XML format. Of these, 101 drug labels will form the official training set and contain gold standard annotations created by NLM and FDA.
The gold standard contains the following mention-style annotations:
AdverseReaction: Reported ADRs that can be associated with use of the drug or any of its components, following the definitions provided above. This may include signs and symptoms, worsening medical conditions, changes in laboratory parameters, and changes in other measures of critical body function, such as vital signs and ECG results.
Severity: Measurement of the severity of a specific AdverseReaction. This can be qualitative terms (e.g., "major", "critical", "serious", "life-threatening") or quantitative grades (e.g., "grade 1", "Grade 3-4", "3 times upper limit of normal (ULN)", "240 mg/dL").
DrugClass: The class of drug that the specific drug belongs to. This is designed to capture drug class effects (e.g., "[beta blockers]DrugClass may result in...") that are not necessarily specific to the particular drug.
Negation: Trigger word for event negation.
Animal: Animal species in which an AdverseReaction was observed.
Factor: Any additional aspect of an AdverseReaction that is not covered by one of the other mentions listed here. Notably, this includes hedging terms (e.g., may, risk, potential), references to the placebo arm of a clinical trial, or specific sub-populations (e.g., pregnancy, fetus).
Note: Other than AdverseReactions, mentions are only annotated when related to an AdverseReaction by one of the following relations. When an animal, drug class, negation, etc. is not associated with an AdverseReaction, it is not annotated.
The following relations connect an AdverseReaction with one of the above mentions. Each relation is limited to a specific subset of mention types.
Negated: A Negation or Factor that negates an AdverseReaction for the drug.
Hypothetical: An Animal, DrugClass, or Factor that speculates about, or qualifies the definitiveness of the drug's relationship with an AdverseReaction.
Effect: A Severity of an AdverseReaction for the drug.
The ultimate aim is to know which ADRs are in the labels, not the precise offsets or relations, such that the ADRs may be linked to a structured knowledge source (MedDRA). Further, an ADR mentioned several times should not necessarily carry more weight than an ADR mentioned once. As such, the gold standard contains a list of unique ADRs aggregated at the document level (by string). These strings are then annotated with MedDRA Lowest Level Terms (LLT) and the corresponding Preferred Term (PT).
The exact XML format is specified below, but first an introduction to the tasks will be provided, since this gives some context to the formatting choices.
Additional details regarding the annotation process can be found in the Annotation Guidelines provided to the annotators.
Participants may choose any one of the tasks described below, or approach them sequentially, with each task building on the previous ones. Some tasks necessarily require the output of previous tasks (e.g., Task 2 requires Task 1), but Task 3 can be performed independently.
Task 1: Extract AdverseReactions and related mentions (Severity, Factor, DrugClass, Negation, Animal). This is similar to many NLP Named Entity Recognition (NER) evaluations.
Task 2: Identify the relations between AdverseReactions and related mentions (i.e., Negated, Hypothetical, and Effect). This is similar to many NLP relation identification evaluations.
Task 3: Identify the positive AdverseReaction mention names in the labels. For the purposes of this task, positive will be defined as the caseless strings of all the AdverseReactions that have not been negated and are not related by a Hypothetical relation to a DrugClass or Animal. Note that this means Factors related via a Hypothetical relation are considered positive (e.g., "[unknown risk]Factor of [stroke]AdverseReaction") for the purposes of this task. The result of this task will be a list of unique strings corresponding to the positive ADRs as they were written in the label.
Task 4: Provide MedDRA PT(s) and LLT(s) for each positive AdverseReaction (occasionally, two or more PTs are necessary to fully describe the reaction). For participants approaching the tasks sequentially, this can be viewed as normalization of the terms extracted in Task 3 to MedDRA LLTs/PTs. Because MedDRA is not publicly available, and exists in several versions, a standard version of MedDRA v18.1 will be provided to the participants. Other resources such as the UMLS Terminology Services may be used to aid with the normalization process.
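The Task 3 definition of "positive" can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official procedure: the in-memory layout, the function name `positive_reactions`, and the toy data are invented here, and the sketch assumes that `arg1` of a relation is the AdverseReaction and `arg2` the related mention.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory layout, assumed only for this sketch:
#   mentions:  mention id -> (mention type, mention string)
#   relations: (relation type, arg1 id, arg2 id); arg1 is taken to be the
#              AdverseReaction and arg2 the related mention.
Mentions = Dict[str, Tuple[str, str]]
Relations = List[Tuple[str, str, str]]


def positive_reactions(mentions: Mentions, relations: Relations) -> Set[str]:
    """Return the caseless strings of AdverseReactions considered positive."""
    excluded: Set[str] = set()
    for rel_type, arg1, arg2 in relations:
        arg2_type = mentions[arg2][0]
        if rel_type == "Negated":
            excluded.add(arg1)
        elif rel_type == "Hypothetical" and arg2_type in {"DrugClass", "Animal"}:
            excluded.add(arg1)
        # Hypothetical relations to a Factor, and Effect relations, leave the
        # AdverseReaction positive.
    return {
        text.lower()
        for mention_id, (mention_type, text) in mentions.items()
        if mention_type == "AdverseReaction" and mention_id not in excluded
    }


# Toy data invented for illustration (not taken from any label).
mentions = {
    "M1": ("AdverseReaction", "Stroke"),
    "M2": ("Factor", "unknown risk"),
    "M3": ("AdverseReaction", "nausea"),
    "M4": ("Negation", "no"),
    "M5": ("AdverseReaction", "tumors"),
    "M6": ("Animal", "rats"),
}
relations = [
    ("Hypothetical", "M1", "M2"),  # Factor: stays positive
    ("Negated", "M3", "M4"),       # negated: excluded
    ("Hypothetical", "M5", "M6"),  # Animal: excluded
]
print(sorted(positive_reactions(mentions, relations)))  # prints ['stroke']
```

Exclusion is decided per mention rather than per string, so a reaction that is negated in one place but asserted elsewhere in the same label still appears in the result.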
The data is provided in an XML format. What follows is the XML for the drug Choline:
<?xml version="1.0" encoding="UTF-8"?>
<Label drug="choline" track="TAC2017_ADR">
  <Text>
    <Section name="adverse reactions" id="S1">6 ADVERSE REACTIONS Exclusive of an uncommon, mild injection site reaction, no adverse reactions to 11 C-choline have been reported. EXCERPT: Exclusive of an uncommon, mild injection site reaction, no other adverse reactions have been reported ( 6 ). To report SUSPECTED ADVERSE REACTIONS, contact Division of Nuclear Medicine, Department of Radiology, Mayo Clinic at 507-284-2511 or FDA at 1-800-FDA-1088 or www.fda.gov/medwatch</Section>
    <Section name="warnings and precautions" id="S2">5 WARNINGS AND PRECAUTIONS EXCERPT: * Imaging errors have been reported; blood PSA levels < 2 ng/mL have been associated with poor imaging performance ( 5.1 ). * Allergic reactions: have emergency resuscitation equipment and personnel readily available ( 5.2 ). * Radiation risk: Choline C 11 Injection contributes to a patient's long-term cumulative radiation exposure. Ensure safe handling to protect the patient and health care worker ( 5.3 ). 5.1 Imaging Errors Imaging errors have been reported with 11 C-choline PET and PET/CT imaging. A negative image does not rule out the presence of recurrent prostate cancer and a positive image does not confirm the presence of recurrent cancer. 11 C-choline uptake is not specific for prostate cancer and may occur with other types of cancer (such as lung carcinoma and brain tumors). Clinical correlation, including histopathological evaluation of the suspected recurrence site, is essential to proper use of the PET imaging information. * Blood PSA levels < 2 ng/mL have been associated with poor performance of 11 C-choline PET imaging (higher numbers of false positive and false negative results) [ see Clinical Studies (14) ]. * Tissue inflammation as well as prostatic hyperplasia have been associated with false positive 11 C-choline PET images. * Concomitant colchicine or androgen-deprivation therapeutic drugs (such as luteinizing hormone-releasing analogs and anti-androgen drugs) may interfere with 11 C-choline PET imaging. One published report of 18 F-methylcholine PET imaging indicated that discontinuation of colchicine for two weeks resolved the colchicine effect. The impact of discontinuation of androgen-deprivation therapy upon 11 C-choline PET imaging has not been established [ see Drug Interactions (7) ]. 5.2 Allergic Reactions As with any injectable drug product, allergic reactions and anaphylaxis may occur. Emergency resuscitation equipment and personnel should be immediately available. 5.3 Radiation Risks Choline C 11 Injection contributes to a patient's overall long-term cumulative radiation exposure. Long-term cumulative radiation exposure is associated with an increased risk for cancer. Safe handling should be ensured to minimize radiation exposure to the patient and health care workers [ see Dosage and Administration ( 2. 1 ) ].</Section>
  </Text>
  <Mentions>
    <Mention id="M1" section="S1" type="Severity" start="68" len="4" str="mild" />
    <Mention id="M2" section="S1" type="AdverseReaction" start="73" len="23" str="injection site reaction" />
    <Mention id="M3" section="S1" type="Severity" start="200" len="4" str="mild" />
    <Mention id="M4" section="S1" type="AdverseReaction" start="205" len="23" str="injection site reaction" />
    <Mention id="M5" section="S2" type="AdverseReaction" start="193" len="18" str="Allergic reactions" />
    <Mention id="M6" section="S2" type="AdverseReaction" start="300" len="14" str="Radiation risk" />
    <Mention id="M7" section="S2" type="AdverseReaction" start="1964" len="18" str="allergic reactions" />
    <Mention id="M8" section="S2" type="AdverseReaction" start="1987" len="11" str="anaphylaxis" />
    <Mention id="M9" section="S2" type="Factor" start="1999" len="3" str="may" />
    <Mention id="M10" section="S2" type="Factor" start="2309" len="4" str="risk" />
    <Mention id="M11" section="S2" type="AdverseReaction" start="2318" len="6" str="cancer" />
  </Mentions>
  <Relations>
    <Relation id="R1" type="Effect" arg1="M2" arg2="M1" />
    <Relation id="R2" type="Effect" arg1="M4" arg2="M3" />
    <Relation id="R3" type="Hypothetical" arg1="M7" arg2="M9" />
    <Relation id="R4" type="Hypothetical" arg1="M8" arg2="M9" />
    <Relation id="R5" type="Hypothetical" arg1="M11" arg2="M10" />
  </Relations>
  <Reactions>
    <Reaction id="R1" str="injection site reaction">
      <Normalization id="R1.N1" meddra_pt="Injection site reaction" meddra_pt_id="10022095" />
    </Reaction>
    <Reaction id="R2" str="allergic reactions">
      <Normalization id="R2.N1" meddra_pt="Hypersensitivity" meddra_pt_id="10020751" meddra_llt="Allergic reaction" meddra_llt_id="10001718" />
    </Reaction>
    <Reaction id="R3" str="anaphylaxis">
      <Normalization id="R3.N1" meddra_pt="Anaphylactic reaction" meddra_pt_id="10002198" meddra_llt="Anaphylaxis" meddra_llt_id="10002218" />
    </Reaction>
    <Reaction id="R4" str="cancer">
      <Normalization id="R4.N1" meddra_pt="Neoplasm malignant" meddra_pt_id="10028997" meddra_llt="Cancer" meddra_llt_id="10007050" />
    </Reaction>
    <Reaction id="R5" str="radiation risk">
      <Normalization id="R5.N1" meddra_pt="Exposure to radiation" meddra_pt_id="10073306" />
    </Reaction>
  </Reactions>
</Label>
The <Text> element contains one or more <Section> elements, corresponding to the sections of interest in the drug labels as defined by the FDA (the three sections of interest are "Adverse Reactions", "Warnings and Precautions", and "Boxed Warnings"). Within each <Section> is the text of that section as extracted from the DailyMed label XML. Note that the XML extraction is not perfect. In addition to the obvious whitespace issues seen above, there are also merged words, more information about which will be provided.
The <Mentions> element contains all the mention annotations, corresponding to Task 1. Each <Mention> has an id; a section that indicates which of the <Section> elements the text can be found in; a type; a start offset as well as character len that correspond to the <Section> it is in; and a str value for readability and for sanity checking offsets. When generating the XML output for Task 1, the section, type, start, and len will all be evaluated. The id can be anything, but will be necessary for Task 2. The str is not necessary and will be ignored.
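Since the str attribute exists for sanity checking offsets, a quick consistency check is easy to write. The sketch below is only an illustration: the function name `check_mention_offsets` and the tiny `SAMPLE` label are invented here, and it assumes that start and len are single integers measured from the beginning of the section text, as in the example above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A tiny label invented for this sketch; it only mimics the element and
# attribute names described above.
SAMPLE = """<Label drug="example" track="TAC2017_ADR">
  <Text>
    <Section name="adverse reactions" id="S1">Mild rash may occur.</Section>
  </Text>
  <Mentions>
    <Mention id="M1" section="S1" type="Severity" start="0" len="4" str="Mild" />
    <Mention id="M2" section="S1" type="AdverseReaction" start="5" len="4" str="rash" />
  </Mentions>
</Label>"""


def check_mention_offsets(root: ET.Element) -> List[Tuple[str, str, str]]:
    """Return (mention id, text found at the offsets, str attribute) for every
    mention whose offsets do not reproduce its str value."""
    sections = {s.get("id"): s.text or "" for s in root.iter("Section")}
    problems = []
    for mention in root.iter("Mention"):
        text = sections.get(mention.get("section"), "")
        start = int(mention.get("start"))
        length = int(mention.get("len"))
        found = text[start:start + length]
        if found != mention.get("str"):
            problems.append((mention.get("id"), found, mention.get("str")))
    return problems


print(check_mention_offsets(ET.fromstring(SAMPLE)))  # prints []
```

Running such a check on system output before submission can catch off-by-one errors in start and len, which are the attributes that are actually evaluated.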
The <Relations> element contains all the relation annotations, corresponding to Task 2. Each <Relation> has an id; a type; and an arg1 and arg2 that correspond to the <Mention> id. Relations can only exist between mentions in the same section. When generating the XML output for Task 2, the type, arg1, and arg2 will all be evaluated. Again, the identifiers can be anything, but they must correspond to a correct <Mention> (as defined above).
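The constraints on relations (both arguments in the same section, and each relation type limited to certain mention types) can be checked mechanically. The following is a minimal sketch, not part of the official tooling: the data layout and the name `relation_errors` are hypothetical, and it assumes that arg1 is always the AdverseReaction and arg2 the related mention, as in the example above.

```python
from typing import Dict, List, Tuple

# Mention types allowed as the second argument of each relation type,
# following the relation definitions given earlier in this document.
ALLOWED_ARG2 = {
    "Negated": {"Negation", "Factor"},
    "Hypothetical": {"Animal", "DrugClass", "Factor"},
    "Effect": {"Severity"},
}


def relation_errors(
    mentions: Dict[str, Tuple[str, str]],       # id -> (section id, mention type)
    relations: List[Tuple[str, str, str, str]],  # (id, type, arg1, arg2)
) -> List[str]:
    """Return human-readable problems found in a list of relations."""
    errors = []
    for rel_id, rel_type, arg1, arg2 in relations:
        if arg1 not in mentions or arg2 not in mentions:
            errors.append(f"{rel_id}: unknown mention id")
            continue
        (section1, type1), (section2, type2) = mentions[arg1], mentions[arg2]
        if section1 != section2:
            errors.append(f"{rel_id}: arguments are in different sections")
        if type1 != "AdverseReaction":
            errors.append(f"{rel_id}: arg1 is not an AdverseReaction")
        if type2 not in ALLOWED_ARG2.get(rel_type, set()):
            errors.append(f"{rel_id}: {type2} is not allowed for {rel_type}")
    return errors


# Mentions M1, M2 and M5 follow the choline example; R2 is deliberately wrong.
mentions = {
    "M1": ("S1", "Severity"),
    "M2": ("S1", "AdverseReaction"),
    "M5": ("S2", "AdverseReaction"),
}
relations = [
    ("R1", "Effect", "M2", "M1"),
    ("R2", "Negated", "M5", "M1"),
]
for problem in relation_errors(mentions, relations):
    print(problem)
```

Here R1 passes, while R2 is reported twice: its arguments sit in different sections, and a Severity mention cannot be the second argument of a Negated relation.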
The <Reactions> element contains the unique adverse reactions, corresponding to Tasks 3 and 4. Each <Reaction> has an id and a str value. There is only one <Reaction> for each distinct AdverseReaction string in the label, compared case-insensitively. This is the focus of Task 3. Each <Reaction> has at least one <Normalization>, which corresponds to an entry in MedDRA 18.1. This is the focus of Task 4. Each <Normalization> has one required and up to five optional attributes. The only required attribute is the id, which is not evaluated. The five optional attributes are meddra_pt, meddra_pt_id, meddra_llt, meddra_llt_id, and flag. The meddra_pt and meddra_pt_id indicate the MedDRA Preferred Term ("PT"). The meddra_llt and meddra_llt_id indicate the MedDRA Lowest Level Term ("LLT"), which uniquely maps to a single MedDRA PT. See the MedDRA website for more details on the MedDRA hierarchy. Finally, the flag indicates special situations where there is not an ideal mapping to a PT: "underspecified" means the reaction is too specific to map; "HLGT" means the reaction maps to a MedDRA High Level Group Term; "HLT" means the reaction maps to a High Level Term; and "unmapped" means there is no MedDRA mapping at all. The "underspecified" reactions will be normalized to a PT (e.g., "hyperpigmentation on the palms" is normalized to the PT "Skin hyperpigmentation"). The "HLGT" reactions (e.g., "vision disorders") and "HLT" reactions (e.g., "taste disorder") will be normalized to the corresponding term, with the HLT/HLGT and its ID used in place of the meddra_pt and meddra_pt_id in the XML. The "unmapped" reactions, however, will have no meddra_pt, meddra_pt_id, meddra_llt, or meddra_llt_id.
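As an illustration of how the <Reactions> block might be read, the sketch below collects the meddra_pt_id values for each reaction string. The function name `pt_ids_by_reaction` is hypothetical. The first two reactions in the sample are copied from the choline example; the third, with flag="unmapped", is invented here to show the case where no MedDRA attributes are present.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

SAMPLE_REACTIONS = """<Reactions>
  <Reaction id="R1" str="injection site reaction">
    <Normalization id="R1.N1" meddra_pt="Injection site reaction" meddra_pt_id="10022095" />
  </Reaction>
  <Reaction id="R2" str="allergic reactions">
    <Normalization id="R2.N1" meddra_pt="Hypersensitivity" meddra_pt_id="10020751" meddra_llt="Allergic reaction" meddra_llt_id="10001718" />
  </Reaction>
  <Reaction id="R3" str="some unmappable reaction">
    <Normalization id="R3.N1" flag="unmapped" />
  </Reaction>
</Reactions>"""


def pt_ids_by_reaction(reactions_xml: str) -> Dict[str, Set[str]]:
    """Map each reaction string to the set of meddra_pt_id values attached to
    it. Reactions flagged "unmapped" carry no meddra_pt_id and therefore map
    to an empty set."""
    root = ET.fromstring(reactions_xml)
    result: Dict[str, Set[str]] = {}
    for reaction in root.iter("Reaction"):
        ids = {
            norm.get("meddra_pt_id")
            for norm in reaction.iter("Normalization")
            if norm.get("meddra_pt_id") is not None
        }
        result[reaction.get("str")] = ids
    return result


for name, ids in pt_ids_by_reaction(SAMPLE_REACTIONS).items():
    print(name, sorted(ids))
```

Note that for reactions flagged "HLT" or "HLGT", the meddra_pt_id attribute holds the HLT or HLGT identifier rather than a PT identifier, as described above, so any downstream lookup in MedDRA would need to account for the flag.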
The following datasets are available for immediate download:
Combined, these represent a significant core study set of labels of interest to the FDA. In reality, there are nearly 100k labels, which include over-the-counter drugs and similar labels for the same, or nearly the same, drug (each manufacturer produces their own label, and the same manufacturer might produce multiple labels for different dosages and forms of the same drug ingredient). Given that the set of all possible drug labels is finite, and publicly accessible, we have decided to release the test data as part of this set of unannotated data, with one caveat. We will first note, however, that the test data is only a small portion of the unannotated data, so "cheating" will have little value. The caveat is that participants should not utilize any other drug labels when developing their systems. If you intend to use a large set of unannotated labels for manual tasks (e.g., reading labels for increased understanding), unsupervised tasks (e.g., building word embeddings or clustering), or semi-supervised tasks (e.g., self-training), then all of those activities should be performed on the provided set of unannotated labels (as well as the training data, where appropriate). This will allow for better system comparison, as all approaches incorporating unannotated data will be working on the same set of data.
The evaluation measures are:
Task 1: Precision/Recall/F1-measure on mention-level annotations, with and without mention types. Primary Metric: micro-averaged F1.
Task 2: Precision/Recall/F1-measure on relations, with and without relation types. Primary Metric: micro-averaged F1.
Task 3: Precision/Recall/F1-measure on unique positive AdverseReaction strings. Primary Metric: macro-averaged F1 (by label).
Task 4: Precision/Recall/F1-measure on unique MedDRA LLTs and PTs. Primary Metric: macro-averaged F1 of PTs (by label).
The official evaluation script will be used to calculate these scores.
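The difference between micro- and macro-averaged F1 can be sketched as follows. This is not the official evaluation script, which remains authoritative. The function names and toy data are invented here, items are treated as plain set members, and the handling of empty denominators (scored as 0.0) is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from counts (0.0 when a denominator is zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def counts(gold: Set[str], pred: Set[str]) -> Tuple[int, int, int]:
    """True positives, false positives and false negatives for one label."""
    return len(gold & pred), len(pred - gold), len(gold - pred)


def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for label, gold_items in gold.items():
        t, p, n = counts(gold_items, pred.get(label, set()))
        tp, fp, fn = tp + t, fp + p, fn + n
    return prf(tp, fp, fn)[2]


def macro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Compute F1 separately for each label, then average the F1 values."""
    scores = [
        prf(*counts(gold_items, pred.get(label, set())))[2]
        for label, gold_items in gold.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data invented for illustration: two labels with their gold and
# predicted item sets.
gold = {"label_a": {"x", "y"}, "label_b": {"z"}}
pred = {"label_a": {"x"}, "label_b": {"z", "w", "v"}}
print(round(micro_f1(gold, pred), 4))  # prints 0.5714
print(round(macro_f1(gold, pred), 4))  # prints 0.5833
```

In the toy data, label_a scores F1 = 2/3 and label_b scores F1 = 1/2, so the macro average is 7/12. Pooling the counts (2 true positives, 2 false positives, 1 false negative) gives a micro-averaged F1 of 4/7. Macro-averaging by label gives every label equal weight regardless of how many reactions it contains.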
Participants will submit system results on the entire set of unannotated labels. Only the labels within the hidden test set will be used for evaluation. As a result, there is no need for a specific 2-3 day window in which participants freeze their system, download the test data, run it, then submit, as is typical in NLP tasks. An additional advantage of releasing all the data in an unannotated form is that participants can submit results at any time.
Participants are allowed three separate submissions. Submissions that do not conform to the provided XML standards will be rejected without consideration or notification.
| Date | Milestone |
|---|---|
| May 2017 | Registration deadline for participants. |
| 26 September 2017 | Participant submissions due. |
| Early October 2017 | Individual results sent to participants. |
| 15 October 2017 | Short system descriptions and workshop presentation proposals due. |
| 20 October 2017 | Notification of acceptance of workshop presentation proposals. |
| 1 November 2017 | Participant workshop notebook papers due. |
| 13-14 November 2017 | TAC 2017 Workshop in Gaithersburg, MD, USA. |
| February 2018 | Final proceedings papers due. |
Kirk Roberts (firstname.lastname@example.org)
Dina Demner-Fushman (email@example.com)
Joseph Tonning (firstname.lastname@example.org)