TAC 2018

Drug-Drug Interaction Extraction from Drug Labels


The U.S. Food and Drug Administration (FDA) is responsible for protecting the public health by assuring the safety, efficacy, and security of all FDA-regulated products, including human and veterinary drugs, prescription and over-the-counter pharmaceutical drugs, vaccines, biopharmaceuticals, blood transfusions, and biological products, among others.

FDA and the National Library of Medicine (NLM) have been working together on transforming the content of Structured Product Labeling (SPL) documents for prescription drugs into discrete, coded, computer-readable data that will be made available to the public in individual SPL index documents. Transforming the narrative text to structured information encoded in national standard terminologies is a prerequisite to the effective deployment of drug safety information. Being able to electronically access labeling information and to search and sort that information is an important step toward the creation of a fully automated health information exchange system.

TAC 2017 addressed one of the important drug safety issues: automated extraction of adverse drug reactions reported in SPLs. An equally important and complex task is automated extraction of drug-drug interaction information. Drug-drug interactions can lead to a variety of adverse events, and it has been suggested that preventable adverse events are the eighth leading cause of death in the United States.

The results of this track will inform future FDA efforts at automating important safety processes, and could potentially lead to future FDA collaboration with interested researchers in this area.


The purpose of this TAC track is to test various natural language processing (NLP) approaches for their information extraction (IE) performance on drug-drug interactions in SPLs. A set of 20 gold-standard SPLs annotated with drug-drug interactions will be provided to participants. An additional set of 180 SPLs, annotated in a slightly different format, is available for training. Participants will be evaluated on their performance on a held-out set of 50 labeled SPLs. For more information about TAC, please visit https://tac.nist.gov/2018/index.html


Participants are provided with an official training set containing 22 drug labels in XML format (Training-22). These labels contain gold-standard annotations created by NLM and FDA. An additional set of at least 50 drug labels will be provided as the official test set in the same format. The annotations in the 22 training-set drug labels were generated semi-automatically and might be missing some interactions. The automatically extracted entities and relations in these sentences were manually corrected by FDA experts and NLM volunteers using these guidelines (schematic presentation courtesy of Mark Sharp). An additional 180 labels were fully manually annotated by NLM (NLM-180) in a comparable format, described here: https://lhce-brat.nlm.nih.gov/NLMDDICorpus.htm. The descriptions below provide the mappings between the two formats.

Entity Annotations

The gold standard contains the following entity-style annotations:

Precipitant: The substance interacting with the Labeled Drug; it may be another drug, a drug class, or a non-drug substance (e.g., alcohol, grapefruit juice).

Trigger: Trigger word or phrase for an interaction event.

SpecificInteraction: The result of an interaction, e.g., severe hyperkalemia.
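As a rough illustration, a sentence containing all three entity types might be annotated along the following lines. This is a hypothetical sketch: the element and attribute names are illustrative, not the official schema (consult the released XSD/DTD for the actual format), and spans are given here as character offset and length.

```xml
<!-- Hypothetical sketch only; element and attribute names are illustrative. -->
<Sentence id="S1"
          text="Concomitant use of spironolactone may potentiate the risk of severe hyperkalemia.">
  <Mention id="M1" type="Precipitant"         span="19 14" str="spironolactone"/>
  <Mention id="M2" type="Trigger"             span="38 10" str="potentiate"/>
  <Mention id="M3" type="SpecificInteraction" span="61 19" str="severe hyperkalemia"/>
</Sentence>
```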

Relation Annotations

The following relations connect an Interaction with one of the above entities. Each relation is limited to a specific subset of entity types.

Pharmacokinetic interactions are tagged as Increase or Decrease interactions in the NLM-180 files. In the official-20 files, they are indicated by Triggers, e.g., "reducing diuretic absorption" and other phrases indicating increases or decreases in function measurements.

Pharmacodynamic interactions are tagged as Specific Interaction in the NLM-180 files. In the official-20 files, they are indicated by Triggers, e.g., "potentiate" or "increased risks", together with a SpecificInteraction.

Unspecified interactions are tagged as Caution Interaction in the NLM-180 files and are indicated by Triggers, e.g., "avoid use", in the official-20 files.

Interaction listing and mappings

The ultimate aim is to know which interactions are in the labels, not the precise offsets or relations, so that the interactions may be linked to structured knowledge sources. Further, an interaction mentioned several times should not necessarily carry more weight than an interaction mentioned once. As such, the gold standard contains a list of unique interactions aggregated at the document level. These interactions are mapped as follows:
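The document-level aggregation described above can be sketched as a simple deduplication over sentence-level interactions. This is a minimal illustration with made-up field names, not the official mapping:

```python
# Hypothetical sketch: collapse repeated sentence-level interactions into a
# document-level list of unique interactions. Field names are illustrative.
def unique_interactions(sentence_level):
    """Deduplicate (precipitant, type, effect) triples across a label."""
    seen = set()
    result = []
    for inter in sentence_level:
        key = (inter["precipitant"].lower(),
               inter["type"],
               (inter.get("effect") or "").lower())
        if key not in seen:
            seen.add(key)
            result.append(inter)
    return result

mentions = [
    {"precipitant": "warfarin", "type": "Pharmacodynamic", "effect": "bleeding"},
    {"precipitant": "Warfarin", "type": "Pharmacodynamic", "effect": "bleeding"},
    {"precipitant": "alcohol", "type": "Unspecified", "effect": None},
]
print(len(unique_interactions(mentions)))  # 2 unique interactions
```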

Data Access

The following datasets are available for immediate download:


Participants may choose any one of the tasks described below, or treat the tasks as a pipeline in which each builds on the previous one. Some tasks necessarily require the output of earlier tasks; e.g., Task 2 requires the output of Task 1.

Task 1: Extract mentions of interacting drugs/substances, interaction triggers, and specific interactions at the sentence level. This is similar to many NLP named entity recognition (NER) evaluations.

Task 2: Identify interactions at the sentence level, including the interacting drugs; the specific interaction type (pharmacokinetic, pharmacodynamic, or unspecified); and the outcomes of pharmacokinetic and pharmacodynamic interactions. This is similar to many NLP relation identification evaluations.

Task 3: Generate a global list of distinct interactions for the label in normalized form.

Task 4: Normalization task. Normalize the interacting substances to UNII codes and the drug classes to NDF-RT NUIs. Normalize the consequence of the interaction to SNOMED CT if it is a medical condition, and normalize pharmacokinetic effects to National Cancer Institute Thesaurus codes.

Any resources, e.g., the UMLS Terminology Services, may be used to aid with the normalization process.
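At its simplest, this kind of normalization can be sketched as a case-insensitive lookup from surface mentions to codes. The codes below are placeholders, not real UNII identifiers; a real system would draw on resources such as the UMLS Terminology Services rather than a hand-built table:

```python
# Hypothetical sketch of substance normalization by case-insensitive lookup.
# The codes are placeholders, not real UNII identifiers.
UNII_TABLE = {
    "warfarin": "UNII_PLACEHOLDER_1",
    "grapefruit juice": "UNII_PLACEHOLDER_2",
}

def normalize_substance(mention):
    """Map a surface mention to a code, or None if out of vocabulary."""
    return UNII_TABLE.get(mention.strip().lower())

print(normalize_substance("Warfarin "))  # -> UNII_PLACEHOLDER_1
print(normalize_substance("aspirin"))    # -> None (out of vocabulary)
```

In practice the lookup would also need to handle synonyms, inflections, and ambiguous drug-class names, which is why terminology services are recommended.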


An annotated test set of at least 50 labels in XML format will be used to evaluate performance. The XML schema is available in XSD and DTD formats. Participants will be asked to submit results for ALL test set labels in XML format.

The evaluation measures are:

Task 1

Precision/Recall/F1-measure on entity-level annotations, using both partial and exact matching.

Primary Metric: micro-averaged F1 on exact matches.
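The distinction between exact and partial matching can be sketched in terms of character spans. This is a minimal illustration, assuming a span is a (start, end) offset pair; the official evaluation script may define overlap differently:

```python
# Hypothetical sketch of exact vs. partial entity matching by character span.
# A span is a (start, end) pair of character offsets, end exclusive.
def exact_match(gold, pred):
    """Exact match: identical boundaries."""
    return gold == pred

def partial_match(gold, pred):
    """Partial match: any character overlap between the two spans."""
    return max(gold[0], pred[0]) < min(gold[1], pred[1])

gold = (38, 48)  # e.g., the trigger "potentiate"
pred = (34, 48)  # e.g., a system predicting "may potentiate"
print(exact_match(gold, pred), partial_match(gold, pred))  # False True
```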

Task 2

Precision/Recall/F1-measure on relations.

Primary Metric: micro-averaged F1.

Task 3

Precision/Recall/F1-measure on unique Interactions.

Primary Metric: macro-averaged F1 (by label).

Task 4

Precision/Recall/F1-measure on unique Interactions.

Primary Metric: macro-averaged F1 (by label).
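The primary metrics above differ in how per-label results are combined: micro-averaging pools true/false positives and negatives across all labels, while macro-averaging scores each label separately and averages the scores. A minimal sketch with made-up counts:

```python
# Sketch of micro- vs. macro-averaged F1 over a set of labels.
# Each label contributes a (true positives, false positives, false negatives) triple.
def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def micro_f1(counts):
    """Pool counts across labels, then compute one F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f1(tp, fp, fn)

def macro_f1(counts):
    """Compute F1 per label, then average the scores."""
    return sum(f1(*c) for c in counts) / len(counts)

counts = [(8, 2, 2), (1, 3, 3)]  # hypothetical per-label (tp, fp, fn)
print(round(micro_f1(counts), 3), round(macro_f1(counts), 3))  # 0.643 0.525
```

Note how the macro average gives the poorly handled second label equal weight, pulling the score down relative to the micro average.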


The official evaluation script will be used to calculate these scores.


Participants are allowed three separate submissions. Submissions that do not conform to the provided XML standards will be rejected without consideration.


May 2018 Training set release.
July 15, 2018 Registration deadline for participants.
August 2018 Test set release.
September 2018 Participants' submissions due.
Early-Mid October 2018 Individual results sent to participants.
October 15, 2018 Short system descriptions and workshop presentation proposals due.
October 20, 2018 Notification of acceptance of workshop presentation proposals.
November 2018 Participant workshop notebook papers due.
November 13-14, 2018 TAC 2018 Workshop in Gaithersburg, MD, USA.
February 15, 2019 Final proceedings papers due.

Mailing List

tac-adr@googlegroups.com (Note: we are keeping the ADR mailing list for All Drug [Label] Related evaluations.)


Dina Demner-Fushman (ddemner@mail.nih.gov)
Kin Wah Fung, NLM
Phong Do, Office of Health Informatics, U.S. Food and Drug Administration
Richard D Boyce, University of Pittsburgh