NLP resources at NLM

 
This page provides access to data collections created to support research in consumer-health question answering, extraction of adverse drug reactions, extraction of information from MEDLINE®/PubMed® citations, and many other projects of the Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine (NLM).
Resources created for the Indexing Initiative projects can be found at https://ii.nlm.nih.gov/DataSets/. Global resources are listed in the Online Registry of Biomedical Informatics Tools (ORBIT).
 

 

| Collection | Description | Created |
|---|---|---|
| Persistent PubMed Abstracts for BioNLP Research | A static online collection of MEDLINE®/PubMed® citations consisting of titles and abstracts (when available) for articles included in the MEDLINE database in a given year. | September 2016 |
| CHQA-Corpus-1.0 | A collection of 2,614 consumer health questions annotated with named entities, question topic, question triggers, and question frames. | August 2017 |
| SPL-ADR-200db | A collection of 200 Structured Product Labels fully annotated with adverse drug reactions (ADRs), and a database of the distinct ADRs reported in each ADR-designated section of each of the 200 drug labels. | April 2017 |
| PlacentaCollection | A collection of MEDLINE abstracts fully annotated with gene-disease relationships and with gene and protein activity associated with placenta-mediated diseases. | April 2017 |
| VQA 2018 collection (ImageCLEF) | A collection of medical images and visual question-answer pairs for the ImageCLEF 2018 evaluation. | April 2017 |
| BART fine-tuned checkpoint | The Bidirectional and Auto-Regressive Transformers (BART) model fine-tuned on BioASQ data for single-document, question-driven summarization. | February 2020 |
| MedVidQA and MedVidCL Video Features | Video features of the MedVidQA and MedVidCL datasets, extracted using pretrained I3D and ViT models. | January 2022 |