NLP resources at NLM

This page provides access to data collections that support research in consumer-health question answering, extraction of adverse drug reactions, extraction of information from MEDLINE®/PubMed® citations, and many other projects of the Lister Hill National Center for Biomedical Communications at the U.S. National Library of Medicine (NLM).
Resources created for the Indexing Initiative projects can be found at https://ii.nlm.nih.gov/DataSets/. Global resources are listed in the Online Registry of Biomedical Informatics Tools (ORBIT).

| Collection | Description | Created |
|---|---|---|
| Persistent PubMed Abstracts for BioNLP Research | A static online collection of MEDLINE®/PubMed® citations consisting of titles and abstracts (when available) for articles included in the MEDLINE database in a given year. | September 2016 |
| CHQA-Corpus-1.0 | A collection of 2,614 consumer health questions annotated with named entities, question topic, question triggers, and question frames. | August 2017 |
| SPL-ADR-200db | A collection of 200 Structured Product Labels (SPLs) fully annotated with adverse drug reactions (ADRs), and a database listing, for each of the 200 drugs, the distinct ADRs in each label section designated for reporting ADRs. | April 2017 |
| PlacentaCollection | A collection of MEDLINE abstracts fully annotated with gene-disease relationships and with gene and protein activity associated with placenta-mediated diseases. | April 2017 |
| VQA 2018 Collection (ImageCLEF) | A collection of medical images and visual question-answer pairs for the ImageCLEF 2018 evaluation. | April 2017 |
| BART fine-tuned checkpoint | The Bidirectional and Auto-Regressive Transformers (BART) model fine-tuned on BioASQ data for single-document, question-driven summarization. | February 2020 |
| MedVidQA and MedVidCL Video Features | Video features of the MedVidQA and MedVidCL datasets, extracted using pretrained I3D and ViT models. | January 2022 |
| MedVidQA at TRECVID 2023 Video Features | Video features for the videos released under MedVidQA at TRECVID 2023, extracted using the I3D model. | June 2023 |
| OpenI-Images (Train) | A collection of training images extracted from Open-I (https://openi.nlm.nih.gov/) and used in the study at https://arxiv.org/pdf/2210.02401.pdf. | July 2023 |
| OpenI-Images (Test) | A collection of test images extracted from Open-I (https://openi.nlm.nih.gov/) and used in the study at https://arxiv.org/pdf/2210.02401.pdf. | July 2023 |
| OpenI-ResNet (Train) | Image features of the OpenI training set, extracted using the ResNet-50 model. | July 2023 |
| OpenI-ResNet (Test) | Image features of the OpenI test set, extracted using the ResNet-50 model. | July 2023 |
| OpenI-ConvNeXt (Train) | Image features of the OpenI training set, extracted using the ConvNeXt-L model. | July 2023 |
| OpenI-ConvNeXt (Test) | Image features of the OpenI test set, extracted using the ConvNeXt-L model. | July 2023 |
| HealthVidQA | A collection of video question-answering datasets annotated with healthcare questions and visual answers from instructional videos. | March 2024 |