NLP resources at NLM
This page provides access to data collections created to support research in consumer-health question answering,
extraction of adverse drug reactions, extraction of information from MEDLINE®/PubMed® citations,
and many other projects of the Lister Hill National Center for Biomedical Communications,
U.S. National Library of Medicine (NLM). Resources created for the Indexing Initiative projects can be found at https://ii.nlm.nih.gov/DataSets/. Global resources are listed in the Online Registry of Biomedical Informatics Tools (ORBIT).
Collection | Description | Created |
Persistent PubMed Abstracts for BioNLP Research | A static online collection of MEDLINE®/PubMed® citations consisting of titles and abstracts (when available) for articles included in the MEDLINE database in a given year. | September 2016 |
CHQA-Corpus-1.0 | A collection of 2,614 consumer health questions annotated with named entities, question topic, question triggers, and question frames. | August 2017 |
SPL-ADR-200db | A collection of 200 Structured Product Labels fully annotated with adverse drug reactions (ADRs), and a database of distinct ADRs for each of the sections designated to report ADRs for each of the 200 drugs. | April 2017 |
PlacentaCollection | A collection of MEDLINE abstracts fully annotated with gene-disease relationships and gene and protein activity associated with placenta-mediated diseases. | April 2017 |
VQA 2018 collection (ImageCLEF) | A collection of medical images and visual question-answer pairs for the ImageCLEF 2018 evaluation. | April 2017 |
BART fine-tuned checkpoint | The Bidirectional and Auto-Regressive Transformers (BART) model fine-tuned on BioASQ data for single-document, question-driven summarization. | February 2020 |
MedVidQA and MedVidCL Video Features | Video features of the MedVidQA and MedVidCL datasets extracted using pretrained I3D and ViT models. | January 2022 |