NLP resources at NLM
This page provides access to data collections created to support research in consumer-health question answering,
extraction of adverse drug reactions, extraction of information from MEDLINE®/PubMed® citations,
and many other projects of the Lister Hill National Center for Biomedical Communications,
U.S. National Library of Medicine (NLM). Resources created for the Indexing Initiative projects can be found at https://ii.nlm.nih.gov/DataSets/. Global resources are listed in the Online Registry of Biomedical Informatics Tools (ORBIT).
Collection | Description | Created |
Persistent PubMed Abstracts for BioNLP Research | A static online collection of MEDLINE®/PubMed® citations consisting of titles and abstracts (when available) for articles included in the MEDLINE database in a given year. | September 2016 |
CHQA-Corpus-1.0 | A collection of 2,614 consumer health questions annotated with named entities, question topic, question triggers, and question frames. | August 2017 |
IOWA collection | Clinical questions described in Ely JW, Osheroff JA, Ebell MH, Bergus GR, Levy BT, Chambliss ML, et al. Analysis of questions asked by family doctors regarding patient care. BMJ 1999;319:358-361. | 1999 |
SPL-ADR-200db | A collection of 200 Structured Product Labels fully annotated with adverse drug reactions (ADRs), and a database of the distinct ADRs found in each ADR-reporting section for each of the 200 drugs. | April 2017 |
PlacentaCollection | A collection of MEDLINE abstracts fully annotated with gene-disease relationships and gene and protein activity associated with placenta-mediated diseases. | April 2017 |
VQA 2018 collection (ImageCLEF) | A collection of medical images and visual question-answer pairs for the ImageCLEF 2018 evaluation. | April 2017 |
BART fine-tuned checkpoint | The Bidirectional and Auto-Regressive Transformers (BART) model fine-tuned on BioASQ data for single-document, question-driven summarization. | Feb 2020 |
MedVidQA and MedVidCL Video Features | Video features of MedVidQA and MedVidCL datasets extracted using pretrained I3D and ViT models. | Jan 2022 |
MedVidQA at TRECVID 2023 Video Features | Video features for videos released under MedVidQA at TRECVID 2023, extracted using the I3D model. | June 2023 |
OpenI-Images (Train) | A collection of OpenI images (training) extracted from Open-I (https://openi.nlm.nih.gov/) used in this work (https://arxiv.org/pdf/2210.02401.pdf). | July 2023 |
OpenI-Images (Test) | A collection of OpenI images (test) extracted from Open-I (https://openi.nlm.nih.gov/) used in this work (https://arxiv.org/pdf/2210.02401.pdf). | July 2023 |
OpenI-ResNet (Train) | Image features of the OpenI datasets (training) extracted using a ResNet-50 model. | July 2023 |
OpenI-ResNet (Test) | Image features of the OpenI datasets (test) extracted using a ResNet-50 model. | July 2023 |
OpenI-ConvNeXt (Train) | Image features of the OpenI datasets (training) extracted using a ConvNeXt-L model. | July 2023 |
OpenI-ConvNeXt (Test) | Image features of the OpenI datasets (test) extracted using a ConvNeXt-L model. | July 2023 |
HealthVidQA | A collection of video question-answering datasets annotated with healthcare questions and visual answers from instructional videos. | March 2024 |
HealthVer | HealthVer is an evidence-based fact-checking dataset for verifying the veracity of real-world claims about COVID-19 against scientific articles (https://aclanthology.org/2021.findings-emnlp.297.pdf). | September 2021 |