Background
Health literacy, or the ability of people to transform health-related information into informed actions, has a measurable impact on health outcomes, and deficits in health literacy help drive the health disparities linked to education level (McCray 2004, Berkman 2011). Although the Internet and open science initiatives have placed unprecedented amounts of biomedical knowledge at the fingertips of medical practitioners and medical consumers alike, consumers often run into a language barrier, even when the resources are in their native language.
Advances in deep learning may soon make it feasible to automatically and accurately adapt difficult scientific text for patients and caregivers. However, significant barriers must be overcome before such models are viable in production (Ondov 2022). We hope that this track will stimulate research in automatic plain language adaptation of biomedical text resources to help improve health literacy and engagement among patients and caregivers.
Task
The goal of the PLABA track is to improve health literacy by adapting biomedical abstracts for the general public using plain language. When adapting, source sentences may be split, in which case the output for one source sentence will be multiple target sentences. However, source sentences may not be merged, and the output for a given source sentence should not contain information from other source sentences. Both source and output will be in English. An example of adaptation is below.
Input (question): how is strep throat treated?
Input (abstract) | Output (adapted) |
Acute pharyngitis/tonsillitis, which is characterized by inflammation of the posterior pharynx and tonsils, is a common disease. | Sore throat/tonsillitis, or when the back of the throat or tonsils is inflamed, is common. |
Several viruses and bacteria can cause acute pharyngitis; however, Streptococcus pyogenes (also known as Lancefield group A β-hemolytic streptococci) is the only agent that requires an etiologic diagnosis and specific treatment. | Many viruses and bacteria can cause short-term sore throat. However, group A strep, caused by Group A strep bacteria, is the only cause that must be identified based on signs and symptoms and treated. |
S. pyogenes is of major clinical importance because it can trigger post-infection systemic complications, acute rheumatic fever, and post-streptococcal glomerulonephritis. | Group A strep bacteria are important to identify because they can cause post-strep throat complications throughout the body, acute rheumatic fever (a disease that inflames the body's tissues), and post-strep throat kidney disease. |
Symptom onset in streptococcal infection is usually abrupt and includes intense sore throat, fever, chills, malaise, headache, tender enlarged anterior cervical lymph nodes, and pharyngeal or tonsillar exudate. | Strep throat symptoms usually happen quickly and include severe sore throat, fever, chills, general discomfort, headache, swollen lymph nodes in the front of the neck, and white or yellow spots on the throat or tonsils. |
Cough, coryza, conjunctivitis, and diarrhea are uncommon, and their presence suggests a viral cause. | Cough, cold symptoms, pink eye, and diarrhea are not common and might be caused by a virus. |
A diagnosis of pharyngitis is supported by the patient's history and by the physical examination. | Learning the person's history and doing a physical exam are used to diagnose strep throat. |
Throat culture is the gold standard for diagnosing streptococcus pharyngitis. | A throat swab to find, grow, and test bacteria in the throat that make you sick is the best way to diagnose strep throat. |
However, it has been underused in public health services because of its low availability and because of the 1- to 2-day delay in obtaining results. | However, it has not been used as much as it should because it is not widely available and takes 1 to 2 days to get results. |
Rapid antigen detection tests have been used to detect S. pyogenes directly from throat swabs within minutes. | Rapid strep tests have been used to find fragments of bacteria that cause strep throat from swabs within minutes. |
Clinical scoring systems have been developed to predict the risk of S. pyogenes infection. | Scoring systems have been made to predict the risk of strep throat. |
The most commonly used scoring system is the modified Centor score. | |
Acute S. pyogenes pharyngitis is often a self-limiting disease. | Short-term strep throat often goes away on its own without treatment. |
Penicillins are the first-choice treatment. | Penicillins, a type of antibiotics, are prescribed most commonly. |
For patients with penicillin allergy, cephalosporins can be an acceptable alternative, although primary hypersensitivity to cephalosporins can occur. | For people allergic to penicillin, cephalosporins, another type of antibiotics, can be prescribed, although people can be allergic to cephalosporins. |
Another drug option is the macrolides. | Another drug option is macrolides, another type of antibiotics. |
Future perspectives to prevent streptococcal pharyngitis and post-infection systemic complications include the development of an anti-Streptococcus pyogenes vaccine. | Making an anti-strep throat vaccine could be one way to prevent strep throat and post-strep throat complications throughout the body in the future. |
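To make the one-to-many relationship concrete, here is a minimal Python sketch using three rows of the example above. It assumes, for illustration only, that each source sentence is paired with a single output string and that target sentences can be counted with a naive punctuation-based splitter; the helper name `target_sentences` is hypothetical and not part of any track tooling.

```python
import re

# Three (source, output) rows from the strep throat example above.
example_pairs = [
    (
        "Acute pharyngitis/tonsillitis, which is characterized by inflammation "
        "of the posterior pharynx and tonsils, is a common disease.",
        "Sore throat/tonsillitis, or when the back of the throat or tonsils "
        "is inflamed, is common.",
    ),
    (
        "Several viruses and bacteria can cause acute pharyngitis; however, "
        "Streptococcus pyogenes (also known as Lancefield group A β-hemolytic "
        "streptococci) is the only agent that requires an etiologic diagnosis "
        "and specific treatment.",
        "Many viruses and bacteria can cause short-term sore throat. However, "
        "group A strep, caused by Group A strep bacteria, is the only cause "
        "that must be identified based on signs and symptoms and treated.",
    ),
    (
        "The most commonly used scoring system is the modified Centor score.",
        "",  # no output: this source sentence was dropped
    ),
]


def target_sentences(output: str) -> list[str]:
    """Naively split one adapted output string into its target sentences.

    Splits after '.', '!' or '?' followed by whitespace. This heuristic
    would wrongly split abbreviations such as "S. pyogenes".
    """
    text = output.strip()
    if not text:
        return []
    return re.split(r"(?<=[.!?])\s+", text)


# One output string per source sentence; each may hold 0, 1 or more sentences.
for source, output in example_pairs:
    print(len(target_sentences(output)), "target sentence(s) for:", source[:40])
```

The second row shows a split (one source sentence, two target sentences) and the third shows a sentence that was left out; no output string draws on more than one source sentence.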
Source abstracts were retrieved to answer consumer questions asked on MedlinePlus. Teams will have access to the questions and may provide them to their systems if desired.
There will be two tasks. Teams may choose to participate in either or both.
Task 1 - Term Replacement
Task 1 will not require complete adaptation. Rather, your system will identify difficult terms, decide how to handle them, and provide replacements.
- Task 1A - Identifying non-consumer terms: Given an abstract, return a list of exact strings from the text, each representing a concept a consumer would not understand.
- Task 1B - Classifying replacement: For each identified non-consumer term, determine whether the term could be (non-exclusively):
  - Substituted: the term is jargon with a common alternative (e.g. "myocardial infarction" can be "heart attack").
  - Explained: there is no alternative, or the term is important to the topic, and it should be explained. For example: "This study looked at treatments for sleep apnoea (when you stop breathing while sleeping)."
  - Generalized: the term can be replaced with a more general category without losing its significance. For example, "Clearing of the infection is confirmed with a Nucleic Acid Amplification Test (NAAT)" becomes "Clearing of the infection is confirmed with a common lab test."
  - Exemplified: the term has a specific example that would give a general audience an idea of what it is. For example: "Depression is common in people with neurodegenerative diseases (like Parkinson’s)."
  - Omitted: the term is not relevant to understanding the sentence or is too technical to explain, and does not need to appear in a consumer version.
- Task 1C - Generation: Provide text for each positive label from 1B (except the "Omitted" label).
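The three subtasks build on each other, so one way to picture a Task 1 answer is a record per term. The following Python sketch is a hypothetical representation, not the official submission format: the class `TermAnnotation`, its field names, and the lowercase label strings are assumptions. It checks the constraints stated above: the term must be an exact string from the abstract, labels may be combined, and every positive label except "Omitted" needs generated text.

```python
from dataclasses import dataclass, field

# Task 1B labels; a term may receive more than one (they are non-exclusive).
LABELS = {"substituted", "explained", "generalized", "exemplified", "omitted"}


@dataclass
class TermAnnotation:
    term: str  # Task 1A: exact string copied from the abstract
    labels: set[str]  # Task 1B: one or more labels from LABELS
    # Task 1C: generated text keyed by label (hypothetical layout)
    generations: dict[str, str] = field(default_factory=dict)


def check_annotation(abstract: str, ann: TermAnnotation) -> list[str]:
    """Return the problems found in one term annotation (empty list if none)."""
    problems = []
    if ann.term not in abstract:
        problems.append(f"{ann.term!r} is not an exact string from the abstract")
    if not ann.labels:
        problems.append("no label assigned")
    unknown = ann.labels - LABELS
    if unknown:
        problems.append(f"unknown labels: {sorted(unknown)}")
    # Every known positive label except "omitted" needs generated text.
    for label in sorted((ann.labels & LABELS) - {"omitted"}):
        if not ann.generations.get(label):
            problems.append(f"missing text for label {label!r}")
    if "omitted" in ann.generations:
        problems.append('no text should be generated for "omitted"')
    return problems


abstract = "Throat culture is the gold standard for diagnosing streptococcus pharyngitis."
ok = TermAnnotation(
    term="streptococcus pharyngitis",
    labels={"substituted"},
    generations={"substituted": "strep throat"},
)
bad = TermAnnotation(term="Streptococcus pharyngitis", labels={"explained"})
print(check_annotation(abstract, ok))   # → []
print(check_annotation(abstract, bad))  # two problems: not exact, missing text
```

The `bad` record fails twice: the capitalized "Streptococcus" does not occur verbatim in the abstract, and the "explained" label has no accompanying text.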
Task 2 - Complete Abstract Adaptation
Task 2 is to adapt biomedical abstracts end to end for the general public using plain language. Given a set of abstracts (the source), your system will provide output for each sentence of the source.
Input and Output
Test data will be provided in a single JSON file; submissions can be prepared by simply replacing the text of each sentence string.
- For each of 40 consumer questions:
  - Input: A consumer question used to retrieve abstracts
  - For each of 10 abstracts retrieved to answer the consumer question:
    - For each sentence:
      - Input: A string for a single source sentence
      - Output: A string for the adapted version of the sentence, which may contain multiple sentences (to split it) but should not contain newlines
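Because submissions are prepared by replacing each sentence string in the test file, a submission script can simply walk the nested structure described above. The Python sketch below assumes a hypothetical JSON layout (question, then abstracts, then sentences keyed by ID); the real key names in the official file may differ, and `adapt_sentence` is a placeholder for a team's own model.

```python
import json

# Hypothetical layout; the actual key names in the official test file may differ:
# {question_id: {"question": str, "abstracts": {abstract_id: {sentence_id: str}}}}


def adapt_sentence(sentence: str) -> str:
    """Placeholder for a team's model; this identity baseline returns the source."""
    return sentence


def fill_submission(data: dict) -> dict:
    """Replace every source sentence string with its adaptation, in place."""
    for entry in data.values():
        for sentences in entry["abstracts"].values():
            for sentence_id, source in sentences.items():
                output = adapt_sentence(source)
                # Splitting into several sentences is fine, but no newlines:
                # collapse every whitespace run (including newlines) to one space.
                sentences[sentence_id] = " ".join(output.split())
    return data


raw = json.dumps({
    "q1": {
        "question": "how is strep throat treated?",
        "abstracts": {"a1": {"s1": "Penicillins are\nthe first-choice treatment."}},
    }
})
data = fill_submission(json.loads(raw))
print(data["q1"]["abstracts"]["a1"]["s1"])  # → Penicillins are the first-choice treatment.
```

Overwriting values for existing keys while iterating is safe in Python because the dictionaries never change size; `json.dump` can then write the filled structure back out.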
Data
Task 1
Training and test data for Task 1 are available in the TREC active participants area.
Task 2
Test data for Task 2 are available in the TREC active participants area. For training data, teams can utilize the publicly available PLABA dataset (Attal 2023), which comprises 750 abstracts, each manually adapted to plain language by at least one annotator, for a total of 7,643 sentence pairs. Complete guidelines given to annotators can be seen here.
Notable guidelines include:
- Split complex sentences. Each source sentence can have any number of target sentences.
- Substitute medical jargon with common alternatives, e.g. orthoses with braces.
- Explain terms with no substitutes when introduced, e.g. "Duloxetine (a common antidepressant)."
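The substitution and explanation guidelines can be illustrated with a deliberately naive dictionary baseline. This Python sketch is not how the annotators worked or how any official system behaves; the glossaries `SUBSTITUTIONS` and `EXPLANATIONS` are hypothetical and seeded only with examples from this page. It substitutes jargon on every mention and adds a parenthetical explanation only the first time a term is introduced.

```python
import re

# Hypothetical glossaries seeded with the examples given in this page.
SUBSTITUTIONS = {"orthoses": "braces", "myocardial infarction": "heart attack"}
EXPLANATIONS = {"Duloxetine": "a common antidepressant"}


def apply_guidelines(sentences: list[str]) -> list[str]:
    """Substitute jargon everywhere; explain other terms on first mention only.

    Matching is case-sensitive and whole-word, so e.g. "Orthoses" at the start
    of a sentence would not be substituted by this naive sketch.
    """
    explained: set[str] = set()
    adapted = []
    for sentence in sentences:
        for jargon, plain in SUBSTITUTIONS.items():
            sentence = re.sub(rf"\b{re.escape(jargon)}\b", plain, sentence)
        for term, gloss in EXPLANATIONS.items():
            pattern = rf"\b{re.escape(term)}\b"
            if term not in explained and re.search(pattern, sentence):
                # Append the explanation in parentheses after the first occurrence.
                sentence = re.sub(pattern, f"{term} ({gloss})", sentence, count=1)
                explained.add(term)
        adapted.append(sentence)
    return adapted


result = apply_guidelines([
    "Duloxetine reduced pain in patients fitted with orthoses.",
    "Duloxetine was well tolerated.",
])
print(result)
```

Here the first sentence gains both "braces" and the explanation of Duloxetine, while the second mention of Duloxetine is left unexplained, mirroring the "when introduced" wording of the guideline.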
Evaluation
Task 1
- Task 1A - F1 of retrieved labels
- Task 1B - Multilabel F1 weighted by three-fold annotator totals
- Task 1C - Manual evaluation on the following axes:
  - Simplicity: Outputs should be easy to understand.
  - Accuracy: Outputs should contain accurate information.
  - Completeness: Outputs should seek to minimize information lost from the original text.
  - Brevity: Outputs should be concise.
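For intuition about the automatic Task 1 metrics, the Python sketch below computes a plain set-based F1. It assumes exact string matching for Task 1A, and for Task 1B it simply treats each (term, label) pair as an item, which is an unweighted simplification: the official weighting by three-fold annotator totals is not reproduced here. The example terms and labels are illustrative, not gold annotations.

```python
def f1_score(gold: set, predicted: set) -> float:
    """Set-based F1: harmonic mean of precision and recall over exact matches."""
    if not gold and not predicted:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Task 1A: compare identified term strings.
gold_terms = {"acute pharyngitis", "Streptococcus pyogenes", "etiologic diagnosis"}
pred_terms = {"acute pharyngitis", "etiologic diagnosis", "bacteria"}
print(round(f1_score(gold_terms, pred_terms), 3))  # → 0.667

# Task 1B (unweighted simplification): compare (term, label) pairs.
gold_pairs = {("acute pharyngitis", "substituted"),
              ("etiologic diagnosis", "explained"),
              ("Streptococcus pyogenes", "substituted")}
pred_pairs = {("acute pharyngitis", "substituted"),
              ("etiologic diagnosis", "explained")}
print(round(f1_score(gold_pairs, pred_pairs), 3))  # → 0.8
```

In the Task 1A case, two of three predictions are correct and two of three gold terms are found, so precision and recall are both 2/3; in the pair case, precision is 1 and recall is 2/3, giving 0.8.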
Task 2
Due to the high stakes of the biomedical domain, it is important to evaluate system outputs manually. Experts will rank system outputs for a sample of abstracts along the following axes:
- Simplicity: Outputs should be easy to understand.
- Accuracy: Outputs should contain accurate information.
- Completeness: Outputs should seek to minimize information lost from the original text.
- Brevity: Outputs should be concise.
Registration & Submission
To register and submit, visit https://ir.nist.gov/evalbase/
Timeline
(revised August 5)
- August 5: All train and test data released
- August 30: Submissions due (Task 2)
- September 15: Submissions due (Task 1)
- September 30: Judgments returned
- November 18-22: TREC meeting at NIST in Gaithersburg, MD, USA
Mailing List
The mailing list for this track is plaba2024@googlegroups.com. Participants may join the mailing list by joining the PLABA 2024 Google Group. A Google account (not necessarily a Gmail account) is required to join the group. Group members will receive messages that are sent to the group mailing list. Messages to the mailing list should be sent to plaba2024@googlegroups.com.