Health literacy, the ability of people to transform health-related information into informed actions, has a measurable impact on health outcomes, and deficits in it help drive the health disparities linked to education level (McCray 2004, Berkman 2011). Although the Internet and open science initiatives have placed unprecedented amounts of biomedical knowledge at the fingertips of medical practitioners and medical consumers alike, consumers often run into a language barrier, even when the resources are in their native language.

Advances in deep learning may soon make it feasible to automatically and accurately adapt difficult scientific text for patients and caregivers. However, significant barriers must be overcome before such models are viable in production (Ondov 2022). We hope this track will stimulate research in automatic plain language adaptation of biomedical text resources and so help improve health literacy and engagement among patients and caregivers.


The goal of the PLABA track is to improve health literacy by adapting biomedical abstracts for the general public using plain language. When adapting, a source sentence may be split, in which case its output will be multiple target sentences, or omitted, in which case its output will be blank. However, source sentences may not be merged: the output for a given source sentence must not contain information from any other source sentence. Both source and output will be in English. An example of adaptation is below.
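The split/omit/no-merge rules imply a simple data shape: one output slot per source sentence, holding zero or more target sentences. The sketch below assumes a hypothetical in-memory representation (a list of lists of strings, not an official PLABA submission format) and a hypothetical `check_alignment` helper that checks only the structural rules.

```python
from typing import List

# Hypothetical representation (NOT an official PLABA format): the output for
# source sentence i is a list of zero or more target sentences.
#   []          -> sentence omitted (blank output)
#   ["a"]       -> one-to-one adaptation
#   ["a", "b"]  -> sentence split into two target sentences
Adaptation = List[List[str]]


def check_alignment(source: List[str], adapted: Adaptation) -> List[str]:
    """Return a list of structural problems; an empty list means no problems found.

    Only the shape is checked. Whether a target sentence borrows information
    from a different source sentence (a disguised merge) needs human or
    model-based review and is not detected here.
    """
    problems = []
    if len(adapted) != len(source):
        problems.append(
            f"expected {len(source)} outputs (one per source sentence), got {len(adapted)}"
        )
    for i, targets in enumerate(adapted):
        for j, sentence in enumerate(targets):
            if not sentence.strip():
                problems.append(f"source {i}: target {j} is empty; use [] to omit")
    return problems


source = [
    "Acute S. pyogenes pharyngitis is often a self-limiting disease.",
    "The most commonly used scoring system is the modified Centor score.",
]
adapted = [
    ["Short-term strep throat often goes away on its own without treatment."],
    [],  # omitted, as in the example below
]
print(check_alignment(source, adapted))
```

Keeping one slot per source sentence makes omissions explicit (an empty list) instead of silently shifting later outputs out of alignment.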

Input (question): how is strep throat treated?

Each sentence of the Input (abstract) is followed by its Output (adapted):

Source: Acute pharyngitis/tonsillitis, which is characterized by inflammation of the posterior pharynx and tonsils, is a common disease.
Adapted: Sore throat/tonsillitis, or when the back of the throat or tonsils is inflamed, is common.

Source: Several viruses and bacteria can cause acute pharyngitis; however, Streptococcus pyogenes (also known as Lancefield group A β-hemolytic streptococci) is the only agent that requires an etiologic diagnosis and specific treatment.
Adapted: Many viruses and bacteria can cause short-term sore throat. However, group A strep, caused by Group A strep bacteria, is the only cause that must be identified based on signs and symptoms and treated.

Source: S. pyogenes is of major clinical importance because it can trigger post-infection systemic complications, acute rheumatic fever, and post-streptococcal glomerulonephritis.
Adapted: Group A strep bacteria are important to identify because they can cause post-strep throat complications throughout the body, acute rheumatic fever (a disease that inflames the body's tissues), and post-strep throat kidney disease.

Source: Symptom onset in streptococcal infection is usually abrupt and includes intense sore throat, fever, chills, malaise, headache, tender enlarged anterior cervical lymph nodes, and pharyngeal or tonsillar exudate.
Adapted: Strep throat symptoms usually happen quickly and include severe sore throat, fever, chills, general discomfort, headache, swollen lymph nodes in the front of the neck, and white or yellow spots on the throat or tonsils.

Source: Cough, coryza, conjunctivitis, and diarrhea are uncommon, and their presence suggests a viral cause.
Adapted: Cough, cold symptoms, pink eye, and diarrhea are not common and might be caused by a virus.

Source: A diagnosis of pharyngitis is supported by the patient's history and by the physical examination.
Adapted: Learning the person's history and doing a physical exam are used to diagnose strep throat.

Source: Throat culture is the gold standard for diagnosing streptococcus pharyngitis.
Adapted: A throat swab to find, grow, and test bacteria in the throat that make you sick is the best way to diagnose strep throat.

Source: However, it has been underused in public health services because of its low availability and because of the 1- to 2-day delay in obtaining results.
Adapted: However, it has not been used as much as it should because it is not widely available and takes 1 to 2 days to get results.

Source: Rapid antigen detection tests have been used to detect S. pyogenes directly from throat swabs within minutes.
Adapted: Rapid strep tests have been used to find fragments of bacteria that cause strep throat from swabs within minutes.

Source: Clinical scoring systems have been developed to predict the risk of S. pyogenes infection.
Adapted: Scoring systems have been made to predict the risk of strep throat.

Source: The most commonly used scoring system is the modified Centor score.
Adapted: (blank; sentence omitted)

Source: Acute S. pyogenes pharyngitis is often a self-limiting disease.
Adapted: Short-term strep throat often goes away on its own without treatment.

Source: Penicillins are the first-choice treatment.
Adapted: Penicillins, a type of antibiotics, are prescribed most commonly.

Source: For patients with penicillin allergy, cephalosporins can be an acceptable alternative, although primary hypersensitivity to cephalosporins can occur.
Adapted: For people allergic to penicillin, cephalosporins, another type of antibiotics, can be prescribed, although people can be allergic to cephalosporins.

Source: Another drug option is the macrolides.
Adapted: Another drug option is macrolides, another type of antibiotics.

Source: Future perspectives to prevent streptococcal pharyngitis and post-infection systemic complications include the development of an anti-Streptococcus pyogenes vaccine.
Adapted: Making an anti-strep throat vaccine could be one way to prevent strep throat and post-strep throat complications throughout the body in the future.

Source abstracts have been retrieved to answer consumer questions asked on MedlinePlus. These questions will be used to guide manual evaluation (see Evaluation). Teams will have access to the questions and may provide them to their systems if desired.

There will be two tasks. Teams may choose to participate in either or both.

Task 1 - Term Replacement

Task 1 will not require complete adaptation. Rather, your system will identify difficult terms, decide how to handle them, and provide substitutions or explanations.
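Identifying difficult terms can be framed as token-level tagging with the BIO scheme used to score term identification (B = first token of a difficult term, I = inside a term, O = outside any term). The sketch below assumes whitespace tokenization and non-overlapping term spans given as hypothetical token index ranges `[start, end)`; the official annotation format may differ.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert non-overlapping term spans (token ranges [start, end)) to BIO tags."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span {(start, end)} for {num_tokens} tokens")
        tags[start] = "B"  # first token of the difficult term
        for k in range(start + 1, end):
            tags[k] = "I"  # remaining tokens of the same term
    return tags


tokens = "Throat culture is the gold standard for diagnosing streptococcus pharyngitis .".split()
# Hypothetical annotation: "Throat culture" and "streptococcus pharyngitis" marked as difficult.
tags = spans_to_bio(len(tokens), [(0, 2), (8, 10)])
print(list(zip(tokens, tags)))
```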

Task 2 - Complete Abstract Adaptation

Task 2 is end-to-end adaptation of biomedical abstracts for the general public using plain language. Given a set of abstracts (the source), your system will provide output for each sentence of the source.

Input and Output


Training data for Task 1 will be released soon.

For Task 2 training data, teams can use the publicly available PLABA dataset (Attal 2023), which comprises 750 abstracts, each manually adapted to plain language by at least one annotator, for a total of 7,643 sentence pairs.

Complete guidelines given to annotators can be seen here.

Notable guidelines include:

Dataset files: data.json (3 MB), Readme.pdf


Task 1

Task 1A will be evaluated with macro-averaged 3-class (BIO) F1 against gold-standard annotations for the test set. Task 1B will be evaluated with micro-averaged F1 against multilabel gold standards (binary for each of the 4 labels). Task 1C will be evaluated via BERTScore against gold-standard replacement strings.
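The two F1 variants above differ in how they average. Macro-averaging computes F1 separately for each BIO class and takes the unweighted mean, so the rare B and I classes count as much as the frequent O class; micro-averaging pools true positives, false positives, and false negatives over every binary (term, label) decision before computing a single F1 = 2TP / (2TP + FP + FN). The sketch below is an illustrative reimplementation, not the official scorer: it assumes token-aligned BIO sequences, represents each term's labels as a set of hypothetical label names, and scores a class with no gold or predicted instances as 0.

```python
from typing import List, Sequence, Set

BIO_LABELS = ("B", "I", "O")


def macro_f1_bio(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the token classes B, I, and O."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    scores = []
    for label in BIO_LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def micro_f1_multilabel(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 over pooled binary decisions: one decision per (term, label) pair."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and present in gold
        fp += len(p - g)  # labels predicted but absent from gold
        fn += len(g - p)  # gold labels the system missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# "label1" ... "label3" are placeholder names, not the official label set.
print(macro_f1_bio(["B", "I", "O", "O"], ["B", "O", "O", "O"]))
print(micro_f1_multilabel([{"label1"}, {"label2", "label3"}], [{"label1"}, {"label2"}]))
```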

Task 2

Because errors carry high stakes in the biomedical domain, system outputs will be evaluated manually. For a sample of abstracts, experts will rank system output along the following axes:







Mailing List

The mailing list for this track is Participants may join the mailing list by joining the PLABA 2024 Google Group. A Google account (not necessarily a Gmail account) is required to join the group. Group members will receive messages that are sent to the group mailing list. Messages to the mailing list should be sent to


Brian Ondov & Dina Demner-Fushman
U.S. National Library of Medicine
Hoa T. Dang
National Institute of Standards and Technology