
The next edition of the Department of Biomedical Informatics and Data Science (DBIDS) PowerTalk Series will feature John Osborne, Ph.D., Associate Professor in the UAB Department of Biomedical Informatics and Data Science, presenting "Generation and Application of Synthetic Data for Improved Clinical Text De-identification and Large Language Model Enhanced Disease Entity Linking."

Augmenting data sets with synthetic data has long been used to improve the performance of machine learning algorithms. Recent advances in Large Language Model (LLM) capabilities have expanded the opportunities to generate high-quality synthetic text and apply it to such problems.

This presentation will cover both templated and LLM-generated synthetic data and how each can strengthen the protection of patient privacy when producing de-identified clinical text and improve the training of information extraction algorithms. Specifically, the talk will discuss optimal surrogate substitution strategies for Protected Health Information (PHI) and examine the impact of these strategies on downstream Natural Language Processing (NLP) tasks. It will also discuss the fine-tuning of a Llama model to generate synthetic data for both Disease Entity Recognition and Normalization.
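
For readers unfamiliar with templated synthetic data, the idea can be illustrated with a minimal sketch. This is not the speaker's method: the placeholder syntax (`[[NAME]]`), the surrogate pools, and the function name `fill_template` are all assumptions made up for illustration. The sketch fills a note template with randomly chosen surrogate values and records where each surrogate landed, which is the kind of labeled span a de-identification model could be trained on.

```python
import random
import re

# Hypothetical surrogate pools; a real system would draw from far larger lists.
SURROGATES = {
    "NAME": ["Alex Morgan", "Jamie Lee", "Pat Rivera"],
    "DATE": ["2021-03-14", "2020-11-02", "2022-07-29"],
    "HOSPITAL": ["Lakeside Clinic", "Northfield Medical Center"],
}

# Assumed placeholder syntax: [[TYPE]], where TYPE is a key of SURROGATES.
PLACEHOLDER = re.compile(r"\[\[(NAME|DATE|HOSPITAL)\]\]")


def fill_template(template, rng):
    """Replace each placeholder with a random surrogate.

    Returns the filled text and a list of (start, end, label) spans that
    mark where each surrogate sits in the filled text.
    """
    pieces = []
    spans = []
    pos = 0      # read position in the template
    length = 0   # length of the output built so far
    for match in PLACEHOLDER.finditer(template):
        literal = template[pos:match.start()]
        pieces.append(literal)
        length += len(literal)

        label = match.group(1)
        surrogate = rng.choice(SURROGATES[label])
        pieces.append(surrogate)
        spans.append((length, length + len(surrogate), label))
        length += len(surrogate)

        pos = match.end()
    pieces.append(template[pos:])
    return "".join(pieces), spans


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    text, spans = fill_template(
        "Patient [[NAME]] was seen at [[HOSPITAL]] on [[DATE]].", rng
    )
    print(text)
    for start, end, label in spans:
        print(label, repr(text[start:end]))
```

Choosing surrogates uniformly at random is only one possible substitution strategy; which strategy works best is among the questions the talk addresses.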


