Presenter: Marcin P. Joachimiak, PhD
Computational Biologist, Environmental Genomics and Systems Biology Division, Biosystems Data Science Department, Lawrence Berkeley National Laboratory
Title: "How to Teach a Computer to Learn about Microbes"
Abstract: Microorganisms (microbes) are incredibly diverse, spanning all major divisions of life, and represent the greatest fraction of known species. A vast amount of knowledge about microbes is available in the literature, across experimental datasets, and in established data resources. While the genomic and biochemical pathway data about microbes is well-structured and annotated using standard ontologies, broader information about microbes and their ecological traits is not. Our goal is to support tasks such as graph querying and link prediction in diverse use cases across microbiology, biomedicine, and the environment. To this end, we are creating knowledge graphs (KGs) focused on microbes by semantically integrating diverse knowledge from a variety of structured and unstructured sources. Using KG-Microbe (https://github.com/Knowledge-Graph-Hub/kg-microbe) as an example, we harmonized and linked prokaryotic data for phenotypic traits, taxonomy, functions, chemicals, and environment descriptors to construct a knowledge graph with over 266,000 entities linked by 432,000 relations. These efforts are supported by a knowledge graph construction platform (KG-Hub) for rapid development of KGs using available data, knowledge modeling principles, and software tools. Finally, as part of a larger collaboration, we developed new graph processing and learning methods that allow scalable generation of high-quality embeddings for very large graphs. We demonstrate the utility of our KG construction and learning approach by predicting microbial shape and other features using classifiers trained on microbial knowledge graph embeddings.
Learn more about the Informatics Institute PowerTalk Seminar Series
Friday, January 14 at 10:00am to 11:00am
Virtual Event