Newswise — Oak Ridge National Laboratory scientists used their expertise in quantum biology, artificial intelligence and bioengineering to improve how CRISPR Cas9 genome editing tools work on organisms like microbes that can be modified to produce renewable fuels and chemicals.
CRISPR is a powerful bioengineering tool, used to modify the genetic code to improve an organism’s performance or correct mutations. The CRISPR Cas9 tool relies on a single, unique guide RNA that directs the Cas9 enzyme to bind and cleave the corresponding targeted site in the genome. Existing models for computationally predicting effective guide RNAs for CRISPR tools have been built on data from only a few model species, and they perform poorly and inconsistently when applied to microbes.
“Many CRISPR tools have been developed for mammalian cells, fruit flies or other model species. Few have focused on microbes with very different chromosome structures and sizes,” said Carrie Eckert, head of the synthetic biology group at ORNL. “We had observed that design models of the CRISPR Cas9 machinery behave differently when working with microbes, and this research validates what we knew anecdotally.”
To improve guide RNA modeling and design, ORNL scientists sought to better understand what happens at the most basic level in cell nuclei, where genetic material is stored. They turned to quantum biology, a field linking molecular biology and quantum chemistry that studies the effects that electronic structure can have on the chemical properties and interactions of nucleotides, the molecules that form the building blocks of DNA and RNA.
How electrons are distributed in the molecule influences reactivity and conformational stability, including the likelihood that the guide RNA-Cas9 enzyme complex will bind efficiently to the microbe’s DNA, said Erica Prates, a computational systems biologist at ORNL.
The best guide through a forest of decisions
The scientists built an explainable artificial intelligence model called iterative random forest. They trained the model on a dataset of approximately 50,000 guide RNAs targeting the genome of E. coli bacteria while taking into account quantum chemical properties, in an approach described in the journal Nucleic Acids Research.
The model revealed key features of nucleotides that may enable the selection of better guide RNAs. “The model helped us identify clues to the molecular mechanisms underlying the effectiveness of our guide RNAs,” Prates said, “providing us with a rich library of molecular information that can help us improve CRISPR technology.”
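As a rough illustration of what "features of nucleotides" can mean to a tree-based model, the sketch below turns a guide sequence into a flat set of numeric features. It is not the ORNL team's feature set: the function name `encode_guide`, the feature names, and the example sequence are all hypothetical, and the quantum chemical descriptors used in the published work are not reproduced here.

```python
from typing import Dict

NUCLEOTIDES = "ACGU"


def encode_guide(guide: str) -> Dict[str, float]:
    """Turn a guide RNA sequence into a flat dictionary of numeric features.

    Hypothetical encoding for illustration only; the published model also
    used quantum chemical properties, which are not modeled here.
    """
    seq = guide.upper().replace("T", "U")  # accept DNA-style input as well
    if not seq or any(base not in NUCLEOTIDES for base in seq):
        raise ValueError(f"unexpected character in guide: {guide!r}")

    features: Dict[str, float] = {}
    # Position-specific one-hot features: which nucleotide sits at each position.
    for pos, base in enumerate(seq, start=1):
        for candidate in NUCLEOTIDES:
            features[f"pos{pos}_{candidate}"] = 1.0 if base == candidate else 0.0

    # One whole-sequence summary feature: fraction of G and C bases.
    features["gc_fraction"] = (seq.count("G") + seq.count("C")) / len(seq)
    return features


# Made-up 20-nucleotide sequence, not taken from the study's dataset.
example = encode_guide("GACGTTAGCCATGGATCCAA")
print(len(example))            # 20 positions x 4 nucleotides + 1 summary = 81
print(example["gc_fraction"])  # 0.5
```

An explainable model trained on a table of such features can report which positions or properties most influence its predictions, which is the kind of insight the researchers describe.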
ORNL researchers validated the explainable AI model by conducting CRISPR Cas9 cutting experiments on E. coli with a large group of guides selected by the model.
Using explainable AI gave scientists an understanding of the biological mechanisms that drove the results, in contrast to a deep learning model rooted in a “black box” algorithm that lacks interpretability, said Jaclyn Noshay, a former computational systems biologist at ORNL and first author of the paper.
“We wanted to improve our understanding of guide design rules for optimal cutting efficiency with a focus on microbial species, given the knowledge of incompatibility of models trained across (biological) kingdoms,” said Noshay.
The explainable AI model, with its thousands of features and iterative nature, was trained using the Summit supercomputer at ORNL’s Oak Ridge Leadership Computing Facility, or OLCF, a DOE Office of Science user facility.
Eckert said her synthetic biology team plans to work with computational science colleagues at ORNL to take what they learned with the new microbial CRISPR Cas9 model and improve it further using data from laboratory experiments or a variety of microbial species.
Better CRISPR Cas9 tools for every species
Taking quantum properties into account opens the door to Cas9 guide improvements for every species. “This paper even has implications on a human scale,” Eckert said. “If you’re looking at any type of drug development, for example when you’re using CRISPR to target a specific region of the genome, you need to have the most accurate model to predict these guides.”
Improving CRISPR Cas9 models provides scientists with a higher-throughput pipeline to link genotype to phenotype, or genes to physical traits, an area known as functional genomics. The research has implications for work at the ORNL-led Center for Bioenergy Innovation (CBI), for example, to improve bioenergy production plants and bacterial fermentation of biomass.
“We are significantly improving our predictions about guide RNA with this research,” Eckert said. “The better we understand the biological processes at play and the more data we can feed our predictions, the better our targets will be, thus improving the precision and speed of our research.”
“A key goal of our research is to improve the ability to predictively edit the DNA of more organisms using CRISPR tools. This study represents an exciting step forward toward understanding how we can avoid making costly ‘typos’ in an organism’s genetic code,” said ORNL’s Paul Abraham, a bioanalytical chemist who leads the DOE Genomic Science Program’s Secure Ecosystem Engineering and Design Science Focus Area, or SEED SFA, which supported the CRISPR research. “I’m excited to see how much these predictions can improve as we generate additional training data and continue to leverage explainable AI modeling.”
Co-authors of the publication included William Alexander, Dawn Klingeman, Erica Prates, Carrie Eckert, Stephan Irle and Daniel Jacobson of ORNL; Tyler Walker, Jonathan Romero and Angelica Walker of the Bredesen Center for Interdisciplinary Research and Graduate Education at the University of Tennessee, Knoxville; and Jaclyn Noshay and David Kainer, formerly of ORNL and now at Bayer and the University of Queensland, respectively.
Funding for the project was provided by the SEED SFA and CBI, both part of the DOE Office of Science’s Biological and Environmental Research Program, and by the ORNL Laboratory Directed Research and Development Program. High-performance computing resources were provided by OLCF and the Compute and Data Environment for Science, both also supported by the Office of Science.
UT-Battelle manages ORNL for DOE’s Office of Science, the largest supporter of basic research in the physical sciences in the United States. The Office of Science strives to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.