Unlocking AI’s Black Box in Genomics: SQUID’s Deep Dive Into DNA Data

Artificial intelligence (AI) is increasingly used in genomics to sift through vast amounts of genome data to identify potential therapeutic targets, despite the opaque nature of AI decision-making. To address this, Cold Spring Harbor Laboratory scientists have developed SQUID (Surrogate Quantitative Interpretability for Deepnets), a tool designed to enhance the interpretability of AI models in genomics.

SQUID, developed by scientists at Cold Spring Harbor Laboratory, improves the interpretability of AI in genomics by using a large library of DNA variants and the MAVE-NN program to analyze their effects.

This tool helps researchers make more accurate genetic predictions and supports hypothesis development for a better understanding of genomic functions.

SQUID Pries Open AI Black Box

Artificial intelligence continues to squirm its way into many aspects of our lives. But what about biology, the study of life itself? AI can sift through hundreds of thousands of genome data points to identify potential new therapeutic targets. While these genomic insights may appear helpful, scientists aren’t sure how today’s AI models come to their conclusions in the first place. Now, a new system named SQUID arrives on the scene armed to pry open AI’s black box of murky internal logic.

An illustration outlining the SQUID computational pipeline. Credit: Koo and Kinney Labs / Cold Spring Harbor Laboratory

SQUID: Enhancing AI Interpretability

SQUID, short for Surrogate Quantitative Interpretability for Deepnets, is a computational tool created by Cold Spring Harbor Laboratory (CSHL) scientists. It’s designed to help interpret how AI models analyze the genome. Compared with other analysis tools, SQUID is more consistent, reduces background noise, and can lead to more accurate predictions about the effects of genetic mutations.

How does it work so much better? The key, CSHL Assistant Professor Peter Koo says, lies in SQUID’s specialized training.

“The tools that people use to try to understand these models have been largely coming from other fields like computer vision or natural language processing. While they can be useful, they’re not optimal for genomics. What we did with SQUID was leverage decades of quantitative genetics knowledge to help us understand what these deep neural networks are learning,” explains Koo.

Evan E. Seitz, the lead author of this study, is a postdoc in the Kinney and Koo labs. Credit: Cold Spring Harbor Laboratory

SQUID works by first generating a library of over 100,000 variant DNA sequences. It then analyzes the library of mutations and their effects using a program called MAVE-NN (Multiplex Assays of Variant Effects Neural Network). This tool allows scientists to perform thousands of virtual experiments simultaneously. In effect, they can “fish out” the algorithms behind a given AI’s most accurate predictions. Their computational “catch” could set the stage for experiments that are more grounded in reality.
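The two steps described above (build a library of variant sequences, then fit a simple, interpretable model to the AI's predictions on that library) can be illustrated with a toy Python sketch. This is not the SQUID or MAVE-NN API. The `toy_black_box` function, the 8-letter sequence, the mutation rate, and the averaging-based additive fit are all assumptions made for illustration, standing in for a trained deep network and for the surrogate modeling that MAVE-NN actually performs.

```python
import random

BASES = "ACGT"


def make_library(wild_type, n_variants, mut_rate, rng):
    """Build variants by mutating each position independently with probability mut_rate."""
    library = []
    for _ in range(n_variants):
        seq = [
            rng.choice([b for b in BASES if b != base])
            if rng.random() < mut_rate
            else base
            for base in wild_type
        ]
        library.append("".join(seq))
    return library


def toy_black_box(seq):
    """Hypothetical stand-in for a deep network: counts matches to 'TATA' at positions 2-5."""
    motif, start = "TATA", 2
    return float(sum(seq[start + i] == m for i, m in enumerate(motif)))


def fit_additive_surrogate(library, scores):
    """Estimate each base's effect at each position as
    (mean score of variants carrying that base) - (mean score of the whole library)."""
    overall = sum(scores) / len(scores)
    effects = []
    for pos in range(len(library[0])):
        totals = {b: 0.0 for b in BASES}
        counts = {b: 0 for b in BASES}
        for seq, score in zip(library, scores):
            totals[seq[pos]] += score
            counts[seq[pos]] += 1
        effects.append(
            {b: (totals[b] / counts[b] - overall) if counts[b] else 0.0 for b in BASES}
        )
    return effects


rng = random.Random(0)          # fixed seed so the sketch is reproducible
wild_type = "GCTATAGC"          # illustrative sequence, not real genomic data
library = make_library(wild_type, n_variants=5000, mut_rate=0.25, rng=rng)
scores = [toy_black_box(seq) for seq in library]   # one "virtual experiment" per variant
effects = fit_additive_surrogate(library, scores)

# Report, for each position, the base the surrogate associates with the highest score.
for pos, eff in enumerate(effects):
    best = max(eff, key=eff.get)
    print(pos, wild_type[pos], best, round(eff[best], 2))
```

In this toy setting, the surrogate's per-position effects should single out the motif positions that drive the black box's output, while positions the black box ignores should show effects near zero. The real tool works at a much larger scale and with richer surrogate models, but the idea of reading a model's behavior off a mutated library is the same.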

The Practical Impact of SQUID

“In silico [virtual] experiments are no replacement for actual laboratory experiments. Nevertheless, they can be very informative. They can help scientists form hypotheses for how a particular region of the genome works or how a mutation might have a clinically relevant effect,” explains CSHL Associate Professor Justin Kinney, a co-author of the study.

There are tons of AI models in the sea. More enter the waters each day. Koo, Kinney, and colleagues hope that SQUID will help scientists grab hold of those that best meet their specialized needs.

Though mapped, the human genome remains an incredibly challenging terrain. SQUID could help biologists navigate the field more effectively, bringing them closer to their findings’ true medical implications.

Reference: 21 June 2024, Nature Communications.
DOI: 10.1038/s42256-024-00851-5

Funding: Simons Foundation, National Institutes of Health, Alfred P. Sloan Foundation