Rewriting what we thought was possible in biotech
Marzyeh Ghassemi

Have you heard? The tech in biotech is nailing it. Machine learning (ML) and artificial intelligence (AI) can now figure out who has a condition (perhaps better than your doctor can), establish a medical checklist to diagnose you, and help target likely treatments. AI models can help design drugs or find a new purpose for existing ones. At home, just ask your AI assistant—Siri, Alexa, Cortana, or many chatbots—to answer medical questions or talk to you about your day. Those assistants might also have access to information from the smart devices in your home—your scale could work with your Fitbit to check your health.

Are you wondering why that reality doesn’t sound like the one you live in? AI has been compared to electricity—the new fuel the world runs on. But as happened with electricity, the deployment of AI in biotech has been uneven. Practical electric power systems were introduced in the 1880s, and most American cities and towns received electricity from utility companies by the 1920s. But 90% of rural America lacked electricity until Congress established the Rural Electrification Administration in 1936. We’re seeing a similar unbalanced situation with AI today.

The biggest challenge ML and AI face now is ethics. Models are built to do one specific thing very well, not to read between the lines. In other words, a model will do only, and exactly, the thing you told it to do, often by learning in whatever way is fastest, even if the training data is highly problematic. If male doctors fail to recognize heart attacks in female patients, or if dark-skinned patients’ oxygenation levels are misreported, then that is what the AI learns. Models trained this way could underdiagnose women and minorities if deployed.

It’s been exciting to see technology that rewrites and improves what we thought was an established health concept—how to evaluate the need for knee surgery, for instance. With the help of technology, we can focus resources on areas of human health that are complex and chronically understudied, or we can move on from simply naming inequity issues to fixing those issues. If AI models can highlight places where our society is failing people, those people could have better options. It’s also heartening to see a new focus on reproducibility and benchmarks in AI research.

What ML and AI in biotech broadly need to engage with are the gaps in knowledge that are unique to the study of health. Success stories like neural nets that learned to identify dogs in images were built with the help of high-quality image labeling that people were in a good position to provide. Even attempts to generate or translate human language are easily verified and audited by experts who speak a particular language.

By contrast, much of biology, health, and medicine is still at the stage of fundamental discovery. How do neurodegenerative diseases work? What environmental factors really matter? What role does nutrition play in overall human health? We don’t know yet. In health and biotech, machine learning is taking on a different, more challenging task, one that will require less engineering and more science.

Marzyeh Ghassemi is an assistant professor at MIT and a faculty member at the Vector Institute (and a 35 Innovators honoree in 2018).