Launching the Tahoe Blog


The Tahoe Blog is Live!
In the same open-source spirit that led us to release Tahoe-100M—our gigascale, perturbative single-cell dataset—we’re now sharing many of the discoveries it has enabled (see link to the blog and the inaugural post in the comments).
Tahoe-100M isn’t just a large dataset powering next-generation AI models in biology. It’s a new modality of data for discovery.
Instead of designing a new experiment for every question, you can now ask hundreds—across thousands of perturbations and cancer models—using high-content, single-cell data already in hand. This changes the game. You can answer questions you weren’t even asking. Discoveries emerge before the hypothesis.
It’s part of a larger shift: from hypothesis-driven to hypothesis-generating science. From narrow to panoramic views of biology.
Think of it like the transition from RNNs to Transformers in AI. On the surface, the two architectures might look similar, but the Transformer's ability to process an entire context at once, rather than one token at a time, unlocked entirely new capabilities. Tahoe-100M does something similar for biology.
Measuring how a handful of cancer models respond to a drug is common. But scaling that to thousands of cancer models and drugs—with the resolution of single-cell data—reveals mechanisms and relationships you’d otherwise miss. That’s what makes this data more than the sum of its parts.
In this blog, we’ll highlight discoveries that surprised even us—insights hiding in plain sight, enabled only by this new scale and structure of data. We’re sharing them to spark new questions, inspire new therapeutic directions, and ultimately help patients.
We begin with one such finding: we uncovered previously unknown mechanisms associated with saquinavir, mechanisms that could have predicted the cardiovascular toxicity that was detected only during clinical use. The drug, developed by Roche, was later discontinued in the US.