Introducing Rhaister + Emerald Bay

Authored by
Nima Alidoust
Released on
June 9, 2026
Summary

Today we are releasing Rhaister, an elegant statistical model that predicts drug phenotypes in new contexts with accuracy comparable to experimental assays. We are also releasing Emerald Bay, a 2M-cell dataset measuring long time-course phenotypes across thousands of drug-cell line interactions.

Unlike “virtual cell models,” Rhaister goes back to the basics: it builds a minimalist perturbation model from the ground up, directly on summary statistics of the data.

It has been a wonderful surprise for us to see such an interpretable, inexpensive model (trained in seconds, predictions in milliseconds) accomplish what virtual cell models (typically with far more complex architectures) promised to eventually do.

Given a handful of example perturbations in a new context, Rhaister predicts responses to other perturbations with accuracy within experimental noise, outperforming state-of-the-art virtual cell models.

It is the first model we have seen that performs significantly beyond mean baselines in a zero-shot setting, a task commonly cited as a promise of virtual cell models.

Despite its simplicity, it is the first model that shows consistent scaling with more perturbation data.

It can also predict more complex drug phenotypes, such as drug sensitivity in cancer cells, far better than simple baselines. There is room to make it much better, so stay tuned.

We show this by testing it on our unique Emerald Bay dataset, which was generated using our Mosaic platform and measures the 5-day sensitivity of cancer models to various drugs. We are open-sourcing that dataset along with Rhaister.

These results demonstrate that we can accomplish a lot by going back to basics and building models that, by design, reflect the statistics of the underlying data. Rhaister shows that scaling the right data is far more valuable than scaling parameters.

What excites us even more is what comes next: fast and interpretable, Rhaister is far better suited to advancing biological reasoning in close iteration with emerging agentic workflows.

Read the manuscript: https://tahoebio-assets.com/rhaister-manuscript.pdf

See the model and datasets: https://huggingface.co/collections/tahoebio/rhaister