Introducing our Rhaister Model and Emerald Bay Dataset

Predicting how cells, tissues, and patients will respond to a drug, cytokine, or genetic perturbation is central to biological and clinical reasoning. The practical goal is to estimate the analysis-ready readouts that support this reasoning: which cellular responses are context-dependent, which perturbations reveal shared or divergent mechanisms, and which observations should become the basis for the next experiment.
Here we introduce Rhaister, a perturbation response predictor that operates directly on screen-level summary statistics. Given measurements of just a few perturbations in a new biological context, Rhaister predicts the responses to the unmeasured perturbations by learning how response patterns vary across reference contexts. This formulation applies both to fine-grained molecular readouts, such as transcriptional responses from Tahoe 100M or other large perturbation screens, and to phenotypic endpoints. To train and apply Rhaister on phenotypic endpoints, we created Emerald Bay, a purpose-built Tahoe dataset that unifies multiday cancer drug perturbation, pooled Mosaic tumor contexts, and paired transcriptomic response measurements.
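To make the few-shot formulation concrete, here is a minimal sketch of the general idea, not Rhaister's actual implementation, whose architecture is not described here. It assumes the summary statistics are arranged as a contexts-by-perturbations matrix and uses plain ridge regression as a stand-in: the new context is expressed as a weighted combination of reference contexts, fitted on the few measured perturbations, and those weights are then used to fill in the rest. The function name `predict_unmeasured` and the `ridge` parameter are hypothetical.

```python
import numpy as np

def predict_unmeasured(reference, measured_idx, measured_values, ridge=1e-2):
    """Illustrative stand-in, NOT the Rhaister model.

    reference:       (n_contexts, n_perturbations) summary statistics
                     from reference contexts.
    measured_idx:    indices of perturbations measured in the new context.
    measured_values: responses observed for those perturbations.
    Returns a length-n_perturbations vector for the new context.
    """
    reference = np.asarray(reference, dtype=float)
    measured_idx = np.asarray(measured_idx)
    y = np.asarray(measured_values, dtype=float)

    # Rows: measured perturbations; columns: reference contexts.
    A = reference[:, measured_idx].T
    n_contexts = reference.shape[0]

    # Ridge solution for the context weights: (A^T A + lambda I) w = A^T y
    w = np.linalg.solve(A.T @ A + ridge * np.eye(n_contexts), A.T @ y)

    # Apply the same weights to every perturbation in the reference panel.
    pred = w @ reference
    pred[measured_idx] = y  # keep the actual measurements where available
    return pred

# Usage with synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
reference = rng.normal(size=(5, 20))   # 5 contexts, 20 perturbations
measured_idx = np.arange(10)           # only the first 10 are measured
new_context = 0.7 * reference[1] + 0.3 * reference[3]
pred = predict_unmeasured(reference, measured_idx, new_context[measured_idx])
```

Because the fit is a single small linear solve, a sketch like this runs in well under a second on typical screen sizes, which is in the same spirit as the speed described for summary-statistic models.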
Across these settings, Rhaister matches or exceeds substantially more expensive virtual cell models, often reaching the highest attainable values of the evaluation metrics, while training in seconds and producing predictions in milliseconds. On Emerald Bay, Rhaister predicts context-specific drug phenotypes from sensitivity measurements alone and improves further when transcriptomic information is included. We also introduce Rhaister O, which predicts drug responses in new contexts from baseline expression alone and, to our knowledge, is the first zero-shot model for this task. Rhaister establishes summary-statistic perturbation modeling as a fast, interpretable framework for predicting biological response across new contexts.
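The zero-shot setting can be illustrated in the same hedged way. The sketch below is an assumption-laden stand-in, not Rhaister O itself: it fits a ridge regression from each reference context's baseline expression vector to its vector of drug responses, then applies that map to the baseline expression of a context with no perturbation measurements at all. The names `fit_zero_shot` and `predict_zero_shot`, and the choice of ridge regression, are hypothetical.

```python
import numpy as np

def fit_zero_shot(baseline, responses, ridge=1.0):
    """Illustrative stand-in, NOT Rhaister O.

    baseline:  (n_contexts, n_genes) baseline expression per reference context.
    responses: (n_contexts, n_perturbations) response summary statistics.
    """
    X = np.asarray(baseline, dtype=float)
    Y = np.asarray(responses, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean

    # Dual-form ridge, convenient when contexts are far fewer than genes:
    # W = Xc^T (Xc Xc^T + lambda I)^{-1} Yc
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + ridge * np.eye(K.shape[0]), Yc)
    W = Xc.T @ alpha  # (n_genes, n_perturbations)
    return x_mean, y_mean, W

def predict_zero_shot(model, new_baseline):
    """Predict responses for a context from its baseline expression alone."""
    x_mean, y_mean, W = model
    return (np.asarray(new_baseline, dtype=float) - x_mean) @ W + y_mean

# Usage with synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
baseline = rng.normal(size=(6, 50))    # 6 reference contexts, 50 genes
responses = rng.normal(size=(6, 4))    # 4 drug responses per context
model = fit_zero_shot(baseline, responses, ridge=1.0)
new_pred = predict_zero_shot(model, rng.normal(size=50))
```

The regularization strength matters in practice: with far more genes than contexts, a small `ridge` simply interpolates the training contexts, so a real application would choose it by held-out-context validation.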