OpenAI Releases LifeSciBench, a 750-Task Benchmark Grading AI Models on Real Life-Science Research With Expert-Written Rubric

MarkTechPost, June 18, 2026 at 2:28 AM

OpenAI's LifeSciBench evaluates whether frontier AI can handle real life-science research across 750 expert-authored tasks, seven workflows, and seven biological domains. Built by 173 PhD scientists with 19,020 rubric criteria, it grades reasoning and decisions, not just recall.

The best model, GPT-Rosalind, passes 36.1%, leaving large headroom on artifacts, exact outputs, and operational calls.

This is a summary. For the full story, read the original article at MarkTechPost.

Original source: MarkTechPost
