The AI Scientist: The AI That Ran Its Own Research and Got Published in Nature

For less than fifteen dollars, an artificial intelligence system can now produce a complete scientific paper: it runs its own experiments, plots the results, writes up the findings, and evaluates the work through automated peer review. That price tag appears in a landmark paper published in Nature in March 2026, and it marks one of the most significant milestones yet in the push toward fully autonomous scientific research [1].

The system is called the AI Scientist, developed by Sakana AI in collaboration with researchers from the University of British Columbia, the Vector Institute, and the University of Oxford. It represents the first time an AI has navigated the entire research pipeline independently, from the initial spark of an idea through to a published manuscript [3]. "This is the first time an AI has been able to go through the entire process of doing scientific research on its own, from generating the idea all the way through conducting the experiments, analyzing the results, and writing the paper," said Jeff Clune, one of the researchers involved [3].

How the System Works

The AI Scientist operates as a fully automated pipeline. Given a starting code template or research direction, the system generates novel ideas, writes the code needed to test them, executes the experiments, analyzes the resulting data, produces the figures and plots, and drafts the complete scientific manuscript. No step of the process requires human intervention [2].

At the core of the system lies an agentic tree search mechanism with four distinct stages. First, the AI conducts an initial investigation to explore the research space and identify promising directions. Next, it performs hyperparameter tuning to optimize the technical settings for its experiments. The third stage sees the system execute the core research agenda, running the actual experiments and collecting data. Finally, ablation studies are conducted to understand which components of the approach contribute to its success or failure [1].
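The four stages described above can be sketched as a staged best-first search. This is only an illustrative sketch, not the system's actual implementation: the names `Node`, `run_stage`, `staged_tree_search`, `toy_expand` and `toy_evaluate` are hypothetical, and the toy scoring function stands in for real experiment results.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Stage names follow the article; everything else here is an assumption.
STAGES = [
    "initial_investigation",
    "hyperparameter_tuning",
    "core_experiments",
    "ablation_studies",
]


@dataclass(eq=False)
class Node:
    stage: str
    plan: str
    score: float
    parent: Optional["Node"] = None


def run_stage(stage: str, start_plan: str,
              expand: Callable[[str, str], List[str]],
              evaluate: Callable[[str, str], float],
              budget: int) -> Node:
    """Best-first search within one stage; returns the best node found."""
    root = Node(stage, start_plan, evaluate(stage, start_plan))
    frontier = [root]
    best = root
    for _ in range(budget):
        if not frontier:
            break
        # Expand the most promising unexpanded node.
        current = max(frontier, key=lambda n: n.score)
        frontier.remove(current)
        for plan in expand(stage, current.plan):
            child = Node(stage, plan, evaluate(stage, plan), parent=current)
            frontier.append(child)
            if child.score > best.score:
                best = child
    return best


def staged_tree_search(seed_plan: str,
                       expand: Callable[[str, str], List[str]],
                       evaluate: Callable[[str, str], float],
                       budget_per_stage: int = 3) -> List[Tuple[str, str, float]]:
    """Run the stages in order, seeding each stage with the previous best plan."""
    plan = seed_plan
    trace = []
    for stage in STAGES:
        best = run_stage(stage, plan, expand, evaluate, budget_per_stage)
        trace.append((stage, best.plan, best.score))
        plan = best.plan
    return trace


def toy_expand(stage: str, plan: str) -> List[str]:
    # Hypothetical: each plan has two possible refinements.
    return [plan + "a", plan + "b"]


def toy_evaluate(stage: str, plan: str) -> float:
    # Hypothetical score: reward "a" refinements, penalize plan length.
    return plan.count("a") - 0.1 * len(plan)


if __name__ == "__main__":
    for stage, plan, score in staged_tree_search("", toy_expand, toy_evaluate, 2):
        print(f"{stage}: plan={plan!r} score={score:.2f}")
```

In a real system, `expand` would ask a language model for candidate code changes and `evaluate` would run the experiment; here both are trivial placeholders so the control flow can be run and inspected.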

A Vision-Language Model feedback loop plays a critical role in refining the outputs. This loop allows the system to evaluate and improve its figures not just for scientific accuracy but also for visual presentation quality [2][4]. The result is a paper that looks and reads like something produced by a human researcher, complete with charts, analysis, and a coherent narrative structure.
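A critique-and-revise loop of this kind might look roughly like the sketch below. It is a minimal illustration under stated assumptions: `refine_figure`, `toy_critique` and `toy_revise` are hypothetical names, and the "critic" is a simple rule check standing in for a vision-language model, which the article does not describe in detail.

```python
from typing import Callable, Dict, List, Tuple

FigureSpec = Dict[str, str]

# Hypothetical checklist; a real VLM critic would inspect the rendered image.
REQUIRED_FIELDS = ("title", "xlabel", "ylabel")


def refine_figure(spec: FigureSpec,
                  critique: Callable[[FigureSpec], List[str]],
                  revise: Callable[[FigureSpec, List[str]], FigureSpec],
                  max_rounds: int = 3) -> Tuple[FigureSpec, int]:
    """Alternate critique and revision until no issues remain or rounds run out.

    Returns the final spec and the number of revision rounds used.
    """
    for round_no in range(max_rounds):
        issues = critique(spec)
        if not issues:
            return spec, round_no
        spec = revise(spec, issues)
    return spec, max_rounds


def toy_critique(spec: FigureSpec) -> List[str]:
    """Stand-in critic: report each required field that is missing or empty."""
    return [f"missing {name}" for name in REQUIRED_FIELDS if not spec.get(name)]


def toy_revise(spec: FigureSpec, issues: List[str]) -> FigureSpec:
    """Stand-in reviser: fill in a placeholder for each reported field."""
    fixed = dict(spec)  # copy so the caller's spec is not mutated
    for issue in issues:
        name = issue.split(" ", 1)[1]
        fixed[name] = f"<{name} added>"
    return fixed


if __name__ == "__main__":
    final, rounds = refine_figure({"title": "Validation loss"},
                                  toy_critique, toy_revise)
    print(rounds, final)
```

The `max_rounds` cap is a design choice worth keeping in any version of this loop: a critic that never reports zero issues would otherwise keep the system revising indefinitely.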

The Road to Nature

The journey to a Nature publication took roughly one and a half years of development [2]. The project evolved through distinct phases, each shaped by advances in foundation models.

In August 2024, AI Scientist-v1 demonstrated that given a starting code template, the system could autonomously generate novel ideas, create and run experiments, and write a full paper. By the time of the Nature publication, the v2 version had produced the first fully AI-generated paper to pass rigorous human peer review [2].

Three AI-generated papers were initially submitted to the ICLR 2025 ICBINB workshop. One received peer review scores of 6, 7, and 6, an average of 6.33 out of 10. That score surpassed the average threshold for human paper acceptance at that venue [1]. As a matter of transparency, the researchers voluntarily withdrew their AI submissions from the workshop after acceptance [2].

The research shows a clear pattern: paper quality consistently improves with the release date of the underlying model being used. The correlation is statistically significant, with a P-value below 0.00001 [1]. Put simply, as AI models have grown more capable, the papers they produce have become substantially more sophisticated.

Automating Peer Review

Perhaps the most striking aspect of the AI Scientist system is its ability to review its own work. The Automated Reviewer component evaluates papers using the same criteria applied by human reviewers at major conferences. In testing, it achieved a balanced accuracy of 69%, comparable to human reviewers [1]. More remarkably, the Automated Reviewer's F1 score actually exceeded the inter-human agreement measured in the NeurIPS 2021 consistency experiment [2]. The system can assess whether a paper meets the standards for publication, providing a form of quality control built directly into the research pipeline.
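For readers unfamiliar with the two metrics mentioned above, the sketch below shows how balanced accuracy and F1 are computed for an accept/reject classifier. The labels are invented for illustration and are not data from the paper; the function names are likewise assumptions.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = accept, 0 = reject."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of recall on accepted papers and recall on rejected papers."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    return (true_positive_rate + true_negative_rate) / 2


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the accept class."""
    if 2 * tp + fp + fn == 0:
        raise ValueError("F1 is undefined with no positives")
    return 2 * tp / (2 * tp + fp + fn)


if __name__ == "__main__":
    # Made-up decisions: 3 accepted and 7 rejected papers (not from the paper).
    human = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    reviewer = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    tp, fp, tn, fn = confusion_counts(human, reviewer)
    print(f"balanced accuracy = {balanced_accuracy(tp, fp, tn, fn):.3f}")
    print(f"F1 = {f1_score(tp, fp, fn):.3f}")
```

Balanced accuracy is useful here because most submissions to a selective venue are rejected: a reviewer that rejected everything would score high on plain accuracy but only 50% on balanced accuracy.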

This capability raises obvious questions about circularity and self-assessment, but the researchers argue it serves as a practical benchmark. If the system can produce work that meets the same standards human reviewers apply to human researchers, that represents a meaningful threshold of scientific quality.

Real Limitations Remain

Despite the impressive capabilities, the current system has meaningful constraints. The AI Scientist tends to produce ideas that are naive or underdeveloped, lacking the conceptual depth that comes from years of human training and intuition [1]. It struggles with deep methodological rigor, often missing the nuance that experienced researchers bring to experimental design.

The system also hallucinates. It makes confident claims that its own data may not support, and it produces inaccurate citations [1]. These are not minor flaws. They mark the difference between a system that can simulate scientific output and one that truly understands what it is doing.

What This Means for Science

Jakob Foerster from the University of Oxford has pointed out that the AI Scientist could in principle continue improving itself recursively, raising profound questions about the future of scientific discovery [4]. If the system can generate better research ideas, write better code, and produce better papers, what happens when that capability turns back on itself?

The researchers recommend that the scientific community establish clear norms for treating AI-generated research [4]. Questions about attribution, accountability, and the role of human oversight need answers before fully autonomous AI research becomes routine.

The fifteen-dollar price tag is genuinely startling. It means that generating a complete scientific paper of peer-review quality now costs less than a fast-food meal for a family of four. That democratizes something that was previously the province of well-funded labs with large research teams.

Whether AI will replace human researchers entirely remains an open question. The AI Scientist has demonstrated that the mechanical parts of research, the code-writing and experiment-running and paper-writing, can be automated. What it has not replicated is the creativity, intuition, and conceptual depth that drive truly novel scientific discoveries. For now, the machine can do the work. The questions still have to come from us.
