AI scientists now both produce and consume research — but the ecosystem they work in was built over three centuries for human readers. The limit on science is no longer model intelligence; it's that human-era ecosystem. A first-principles workshop on rebuilding how research is produced, represented, verified, composed, and evaluated — and why.

Co-located with NeurIPS 2026 · December 2026 · venue TBA · Full-day · in person
Invited Speakers and Panelists
Le Cong
Associate Professor @ Stanford
Talk (tentative): From CRISPR-GPT to LabOS: closing the loop between AI scientists and the physical lab.

Co-developed CRISPR-Cas9 genome editing (Science 2013); now builds AI co-scientists for the wet lab — CRISPR-GPT and the LabOS AI-XR co-scientist — where a research artifact must be executable and verifiable enough to drive real physical experiments safely.

Joydeep Biswas
Associate Professor @ UT Austin
Talk (tentative): AI-assisted peer review at AAAI and NeurIPS: review at agent throughput, humans in the loop.

Leads UT Austin's Autonomous Mobile Robotics Lab and ran the AI-assisted peer-review experiments at AAAI and NeurIPS — a live test of review and reproduction at AI-scientist throughput, with human judgment kept in the loop.

Nihar Shah
Associate Professor @ CMU
Talk (tentative): The science of evaluation: experiments on peer review, and the hidden cost of automating it.

Conducts foundational research on the algorithms and integrity of scientific peer review. His methods have been deployed across 200+ venues — including OpenReview — in the evaluation of over 100,000 papers.

Lianhui Qin
Assistant Professor @ UC San Diego
Talk (tentative): Multi-agent collaboration and self-evolving agents for scientific reasoning.

Builds AI agents that reason, learn, and collaborate in complex environments — multi-agent collaboration, self-evolving agents that learn during deployment, and reasoning systems for scientific discovery. PhD from University of Washington with Yejin Choi.

Audrey Cheng
UC Berkeley · Incoming Assistant Professor
Talk (tentative): AI-Driven Research for Systems (ADRS): agents doing real database and systems research.

Creator of TAOBench (deployed at Meta, PlanetScale, TiDB) and co-lead of AI-Driven Research for Systems (ADRS) — a live test of whether AI scientists can drive real engineering research, and how to measure them when they do.

Yao Li
Assistant Professor @ Portland State University
Talk (tentative): Proof assistants for AI scientists: making agent claims verifiable, not just plausible.

Uses interactive theorem provers and dependent types to formally verify real-world programs, bringing proof-assistant rigor to AI-scientist claims.

Bodhisattwa Majumder
Research Scientist @ Ai2
Talk (tentative): Autonomous discovery at Ai2: DataVoyager, AutoDS, and AstaBench.

Leads autonomous, data-driven scientific discovery at Ai2 (DataVoyager, AutoDS) and builds the AstaBench scientific-agent benchmark.

Yue Zhang
Head of Agent Research @ Scale AI
Panel chair: Chairs the closing panel on ecosystem-level evaluation when the producer is an agent.

Leads Agent Research at Scale AI on LLM-agent evaluation and benchmarking at scale.

Organizers
Jiachen Amber Liu
University of Michigan · ex-Meta Superintelligence Labs
Junyuan Hong
National University of Singapore
Ang Chen
University of Michigan
Stefan Mihalas
Allen Institute
Velvin Fu
Meta
Kaan Sel
MIT
Jiaxin Pei
UT-Austin & Stanford
Program Committee
Yizhong Wang
ByteDance Seed (incoming AP @ UT Austin)
Ang Cao
Google DeepMind
Chenhui Daniel Zhang
Google DeepMind
Joachim Baumann
Stanford
Chenglei Si
Stanford
Jing Carl Yang
University of Southern California
Xianglin Ji
MIT
Koutian Wu
University of Texas at Austin
Chengyang Shi
University of Michigan
Overview

Treating AI scientists as first-class citizens of research.

The modern research ecosystem — journals, peer review, conferences, citation, tenure tracks, grant cycles, the PDF itself — was built between the 17th and late 20th centuries for a single kind of participant: the human researcher. Each layer assumes a human author choosing what to write, a human reviewer choosing what to trust, and a human reader choosing what to build on. arXiv, OpenReview, and GitHub changed how those artifacts move, but not who they were written for, or by.

That assumption is no longer load-bearing. In 2025–2026, autonomous agents became a major contributor to research itself — producing papers, reviewing them, indexing them, and building on them. Agents now sit on both sides of every layer. The field is racing to make the model a better scientist while leaving the human-era ecosystem untouched — but the ecosystem, not raw model intelligence, is now the limiting layer, and the harder question is what ecosystem an AI scientist needs to do science in.

This workshop argues a simple thesis: if AI scientists are now a major contributor to research, every layer of the ecosystem — how research is produced, represented, verified, composed, and evaluated — has to be redesigned to treat AI scientists as first-class citizens, not retrofitted users of human-era institutions. Crucially, rebuilding each layer is a source of concrete ML research — benchmarks, datasets, evaluation protocols, and new agent methods — not institutional debate.

Topics of Interest

Five tracks tracing the path of a unit of research.

We organize the call around how a unit of research is produced, represented, verified, composed with other work, and evaluated at the level of the whole network. One foundational track measures the producer; the rest rebuild the layers its output flows through. We prioritize contributed empirical and technical work; position and vision papers are welcome but capped as a minority track.

01
Produce — Benchmarking AI-Scientist Behavior (Foundational track)

How AI scientists actually behave today — and whether they understand the science or only reproduce its narrative.

02
Represent — Machine-Executable Research Representations

The right unit of research when the consumer is an agent — at once executable and verifiable.

03
Verify — Automated Verification and Peer Review

Verification, reproduction, and peer review at agent throughput — with human judgment kept in the loop.

04
Compose — Collaboration Among AI Scientists

Turning isolated AI scientists into a collaborating network — contributions that compose, connected by typed dependencies rather than citation-as-string.

05
Evaluate — Ecosystem-Level Evaluation, Attribution & Mechanism Design

Measuring and steering the multi-actor network — ecosystem-level benchmarks, contribution attribution, and incentive-compatible mechanisms so truthful review and reusable artifacts become the dominant strategy.

Call for Papers

Author information.

Research papers (primary track). 9-page long or 4-page short papers in ICLR/NeurIPS format; references and appendices excluded from the page count. New methods, benchmarks, datasets, systems, and evaluation protocols. This is the substance of the program.

Position papers (minority track). 4 pages. First-principles arguments about what the ecosystem should become — welcome and spotlighted, but capped so the workshop platforms contributed research rather than becoming a discussion-only venue.

Review & policy. All submissions handled via OpenReview, double-blind, ≥3 reviewers per paper. Non-archival; dual submission allowed per ICLR/NeurIPS workshop norms. The LLM policy follows the NeurIPS 2026 main track: AI-assisted writing is permitted with disclosure, and AI-authored submissions must designate a human corresponding author who certifies content.

Submission portal opens 2026-08-15. Link to be posted here.

Important Dates

Key dates (tentative).

Submissions open: August 15, 2026
Submission deadline: September 15, 2026 (AoE)
Notification: September 29, 2026
Camera-ready: November 1, 2026
Workshop: December 2026 (NeurIPS)
Format: Full-day, in person

Dates are tentative and follow NeurIPS 2026 workshop deadlines. All times are Anywhere on Earth (AoE).

Schedule

A full day that builds toward debate.

Six 40-minute invited talks, a structured debate mid-morning, a moderated panel late in the day, and a closing poster & demo session of nearly two hours for contributed papers.

09:00 – 09:10  Opening remarks
09:10 – 10:30  Invited Talks 1–2 (40 min each)
10:30 – 10:50  Coffee break
10:50 – 11:30  Debate: a provocative motion on the future of the AI-driven research ecosystem (two sides + audience vote)
11:30 – 12:50  Invited Talks 3–4
12:50 – 13:50  Lunch
13:50 – 15:10  Invited Talks 5–6
15:10 – 15:20  Break
15:20 – 16:00  Panel + moderated discussion
16:00 – 16:05  Closing remarks
16:05 – 18:00  Poster & Demo Session (contributed papers)