Self-Improving Open Scientific Agents

A lab for agents that learn from their own runs

Draft, criticize, benchmark, remember.

This demo is for scientific AI systems that should not answer once and disappear. It routes a request through specialist agents, attacks the first draft, scores the result against reproducibility and risk gates, stages human-approved lessons, and prepares a stronger next run.

Self-critique with receipts: The critic records what failed, why it matters, and what should change in the next attempt.

Benchmark gate: Quality, risk, reproducibility, and reviewer signals decide whether a lesson is worth keeping.

Open-model storyline: Shows OpenClaw-style orchestration and Hermes-compatible reasoning lanes with clear scope boundaries.

Shared memory: Approved lessons can improve the Statistics, Numerical Analysis, and Optimal Control/OR agents.
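
The benchmark gate described above can be pictured as a set of threshold checks. The following is a minimal sketch, not the demo's actual implementation: the `BenchmarkScores` class, the function names, the 0-to-1 score scale, and all three threshold values are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds; the demo's real gate values are not documented here.
MIN_QUALITY = 0.7
MIN_REPRODUCIBILITY = 0.8
MAX_RISK = 0.3


@dataclass(frozen=True)
class BenchmarkScores:
    quality: float           # assumed 0..1 scale, higher is better
    risk: float              # assumed 0..1 scale, lower is better
    reproducibility: float   # assumed 0..1 scale, higher is better
    reviewer_approved: bool  # human reviewer signal


def gate_failures(scores: BenchmarkScores) -> list[str]:
    """Return the names of the gates that failed (empty list means pass)."""
    failures = []
    if scores.quality < MIN_QUALITY:
        failures.append("quality")
    if scores.risk > MAX_RISK:
        failures.append("risk")
    if scores.reproducibility < MIN_REPRODUCIBILITY:
        failures.append("reproducibility")
    if not scores.reviewer_approved:
        failures.append("reviewer")
    return failures


def lesson_is_worth_keeping(scores: BenchmarkScores) -> bool:
    """A lesson is kept only when every gate passes."""
    return not gate_failures(scores)
```

Returning the list of failed gates, rather than a bare boolean, keeps the reason for a rejection available to the audit trail.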

Improvement loop at a glance

Try a scientific request and watch the system draft, critique, revise, benchmark, stage memory, and export an audit trail for the next run.
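
The draft, critique, revise, and benchmark cycle can be sketched as a small loop. This is an illustrative sketch under stated assumptions: the `improvement_loop` function, its callable parameters, the `max_rounds` and `target` defaults, and the shape of the audit rows are all hypothetical and stand in for the demo's agents.

```python
from typing import Callable


def improvement_loop(
    draft: str,
    critique: Callable[[str], list],
    revise: Callable[[str, list], str],
    benchmark: Callable[[str], float],
    max_rounds: int = 3,
    target: float = 0.8,
) -> tuple:
    """Critique and revise until the benchmark target is met or rounds run out.

    Returns the final draft and an audit trail with one row per round.
    """
    trail = []
    for round_no in range(1, max_rounds + 1):
        score = benchmark(draft)
        findings = critique(draft)
        # Record every round, including the one that ends the loop.
        trail.append({"round": round_no, "score": score, "findings": findings})
        if score >= target or not findings:
            break
        draft = revise(draft, findings)
    return draft, trail
```

A toy run might pass a critic that flags a missing confidence interval, a reviser that appends one, and a benchmark that rewards its presence; the trail then shows the score before and after the revision.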

Critique trail: Gateway, planner, specialist, critic, benchmark, memory, reproducibility, and report agents expose handoffs.

Learning gate: Critic findings, score trajectories, judge rows, and human-gated memory separate improvement from drift.

Python/R/MATLAB: Adapter sketches describe equivalent backend and payload routes.

Downloadable work: Memo, notebook, audit CSV, payload, trace, and artifact bundle preserve each improvement cycle.
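
One way to read the learning gate is as two separate checks: a score trajectory must show real improvement before a lesson is staged, and a human must approve it before it reaches shared memory. The sketch below assumes that reading. The `Lesson` and `MemoryStore` classes, the `improved` helper, and the `min_gain` default are hypothetical names and values, not part of the demo.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lesson:
    text: str
    scores: list[float]     # benchmark score per attempt, oldest first
    approved: bool = False  # set only through human sign-off


def improved(scores: list[float], min_gain: float = 0.05) -> bool:
    """True if the last attempt beats the first by at least min_gain."""
    return len(scores) >= 2 and scores[-1] - scores[0] >= min_gain


@dataclass
class MemoryStore:
    staged: list[Lesson] = field(default_factory=list)
    shared: list[Lesson] = field(default_factory=list)

    def stage(self, lesson: Lesson) -> bool:
        """Stage a lesson only if its score trajectory shows improvement."""
        if not improved(lesson.scores):
            return False  # flat or falling scores are treated as drift
        self.staged.append(lesson)
        return True

    def approve(self, lesson: Lesson) -> None:
        """Human sign-off: move a staged lesson into shared memory."""
        self.staged.remove(lesson)
        lesson.approved = True
        self.shared.append(lesson)
```

Keeping `staged` and `shared` as separate lists means nothing enters shared memory as a side effect of scoring alone.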

Statistics critique: Diagnose a statistical workflow, score assumptions, and stage reusable lessons.

Solver benchmark: Route numerical convergence evidence through critic and reproducibility gates.

Decision audit: Review a control/OR policy, identify risk, and prepare the next run.

Self-Improving Agents -> Statistics -> Numerical Analysis -> Optimal Control/OR

Shared Scientific Agent Protocol (scientific-agent-protocol-v1)

The self-improving agent uses the same seven-stage handoff contract to judge and improve the Statistics, Numerical Analysis, and Optimal Control/OR agents.

1 Question

Turn the request into a scoped scientific question and data/model brief.

2 Assumptions

Record data, model, numerical, and decision assumptions before method choice.

3 Method Choice

Select a defensible statistical, numerical, or optimization method with alternatives.

4 Code Artifact

Produce Python, R, and MATLAB/Octave drafts with aligned variables and parameters.

5 Diagnostics

Run checks for missingness, stability, feasibility, uncertainty, or reproducibility.

6 Critique

Let the judge/critic challenge overclaims, weak evidence, and missing tests.

7 Final Memo

Return a cautious memo with evidence, limits, next steps, and downloadable artifacts.
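
The seven stages above can be expressed as an ordered enumeration with a completeness check on a handoff record. This is a sketch only: the stage names and the protocol identifier come from the page, while the `Stage` enum, the dictionary layout of a handoff, and the two helper functions are assumptions made for illustration.

```python
from enum import IntEnum

PROTOCOL_ID = "scientific-agent-protocol-v1"  # identifier shown on the page


class Stage(IntEnum):
    QUESTION = 1
    ASSUMPTIONS = 2
    METHOD_CHOICE = 3
    CODE_ARTIFACT = 4
    DIAGNOSTICS = 5
    CRITIQUE = 6
    FINAL_MEMO = 7


def missing_stages(handoff: dict) -> list:
    """Return the stages that have no non-empty entry in a handoff record.

    Assumes (hypothetically) that a handoff is a dict keyed by the
    lower-case stage name, e.g. "method_choice".
    """
    return [s for s in Stage if not str(handoff.get(s.name.lower(), "")).strip()]


def is_complete(handoff: dict) -> bool:
    """A handoff is complete when it names the protocol and fills all stages."""
    return handoff.get("protocol") == PROTOCOL_ID and not missing_stages(handoff)
```

Because every specialist agent would fill the same seven keys, the same check could be applied to a Statistics, Numerical Analysis, or Optimal Control/OR handoff without special cases.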

Download the memo, notebook, audit CSV, or a zipped artifact bundle from the latest self-improvement run.
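
A zipped artifact bundle of this kind could be assembled with the Python standard library alone. The sketch below is an assumption about how such a bundle might look, not a description of the demo's export code: the `build_bundle` function, the file names `memo.md` and `audit.csv`, and the audit column names are hypothetical, and the notebook is left out for brevity.

```python
import csv
import io
import zipfile


def build_bundle(memo: str, audit_rows: list) -> bytes:
    """Pack a memo and an audit CSV into an in-memory zip archive."""
    # Write the audit rows as CSV text; the column names are illustrative.
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=["stage", "finding", "score"])
    writer.writeheader()
    writer.writerows(audit_rows)

    # Add both files to a compressed archive held in memory.
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr("memo.md", memo)
        bundle.writestr("audit.csv", csv_buf.getvalue())
    return zip_buf.getvalue()
```

Building the archive in memory lets a web backend return the bytes directly as a download without writing temporary files.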

