DiscoBench¶
DiscoBench is a modular benchmark for automated algorithm discovery in machine learning.
What is DiscoBench?¶
DiscoBench is a new, open-ended benchmark and research playground for developing automated algorithm discovery and AI scientist systems. It has a modular setup, emphasizes discovering algorithms that transfer across tasks, and spans a wide range of task domains. We hope DiscoBench helps drive research in algorithm discovery by providing a large-scale, open-ended landscape for evaluating AI research agents.
Key Features¶
- Modular Architecture: Break down ML algorithms into composable components
- Multiple Domains: Support for reinforcement learning, language modeling, computer vision, Bayesian optimization, and more
- Flexible Configuration: Easy switching between baseline and experimental implementations
- LLM-Ready: Designed for automated algorithm discovery using AI agents
- Extensible: Simple framework for adding new tasks and domains
Quick Start¶
Installation¶
Install from source:
git clone git@github.com:AlexGoldie/discobench.git
cd discobench
make install
or install from pip:
pip install discobench
Basic Usage¶
List available domains:
uv run discobench get-domains
Create a full task-domain codebase (with baseline implementations):
uv run discobench create-task --task-domain OnPolicyRL
Create an example task for algorithm discovery:
uv run discobench create-task --task-domain OnPolicyRL --example
See the full Usage Guide for detailed instructions.
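When scripting these commands, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming `uv` and `discobench` are available on the PATH. The helper name `build_create_task_cmd` is hypothetical (it is not part of DiscoBench), and it uses only the `--task-domain` and `--example` flags shown above.

```python
import shlex


def build_create_task_cmd(task_domain: str, example: bool = False) -> list[str]:
    """Return the argv for `uv run discobench create-task` (hypothetical helper)."""
    cmd = ["uv", "run", "discobench", "create-task", "--task-domain", task_domain]
    if example:
        # Request the example task for algorithm discovery.
        cmd.append("--example")
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(shlex.join(build_create_task_cmd("OnPolicyRL", example=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually run the command.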
Available Domains¶
DiscoBench currently supports the following task domains:
- OnPolicyRL: On-policy reinforcement learning (PPO-style algorithms)
- OffPolicyRL: Off-policy reinforcement learning (DQN-style algorithms)
- LanguageModelling: Pre-training language models
- ComputerVisionClassification: Image classification tasks
- BayesianOptimisation: Black-box optimization
- BrainSpeechDetection: Neural signal analysis
- ModelUnlearning: LLM unlearning tasks
- UnsupervisedEnvironmentDesign: Environment curriculum learning
- ContinualLearning: Learning under non-stationarity
- GreenhouseGasPrediction: Predicting atmospheric greenhouse gas concentrations
See the Domains page for detailed information about each domain.
How It Works¶
1. Modular Components¶
Each task domain is decomposed into modules. For example, OnPolicyRL includes:
- loss.py: Objective function (e.g., PPO loss)
- networks.py: Neural network architectures
- optim.py: Optimization algorithms
- train.py: Training loop logic
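To give a sense of the granularity, here is a minimal pure-Python sketch of the kind of function a loss module could hold: the standard PPO clipped surrogate objective. This is not DiscoBench's actual loss.py. The function name, its signature, and the default `clip_eps=0.2` are assumptions made for illustration.

```python
def ppo_clip_loss(ratios, advantages, clip_eps=0.2):
    """Clipped surrogate loss: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).

    ratios:     new-policy / old-policy probability ratios, one per sample
    advantages: advantage estimates, one per sample
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal in length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - eps, 1 + eps] before weighting the advantage.
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped objectives.
        total += min(r * a, clipped_r * a)
    # Negate because optimizers minimize, while the surrogate is maximized.
    return -total / len(ratios)
```

Because the objective lives in its own module, a discovery agent could swap in a different loss without touching the networks, optimizer, or training loop.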
2. Base and Edit Implementations¶
Each module has two versions:
- Base: Fully implemented, tested baseline
- Edit: Template with function signatures for customization
3. Configuration-Driven¶
Control which modules use baseline vs. custom implementations via YAML config:
change_optim: true # Use custom optimizer
change_loss: false # Use baseline loss
change_networks: false
change_train: false
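One way to read these flags: each `change_<module>` key selects the Edit template when true and the Base implementation otherwise. The sketch below illustrates that mapping, assuming the YAML has already been loaded into a dict. `select_versions` is a hypothetical helper, not part of DiscoBench's API.

```python
def select_versions(config: dict) -> dict:
    """Map each change_<module> flag to 'edit' (custom) or 'base' (baseline)."""
    versions = {}
    for key, value in config.items():
        if key.startswith("change_"):
            module = key[len("change_"):]
            versions[module] = "edit" if value else "base"
    return versions


# Mirrors the YAML above, written as a Python dict.
config = {
    "change_optim": True,
    "change_loss": False,
    "change_networks": False,
    "change_train": False,
}

if __name__ == "__main__":
    for module, version in select_versions(config).items():
        print(f"{module}: {version}")
```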
4. Task Generation¶
DiscoBench assembles the configured modules into a complete, runnable task in task_src/:
discobench create-task --task-domain OnPolicyRL
cd task_src/OnPolicyRL
python run_main.py
Documentation¶
For Users¶
- Usage Guide: CLI commands, Python API, and workflows
- Domains: Available task domains and their modules
For Contributors¶
- Contributing Overview: How to add new tasks to DiscoBench
- Dataset Integration: Adding new datasets to tasks
Example Use Cases¶
Algorithm Discovery with LLMs¶
Use DiscoBench to have AI agents discover new ML algorithms:
1. Configure which modules should be generated by the LLM
2. LLM writes implementations for those modules
3. Evaluate performance across multiple tasks
4. Iterate and refine based on results
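The propose, evaluate, and refine cycle described above can be sketched in a few lines. Everything here is a stand-in: `propose` plays the role of the LLM writing a module, `evaluate` plays the role of running a generated task and returning a score, and neither is a DiscoBench function.

```python
def discover(propose, evaluate, tasks, iterations=5):
    """Keep the candidate with the best mean score across tasks.

    propose(history)        -> a new candidate, given past (candidate, score) pairs
    evaluate(candidate, t)  -> a score for the candidate on task t (higher is better)
    """
    best, best_score, history = None, float("-inf"), []
    for _ in range(iterations):
        candidate = propose(history)
        # Average over tasks so that no single task dominates the selection.
        score = sum(evaluate(candidate, t) for t in tasks) / len(tasks)
        history.append((candidate, score))
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

Passing the full history to `propose` is a design choice in this sketch: it lets the proposer condition on earlier attempts and their scores, which corresponds to the "iterate and refine" step.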
Transfer Learning Research¶
Test if components discovered on one task generalize to others:
1. Discover algorithm on training tasks
2. Evaluate on held-out test tasks
3. Measure generalization across domains
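One simple way to quantify the last step is the gap between mean training-task and mean held-out-task performance. The sketch below assumes scores have already been normalized to a comparable scale. `generalization_gap` is a hypothetical helper, and the task names and numbers are made-up placeholders, not DiscoBench results.

```python
from statistics import mean


def generalization_gap(train_scores: dict, test_scores: dict) -> float:
    """Mean training-task score minus mean held-out-task score.

    A gap near zero suggests the discovered algorithm transfers;
    a large positive gap suggests it is specialized to the training tasks.
    """
    return mean(list(train_scores.values())) - mean(list(test_scores.values()))


if __name__ == "__main__":
    # Placeholder values for illustration only.
    train = {"task_a": 0.9, "task_b": 0.7}
    held_out = {"task_c": 0.6}
    print(f"gap = {generalization_gap(train, held_out):.2f}")
```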
Project Structure¶
discobench/
├── tasks/ # Task domain implementations
│ ├── OnPolicyRL/
│ ├── LanguageModelling/
│ └── ...
├── utils/ # Core utilities
├── create_task.py # Task generation logic
├── create_config.py # Configuration utilities
└── cli.py # Command-line interface
task_src/ # Generated task files (after running create-task)
Contributing¶
We welcome contributions! DiscoBench grows stronger with more tasks and domains.
- Found a bug? Open an issue
- Want to add a task? See the Contributing Guide
- Adding datasets? Check the Dataset Integration Guide
Citation¶
If you use DiscoBench in your research, please cite:
@article{goldie2025discobench,
title={DiscoBench: An Open-Ended Benchmark For Algorithm Discovery},
author={Alexander D. Goldie and Zilin Wang and Adrian Hayler and Deepak Nathani and Edan Toledo and Ken Thampiratwong and Aleksandra Kalisz and Michael Beukman and Alistair Letcher and Shashank Reddy and Clarisse Wibault and Theo Wolf and Charles O'Neill and Jakob N. Foerster and Shimon Whiteson and Roberta Raileanu},
year={2025}
}
Links¶
- GitHub Repository: https://github.com/AlexGoldie/discobench
- Documentation: https://AlexGoldie.github.io/discobench
- Blog: https://alexgoldie.github.io/discobench-blog/
- PyPI Package: Coming soon
License¶
This project is licensed under the terms specified in the LICENSE file.