Morpheus

A Big-World Benchmark for Continual Reinforcement Learning

Morpheus provides big, evolving enterprise worlds designed to study continual reinforcement learning beyond static benchmarks and artificial task switches.

Continual RL Needs Big Worlds, Not Static Benchmarks

Small, Static Worlds

Most benchmarks fix dynamics, rewards, and structure, masking the challenges of continual adaptation in evolving environments.

Artificial Task Sequences

Continual RL benchmarks often rely on explicit task IDs or abrupt switches, rather than gradual, structured world evolution.

No Persistent World

Episodes reset the world instead of modeling long-lived systems where past decisions shape future dynamics.

Uncontrolled Non-Stationarity

When environments change, multiple assumptions shift at once, making failure modes hard to isolate and diagnose.

Most RL benchmarks assume a small, static world, but real decision-making happens in large, evolving environments.

Internal Research 2026

KEY FEATURES

Why Morpheus Exists

Persistent, Evolving Enterprise Worlds

  • World state persists across time
  • Actions compound and shape future dynamics
  • Continuous adaptation without episode resets

Structured Non-Stationary Dynamics

  • Interpretable regime shifts, not random noise
  • Drift and structural changes in system parameters
  • Cyclic and abrupt transitions driven by latent factors
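
As a concrete illustration of the difference between structured non-stationarity and random noise, here is a minimal, self-contained sketch (not Morpheus code; the regime names and values are invented) in which a latent regime drives both slow parameter drift and abrupt, interpretable shifts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent regime schedule: long stable phases separated by abrupt,
# interpretable switches (names, targets, and durations are made up).
schedule = [("normal", 1.0, 500), ("peak_season", 1.8, 300), ("supply_shock", 0.4, 200)]

demand_rate, level = [], 1.0
for regime, target, steps in schedule:
    for _ in range(steps):
        # Slow drift toward the current regime's target, plus small noise.
        level += 0.01 * (target - level) + rng.normal(0.0, 0.005)
        demand_rate.append(level)

# The trace shows gradual drift within a regime and structural breaks
# between regimes -- changes an agent could detect and explain, unlike
# unstructured random noise.
```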

Learning Without Task Labels

  • No explicit task boundaries or IDs
  • No resets or predefined curriculum
  • Performance reflects continual adaptation

Controlled World Interventions

  • Explicit, parameterized environment changes
  • Isolated assumption shifts and drift sweeps
  • Reproducible regime transitions for diagnosis
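
To show what "explicit, parameterized environment changes" could look like in practice, here is a hypothetical sketch of an intervention specification and a drift sweep; the Intervention class and its fields are illustrative assumptions, not the Morpheus API:

```python
from dataclasses import dataclass

@dataclass
class Intervention:
    """One explicit, reproducible change to the world (illustrative only)."""
    parameter: str      # which world parameter shifts
    start_step: int     # when the shift begins
    mode: str           # "abrupt" or "drift"
    magnitude: float    # size of the shift
    duration: int = 0   # steps over which a drift is applied

# A drift sweep: the same structural change applied at increasing magnitudes,
# so a failure can be attributed to one isolated assumption shift.
sweep = [
    Intervention("supplier_lead_time", start_step=10_000,
                 mode="drift", magnitude=m, duration=2_000)
    for m in (0.1, 0.25, 0.5, 1.0)
]
```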

Environments we are launching soon

What we're launching with

EDI Invoice Processing

Exchange of invoice data between businesses in a standard EDI format

EDI · Invoicing

Inbound Warehouse Management

Handling inbound supplies: sourcing, receiving, and storing raw materials and goods

WMS · Inbound

What's coming next

  • Outbound Warehouse Management
  • Order to Cash (ERP)
  • Procure to Pay (ERP)
  • Production Planning (Manufacturing)

Available in early Feb 2026

Join the Morpheus Research Waitlist Today

Join 50+ research teams training on enterprise environments

FAQ

What you might be wondering

What is Morpheus?

Morpheus is a simulator and benchmark suite for training and evaluating reinforcement learning agents in structured, evolving enterprise environments. It models persistent worlds with realistic processes such as routing, inventory, and resource allocation that change over time rather than resetting between tasks. This makes it a testbed for studying continual learning and adaptation under real-world non-stationarity.

Who is Morpheus for?

Morpheus is built for reinforcement learning researchers and engineers studying decision-making in non-stationary, real-world-like environments. It is especially suited for work on continual learning, robustness, and generalization, where agents must adapt over time rather than solve a fixed task. The platform supports both academic benchmarking and applied research on long-horizon control in complex systems.

How does Morpheus evaluate agents?

Morpheus evaluates agents in persistent, evolving environments rather than fixed tasks or predefined task sequences. The world changes through structured regime shifts without task labels or resets, so performance reflects true continual adaptation to non-stationarity. Benchmark tasks are defined over families of related worlds, enabling systematic study of adaptation, forgetting, and robustness under distribution shift.
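
As a rough sketch of the evaluation protocol this implies, the loop below tracks windowed return over one uninterrupted trajectory, with online learning and no resets or task IDs. The env/agent interface (env.reset, env.step, agent.act, agent.update) is an assumed placeholder, not the published Morpheus API:

```python
import numpy as np

def evaluate_continual(env, agent, horizon=1_000_000, window=10_000):
    """Track windowed return over a single uninterrupted trajectory."""
    obs = env.reset()                # called once, at the start of the world
    returns, window_return = [], 0.0
    for t in range(horizon):
        action = agent.act(obs)
        obs, reward, info = env.step(action)  # the world never resets
        agent.update(obs, reward)             # learning continues online
        window_return += reward
        if (t + 1) % window == 0:
            returns.append(window_return)     # performance per time window
            window_return = 0.0
    # Adaptation shows up as how quickly windowed return recovers
    # after a regime shift.
    return np.array(returns)
```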

How is Morpheus different from existing RL environments?

Morpheus differs from existing RL environments by modeling persistent, evolving worlds rather than static tasks with fixed dynamics. Instead of relying on task switches or episodic resets, it exposes agents to structured, continuous non-stationarity driven by changing processes and constraints. This makes it possible to benchmark continual learning and adaptation in settings that better reflect real-world decision-making.

Which RL frameworks does Morpheus support?

Morpheus environments are framework-agnostic and designed to work with common RL stacks such as PyTorch-based and TensorFlow-based workflows.

Will Morpheus work with my existing RL pipeline?

Morpheus environments are designed to plug into existing RL pipelines with minimal changes.
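
For example, integration could look like the following Gymnasium-style loop; the environment ID and module registration are hypothetical placeholders, and the exact interface may differ:

```python
import gymnasium as gym

# Hypothetical environment ID, shown only to illustrate the wiring.
env = gym.make("Morpheus/EDIInvoiceProcessing-v0")

obs, info = env.reset(seed=0)
for _ in range(1_000):
    action = env.action_space.sample()  # stand-in for your policy
    obs, reward, terminated, truncated, info = env.step(action)
    # In a persistent world, `terminated` would stay False and
    # learning would continue from the evolving state.
env.close()
```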

Can I run environments locally?

Yes. Environments can be run locally, with optional hosted execution planned as part of the Environment Hub.

How does Morpheus address the sim-to-real gap?

Morpheus emphasizes realistic environment design, explicit constraints, and validated reward structures. The goal is to ensure agents trained in simulation behave more reliably when evaluated against real-world data and systems.

How are rewards validated?

Morpheus includes a verifier and reward validation layer to ensure consistency, correctness, and reproducibility across runs.

Will there be official benchmarks?

Benchmarking is a core part of Morpheus. Initial benchmarks will be published alongside each environment.

Can I contribute my own environments?

Contributing environments is part of the long-term vision. Contribution guidelines will be shared as the Environment Hub matures.

What do I get by joining the waitlist?

Early access to Morpheus environments, updates on releases, and opportunities to provide feedback as a design partner.

Will Morpheus be publicly available?

Selected environments and tooling will be publicly available, with distribution planned through platforms such as Hugging Face and GitHub. Details on access and usage will be shared as the product evolves.

When does Morpheus launch?

Initial environments and the waitlist experience are launching soon, with the full Environment Hub rolling out incrementally.

How can I get involved?

Join the waitlist to get early access updates and help shape the first Morpheus environments.

What research questions does Morpheus help answer?

Morpheus helps you answer questions such as:

  • How sensitive is an algorithm to reward misspecification?
  • Which inductive biases help under causal interventions?
  • Does policy generalization fail due to dynamics or observation design?
  • How much structure does a world model actually recover?

What kinds of interventions does Morpheus support?

  • Swap reward functions without changing dynamics (see the sketch below)
  • Toggle observability, delays, or noise
  • Change causal graph edges and re-run experiments
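
As an illustration of the first item, swapping a reward function while leaving dynamics untouched can be done with a thin wrapper; the RewardSwap class below is a generic sketch, not part of Morpheus:

```python
class RewardSwap:
    """Wrap an environment and override only its reward signal."""

    def __init__(self, env, reward_fn):
        self.env = env
        self.reward_fn = reward_fn

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        # Dynamics, observations, and termination are untouched;
        # only the scalar reward is recomputed.
        return obs, self.reward_fn(obs, action, info), terminated, truncated, info

# e.g. compare a throughput reward against a cost-penalized variant on
# identical dynamics to measure sensitivity to reward misspecification.
```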

Is Morpheus only for applied research?

No. Morpheus has key strengths that make it theory-relevant:

  • Tasks with known latent structure
  • Ground-truth transition graphs
  • Parameterized complexity knobs
  • Suitable for theoretical and empirical work