From MuJoCo to Real Hardware: Running RL Policies Through ROS 2

The Challenge

Training a reinforcement learning policy in simulation is a major milestone. But in robotics, the real challenge starts when that policy needs to run on physical hardware.

An early-stage robotics customer came to Ekumen with a MuJoCo-trained RL policy for a robot manipulator. They needed a ROS 2-based system that could connect that policy to the actual robot, including the arm, gripper, motor controllers, perception components, and operator tools required for demos.

The goal was not only to execute the policy. The customer needed a reproducible development environment, reliable hardware integration, and a testing workflow that could support a fast-growing engineering team.

What the Customer Needed

The project had three main requirements:

Create a reproducible development environment to run RL-trained policies against real hardware through ROS 2.
Build shared-control interfaces so operators could assist with setup and control during demos.
Research stereo vision-based pose estimation approaches and integrate the selected options into the software stack.

The system also needed to support both the real robot and MuJoCo-based simulations for end-to-end testing in CI environments.

The Solution

Ekumen built a Dockerized development environment based on ROS 2 Humble. This environment was designed to run both the real robot software stack and MuJoCo simulations of the same system.

This gave the customer a consistent setup for development, testing, onboarding, and CI. For a growing team, that reproducibility was critical: new engineers could start working with the stack without having to manually reconstruct dependencies or local configurations.

Hardware components and controllers were configured using ROS 2 Control, creating a structured interface between the RL policy execution layer and the robot manipulator hardware.

From there, the team developed the control software needed to execute the policies on the real system.
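The case study does not detail the control software itself. A policy execution layer usually does something like the following on every control tick: it assembles an observation from joint state, queries the policy, and maps the policy's normalized action onto joint targets. The Python sketch below is a hypothetical illustration, not Ekumen's implementation. The observation layout, the offset-from-default-pose action convention, `action_scale`, and the `policy` callable (a stand-in for the exported MuJoCo-trained network) are all assumptions. In a ROS 2 system the resulting targets would typically be sent to a ROS 2 Control controller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class JointState:
    positions: List[float]   # rad
    velocities: List[float]  # rad/s


# Hypothetical policy signature: observation vector -> normalized action in [-1, 1].
Policy = Callable[[Sequence[float]], List[float]]


def build_observation(state: JointState, goal: Sequence[float]) -> List[float]:
    """Concatenate joint positions, velocities, and goal (assumed layout)."""
    return list(state.positions) + list(state.velocities) + list(goal)


def policy_step(policy: Policy, state: JointState, goal: Sequence[float],
                default_pose: Sequence[float], action_scale: float) -> List[float]:
    """Run one control tick and return joint position targets.

    Assumes the policy was trained to output scaled offsets from a default
    pose. This is one common convention, not necessarily the one used here.
    """
    obs = build_observation(state, goal)
    action = policy(obs)
    if len(action) != len(default_pose):
        raise ValueError("action dimension does not match joint count")
    # Clip to the range the policy produced during training before scaling.
    clipped = [max(-1.0, min(1.0, a)) for a in action]
    return [q0 + action_scale * a for q0, a in zip(default_pose, clipped)]


if __name__ == "__main__":
    # Dummy policy: the second value is out of range and gets clipped to -1.0.
    dummy_policy = lambda obs: [0.5, -2.0]
    state = JointState(positions=[0.0, 0.0], velocities=[0.0, 0.0])
    print(policy_step(dummy_policy, state, goal=[0.1, 0.2],
                      default_pose=[0.0, 1.0], action_scale=0.2))
```

Clipping before scaling is a deliberate choice in this sketch: on real hardware a policy can see observations it never encountered in simulation and emit actions outside its training range.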


Hardware Integration and Custom Drivers

A key part of the project involved adapting the hardware layer.

That meant customizing and patching existing vendor drivers for the robot arm and gripper motors to improve their behavior and real-time performance. The team also developed driver software for a custom-made gripper motor built by another contractor.

This kind of driver work is often where simulation-trained robotics systems encounter friction. The policy may work in MuJoCo, but the real robot depends on hardware interfaces, timing behavior, communication constraints, and controller performance.

Making the stack usable on the required hardware meant solving those integration details, not just connecting a model output to a command topic.
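The difference between integrating hardware and simply wiring a model output to a command topic can be made concrete. A common safeguard is a filter between the policy and the hardware interface that enforces joint limits, caps how far a target may move per tick, and holds position when commands stop arriving. The sketch below is hypothetical: `CommandFilter`, its single velocity limit shared by all joints, and the timeout are illustrative assumptions, not details from this project.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class CommandFilter:
    """Hypothetical safety layer between policy output and the hardware interface."""
    limits: List[Tuple[float, float]]  # (min, max) joint position limits, rad
    max_velocity: float                # rad/s, same for every joint (assumption)
    timeout_s: float                   # hold position if commands go stale
    _last_cmd: Optional[List[float]] = field(default=None, init=False)
    _last_stamp: float = field(default=float("-inf"), init=False)

    def set_command(self, target: Sequence[float], stamp: float) -> None:
        """Store the latest policy target and the time it was received."""
        self._last_cmd = list(target)
        self._last_stamp = stamp

    def compute(self, current: Sequence[float], now: float, dt: float) -> List[float]:
        """Return the next safe joint target given measured positions."""
        # Watchdog: with no command, or a stale one, hold the measured position.
        if self._last_cmd is None or now - self._last_stamp > self.timeout_s:
            return list(current)
        max_step = self.max_velocity * dt
        out = []
        for q, target, (lo, hi) in zip(current, self._last_cmd, self.limits):
            target = min(max(target, lo), hi)                 # position limits
            step = min(max(target - q, -max_step), max_step)  # per-tick velocity cap
            out.append(q + step)
        return out


if __name__ == "__main__":
    f = CommandFilter(limits=[(-1.0, 1.0)], max_velocity=0.5, timeout_s=0.1)
    f.set_command([2.0], stamp=0.0)               # out of range: clamped to 1.0
    print(f.compute([0.0], now=0.01, dt=0.01))    # moves at most 0.005 rad
    print(f.compute([0.0], now=0.5, dt=0.01))     # stale command: holds position
```

In practice `now` would come from a monotonic clock, and the same checks are often duplicated inside the driver or controller so that a crashed policy process cannot leave the arm following its last target.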

Shared Control for Demos

The customer also needed reliable demo support.

Part of that work was developing interfaces that allowed operators to set up and control the system during demonstrations. This made the demo pipeline more robust and reduced the dependence on fragile manual steps or developer-only tools.

For early-stage robotics teams, this matters. Demos often happen before the system is fully autonomous, but they still need to be repeatable, understandable, and safe to operate.
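The case study does not describe how operator and policy inputs were combined. One simple pattern for shared control is explicit mode arbitration with a linear blend in the shared mode. The sketch below assumes velocity commands and a fixed operator weight; the `Mode` names and the `arbitrate` function are hypothetical.

```python
from enum import Enum
from typing import List, Sequence


class Mode(Enum):
    STOPPED = "stopped"        # no motion allowed
    TELEOP = "teleop"          # operator drives, policy output ignored
    SHARED = "shared"          # operator input blended with policy output
    AUTONOMOUS = "autonomous"  # policy drives


def arbitrate(mode: Mode, operator_cmd: Sequence[float], policy_cmd: Sequence[float],
              operator_weight: float = 0.5) -> List[float]:
    """Combine velocity commands according to the active mode (hypothetical scheme)."""
    if len(operator_cmd) != len(policy_cmd):
        raise ValueError("command dimensions differ")
    if mode is Mode.STOPPED:
        return [0.0] * len(policy_cmd)
    if mode is Mode.TELEOP:
        return list(operator_cmd)
    if mode is Mode.AUTONOMOUS:
        return list(policy_cmd)
    # SHARED: clamp the weight to [0, 1], then blend per joint.
    w = min(max(operator_weight, 0.0), 1.0)
    return [w * u_op + (1.0 - w) * u_pol
            for u_op, u_pol in zip(operator_cmd, policy_cmd)]


if __name__ == "__main__":
    print(arbitrate(Mode.SHARED, [1.0, 0.0], [0.0, 1.0], operator_weight=0.25))
    print(arbitrate(Mode.STOPPED, [1.0, 0.0], [0.0, 1.0]))
```

Making the stopped state an explicit mode, rather than relying on zero operator input, is one way to keep the safe state easy to identify and trigger during a demo.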

Pose Estimation and CI Testing

On the perception side, multiple stereo vision-based pose estimation approaches were evaluated jointly with the client, and the most suitable ones were integrated into the ROS 2 stack.
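The specific approaches are not named here, but many stereo methods share a triangulation step. For a rectified image pair, depth follows from disparity as Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the horizontal disparity of a matched point. The sketch below recovers a 3D point in the left camera frame under a pinhole model. It is an illustrative building block, not the approach selected in this project; full pose estimation would also estimate orientation, for example by fitting a rigid transform to several triangulated keypoints.

```python
from typing import Tuple


def triangulate_rectified(u_left: float, v: float, u_right: float,
                          fx: float, fy: float, cx: float, cy: float,
                          baseline_m: float) -> Tuple[float, float, float]:
    """Return (X, Y, Z) in meters in the left camera frame.

    Assumes a rectified stereo pair: both images share intrinsics and rows,
    so a matched point differs only in its horizontal pixel coordinate.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad match or point at infinity")
    z = fx * baseline_m / disparity  # depth from disparity
    x = (u_left - cx) * z / fx       # back-project through the pinhole model
    y = (v - cy) * z / fy
    return x, y, z


if __name__ == "__main__":
    # 500 px focal length, 10 cm baseline, 25 px disparity -> about 2 m depth.
    print(triangulate_rectified(370.0, 240.0, 345.0,
                                fx=500.0, fy=500.0, cx=320.0, cy=240.0,
                                baseline_m=0.1))
```

Because depth is inversely proportional to disparity, a one-pixel matching error costs much more accuracy at long range than at short range, which is one reason stereo approaches can perform very differently depending on working distance.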

The team also created MuJoCo-based simulations to run end-to-end tests of the ROS 2 software stack in CI environments.

This gave the customer a practical validation layer before testing on physical hardware. Simulation did not replace hardware testing, but it helped catch integration issues earlier and made the development process more reliable.
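An end-to-end simulation test of this kind typically resets a simulation, runs the closed loop for a fixed number of steps, and asserts on the outcome. To keep the example self-contained, the sketch below replaces MuJoCo with a toy first-order joint model (`ToyArmSim`, hypothetical). With the official `mujoco` Python bindings, `step` would instead write the command into `data.ctrl` and call `mujoco.mj_step(model, data)`, and the controller would be the real policy running through the ROS 2 stack. The test is a plain function so that a runner such as pytest can collect it in CI.

```python
from typing import Callable, List, Sequence


class ToyArmSim:
    """Stand-in for a MuJoCo model: each joint moves toward its position
    target with a first-order lag. A real CI test would step MuJoCo instead."""

    def __init__(self, n_joints: int, dt: float = 0.002, time_constant: float = 0.05):
        self.q = [0.0] * n_joints
        self.alpha = dt / time_constant  # fraction of the error closed per step

    def step(self, targets: Sequence[float]) -> None:
        self.q = [q + self.alpha * (t - q) for q, t in zip(self.q, targets)]


def run_episode(sim: ToyArmSim, controller: Callable[[List[float]], List[float]],
                steps: int) -> List[float]:
    """Run the closed loop: observe, compute targets, advance the simulation."""
    for _ in range(steps):
        sim.step(controller(list(sim.q)))
    return sim.q


def test_reaches_goal() -> None:
    goal = [0.5, -0.3]
    sim = ToyArmSim(n_joints=2)
    # Placeholder controller that always commands the goal; a real test
    # would exercise the deployed policy and the full software stack.
    final = run_episode(sim, lambda q: list(goal), steps=1000)
    for q, g in zip(final, goal):
        assert abs(q - g) < 1e-3, f"joint ended at {q}, expected {g}"
```

Asserting on task-level outcomes, such as whether the goal was reached within tolerance, rather than on exact trajectories tends to keep such tests stable as the policy is retrained.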

What We Delivered

Ekumen delivered:

A Dockerized ROS 2 Humble development environment.
A stack capable of running both real robot software and MuJoCo simulations.
ROS 2 Control configuration for the robot manipulator.
Control software to execute MuJoCo-trained RL policies on hardware.
Patched vendor drivers for the robot arm and gripper motors.
A custom driver for a new gripper motor.
Integrated stereo vision-based pose estimation approaches.
Shared-control interfaces for demos.
CI-ready MuJoCo simulations for end-to-end testing.

The Result

The customer received a simple-to-use development environment at a critical stage in the team’s growth. New engineers could onboard faster, the real robot and simulation workflows were easier to run, and the demo pipeline became more robust.

Most importantly, the customer moved from having RL policies trained in simulation to having a ROS 2-based execution path for real manipulator hardware.

Why It Matters

In robotics, an RL policy is only as useful as the system around it.

To move from simulation to hardware, teams need reliable drivers, control interfaces, reproducible environments, simulation-based testing, perception integration, and operator tools.

That is the engineering work that turns a trained policy into something that can run on a robot.

Working on RL policy execution, ROS 2 Control, or simulation-to-hardware integration? Talk to our robotics software team.