ISAACS and Gameplay Filters

Safety filter synthesis and deployment for high-order dynamical systems

ISAACS

ISAACS (Iterative Soft Adversarial Actor-Critic for Safety) (Hsu* et al., 2023) is a new game-theoretic reinforcement learning scheme for approximate safety analysis, whose simulation-trained control policies can be efficiently converted at runtime into robust safety-certified control strategies, allowing robots to plan and operate with safety guarantees in the physical world.
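
At its core, ISAACS approximates the value of a two-player zero-sum safety game between the robot's controller and an adversarial disturbance. Written in a simplified, avoid-only form using our own notation (the actual training uses a discounted, soft reach-avoid formulation, as described under Synthesis below), the game value satisfies

    V(x) = \min\Big\{\, g(x),\ \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} V\big(f(x, u, d)\big) \Big\}

where g(x) is a safety margin that is positive exactly when state x satisfies the constraints, u is the robot's control, d is the adversarial disturbance, and f is the simulated dynamics. Roughly, the robot can keep the safety margin positive from x against the worst-case disturbance whenever V(x) > 0, and the learned critic is a neural approximation of this value.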

Video: Synthesis and runtime of a safety critic filter with ISAACS on a quadruped robot.

Gameplay Filters

Figure: Snapshots of two different quadruped robots under adversarial conditions, including external tugging forces and unmodeled terrains. The gameplay filter preserves safety in these experiments, in contrast to its unfiltered counterpart.

The gameplay filter (Nguyen* et al., 2024) is a new class of predictive safety filters, offering a general approach for runtime robot safety based on game-theoretic reinforcement learning and the core principles of safety filters. Our method learns a best-effort safety policy and a worst-case sim-to-real gap in simulation, and then uses their interplay to inform the robot’s real-time decisions on how and when to preempt potential safety violations.


Learn from adversity

Our approach first pre-trains a safety-centric control policy in simulation by pitting it against an adversarial environment agent that simultaneously learns to steer the robot toward catastrophic failures. This escalation produces not only a robust robot safety policy that is remarkably hard to exploit, but also an estimate of the worst-case sim-to-real gap that the robot might encounter after deployment.
Synthesis: We employ a game-theoretic reach-avoid reinforcement learning scheme that iteratively pits the robot's controller against a simulated adversarial environment. The algorithm updates a safety value network (critic) and keeps a leaderboard of the most effective player policies (actors).
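
For concreteness, the structure of this synthesis loop can be sketched as follows. This is a hedged illustration only: every argument name (rollout, update_critic, evaluate, and so on) is an assumed placeholder for routines the reader would supply, not the authors' released implementation.

    import random
    from typing import Any, Callable, List, Tuple

    def synthesize_safety_game(
        rollout: Callable[[Any, Any], Any],           # simulate one game: (ctrl, dstb) -> experience batch
        update_critic: Callable[[Any], Any],          # fit the safety value network (critic) on a batch
        update_ctrl: Callable[[Any, Any], Any],       # improve the robot's safety policy against the critic
        update_dstb: Callable[[Any, Any], Any],       # improve the adversarial disturbance against the critic
        evaluate: Callable[[Any, List[Any]], float],  # score a policy by its win rate against a pool of opponents
        init_ctrl: Any,
        init_dstb: Any,
        num_iterations: int = 10_000,
        snapshot_every: int = 500,
        leaderboard_size: int = 5,
    ) -> Tuple[Any, List[Any], List[Any]]:
        ctrl, dstb, critic = init_ctrl, init_dstb, None
        ctrl_leaderboard: List[Any] = [init_ctrl]   # strongest safety controllers found so far
        dstb_leaderboard: List[Any] = [init_dstb]   # strongest adversarial disturbances found so far

        for it in range(num_iterations):
            # Play the current controller against a sampled past adversary so that
            # neither player overfits to a single opponent.
            opponent = random.choice(dstb_leaderboard)
            batch = rollout(ctrl, opponent)

            # Actor-critic updates on the zero-sum safety objective: the critic
            # estimates the game value, the controller tries to increase it,
            # and the disturbance tries to decrease it.
            critic = update_critic(batch)
            ctrl = update_ctrl(batch, critic)
            dstb = update_dstb(batch, critic)

            # Periodically snapshot both players and keep only the best performers.
            if (it + 1) % snapshot_every == 0:
                ctrl_leaderboard = sorted(
                    ctrl_leaderboard + [ctrl],
                    key=lambda p: evaluate(p, dstb_leaderboard),
                    reverse=True,
                )[:leaderboard_size]
                dstb_leaderboard = sorted(
                    dstb_leaderboard + [dstb],
                    key=lambda p: evaluate(p, ctrl_leaderboard),
                    reverse=True,
                )[:leaderboard_size]

        return critic, ctrl_leaderboard, dstb_leaderboard

The actual training additionally relies on soft actor-critic machinery and other details described in the papers; the sketch above only captures the alternating-update and leaderboard structure.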

Never lose a game

At runtime, the learned player strategies become part of a safety filter, which allows the robot to pursue its task-specific goals or learn a new policy as long as safety is not in jeopardy, but intervenes as needed to prevent future safety violations.

Runtime: Our gameplay filter maintains safety by continually playing out imagined safety games between the best learned controller and disturbance. It only blocks task-driven actions that could lead to losing future games (i.e., violating safety) and replaces them with the learned safety controls.

To decide when and how to intervene, the gameplay filter continually imagines (simulates) hypothetical games between the two learned agents after each candidate task action: if taking the proposed action leads to the robot losing the safety game against the learned adversarial environment, the action is rejected and replaced by the learned safety policy.
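
At each control cycle this check amounts to a single imagined rollout, sketched roughly below. The names (step, is_failure, safety_policy, adversary) are assumed placeholders, not the authors' implementation.

    from typing import Any, Callable

    def gameplay_filter(
        state: Any,                              # current (estimated) robot state
        task_action: Any,                        # action proposed by the task policy
        step: Callable[[Any, Any, Any], Any],    # simulated dynamics: (state, control, disturbance) -> next state
        safety_policy: Callable[[Any], Any],     # learned best-effort safety controller
        adversary: Callable[[Any], Any],         # learned worst-case disturbance policy
        is_failure: Callable[[Any], bool],       # True if a state violates the safety constraints
        horizon: int = 200,                      # length of the imagined safety game
    ) -> Any:
        """Pass the task action through if it survives an imagined worst-case game;
        otherwise override it with the learned safety control."""
        # Imagine taking the candidate task action while the learned adversary
        # applies its worst-case disturbance...
        x = step(state, task_action, adversary(state))

        # ...then play out the rest of the game: safety controller vs. adversary.
        for _ in range(horizon):
            if is_failure(x):
                # The robot would lose this future game: reject the task action
                # and fall back to the learned safety control now.
                return safety_policy(state)
            x = step(x, safety_policy(x), adversary(x))

        # No safety violation in the imagined game: let the task action through.
        return task_action

A full implementation also needs a principled way to terminate the imagined game (for instance, reaching a certified safe condition rather than merely surviving a fixed horizon); the sketch above only illustrates the single-rollout decision logic.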


Results

1. Tugging forces and irregular terrain evaluation

We evaluate the gameplay filter on two quadruped robot platforms, the Unitree Go2 and the Ghost Robotics S40, across two experimental settings:

  • Matched ODD (50 N tugging force): a disturbance consistent with the training Operational Design Domain (ODD), designed to assess whether the gameplay filter can maintain robust safety without excessively hindering task execution.
  • Unmodeled terrain: a deployment scenario outside the training distribution, used to evaluate whether the gameplay filter preserves zero-shot safety under unmodeled conditions.

Gameplay filter on unmodeled terrain

Video: Gameplay Filter on bumpy terrain experiment.

Gameplay filter under tugging force

Video: Gameplay Filter against tugging force experiment.

Baseline comparison: Safety critic filter and unfiltered task policy

Video: We compare the gameplay filter against two baselines: a safety critic filter and an unfiltered task policy. Across all experiments, the gameplay filter maintains a higher safe rate and only fails when the adversarial bound exceeds the defined ODD.

2. Implicit robustness against degradation

In this demonstration, the robot's rear-right abduction motor was broken; neither the task policy nor the safety filter was aware of the fault.

Video: The gameplay filter maintains safety for a robot with a broken rear-right abduction motor while under tugging force, highlighting the method's implicit robustness against degradation.

Similarly, during the CoRL 2024 demo, the motors of the Unitree Go2 were noticeably degraded, with incorrect encoder readings and dampened actuation performance; the manufacturer's built-in controller could no longer stabilize the robot, which fell on its own.

Despite this, our gameplay filter continued to function, demonstrating strong robustness to real-world degradation and adversarial conditions.

Video: (Left) The manufacturer's built-in controller cannot keep the robot from falling on its own after motor and sensor degradation. (Right) The gameplay filter allows the robot to keep functioning safely.

3. Tackling large sim-to-real gap

In this demonstration, we train an RL-driven locomotion task policy and deploy it on the Unitree Go2 robot under a large sim-to-real gap. On its own, the task policy causes the robot to flip over; with the gameplay filter, the robot completes the same sequence of task actions without falling.

Video: Sim-to-real deployment of an RL-trained locomotion policy on the Unitree Go2. Due to a large sim-to-real gap, the unfiltered policy causes the robot to flip during execution (left). When augmented with the Gameplay Filter, the robot successfully completes the task sequence without falling (right).

Takeaway

Gameplay filters allow robots to maintain robust zero-shot safety across deployment conditions with minimal impact on task performance.

  • The filter overrides only those task actions that would cause a safety failure for some realization of the uncertainty.
  • It requires only a single trajectory rollout at each control cycle, enabling runtime safety filtering.
  • To our knowledge, this is the first successful demonstration of a full-order (36-D) safety filter for legged robots.

Key contributions:

  • Scalable: The filter's neural-network representation makes it suitable for challenging robotic settings such as walking over unmodeled terrain and under strong external forces.
  • General: A gameplay filter can be synthesized automatically for any robotic system. All you need is a (black-box) dynamics model.
  • Robust: The gameplay filter actively learns and explicitly predicts dangerous discrepancies between the modeled and real dynamics.

References

2024

  1. Gameplay Filters: Robust Zero-Shot Safety through Adversarial Imagination
    Duy P. Nguyen*, Kai-Chieh Hsu*, Wenhao Yu, and 2 more authors
    In 8th Annual Conference on Robot Learning, 2024

2023

  1. ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
    Kai-Chieh Hsu*, Duy Phuong Nguyen*, and Jaime Fernández Fisac
    In Proceedings of the 5th Annual Learning for Dynamics and Control Conference, 2023