DARPA LINC Phase 0

Learning Introspective Control for safety-critical field robots

Figure: Safety Enforcer interface illustrating how operator commands are filtered through learned safety constraints before execution.

Overview

In DARPA Learning Introspective Control (LINC) Phase 0, we explore how learning-based Safety Enforcers can enable shared autonomy for safety-critical field robots: intervening only when necessary, preserving operator intent, and avoiding disruptive or surprising behaviors.

Specifically, we developed and deployed a learning-based safety filter for a hybrid tracked robot operating across challenging terrains (wedge, single-track incline, narrowing corridor, and chicane), culminating in a fully integrated final field test under unmodeled disturbances.

A central design objective of the program is robustness under adversarial and degraded conditions. During deployment, adversarial disturbances may corrupt or disable visual sensing and induce actuator-level degradation. As a result, all Safety Enforcers are trained without camera observations and must remain effective under partial actuation loss or altered actuator dynamics. The resulting safety policy relies exclusively on proprioceptive state (e.g., IMU, joint states, velocities), ensuring safe operation even when vision is unavailable and actuation is imperfect.
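To make the vision-free design concrete, the sketch below shows how a proprioception-only observation vector might be assembled; the field names and exact state layout are illustrative assumptions, not the deployed interface.

```python
import numpy as np

def proprioceptive_observation(imu, joints, base_velocity):
    """Assemble a vision-free observation for the safety policy.

    Hypothetical sketch: field names and layout are assumptions,
    not the deployed LINC state definition.
    """
    return np.concatenate([
        imu["orientation"],        # roll, pitch, yaw estimate from the IMU
        imu["angular_velocity"],   # body-frame angular rates
        joints["flipper_angles"],  # flipper joint positions
        joints["flipper_rates"],   # flipper joint velocities
        base_velocity,             # estimated base linear velocity
    ])
```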

Key contributions:

  • Minimal task disruption: maintained safety while preserving operator intent.
  • Improved traversal: far from impeding progress, the filter made challenging terrain easier and faster to cross.
  • Novice-driver friendly: operator can focus on the task, safety is automatic.

System Overview: Learning Introspective Control

The Safety Enforcer continuously monitors the robot state and operator commands using a learned safety critic $Q^\text{safety}$, intervening only when the proposed action would violate safety constraints. Rather than issuing aggressive overrides, the system computes the closest safe action to minimize disruption during shared autonomy.
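As a rough illustration of this least-restrictive filtering rule, the sketch below checks the operator's action against the learned critic and, only on a predicted violation, substitutes the nearest action the critic certifies as safe. The function names, the sign convention (larger $Q^\text{safety}$ is safer), and the finite candidate set are assumptions for illustration, not the deployed implementation.

```python
import numpy as np

def safety_filter(q_safety, state, operator_action, candidate_actions,
                  threshold=0.0):
    """Value-based shielding sketch: pass the operator's action through
    when the learned critic deems it safe; otherwise return the closest
    safe alternative. Names and sign convention are assumptions."""
    if q_safety(state, operator_action) >= threshold:
        return operator_action  # no intervention: operator intent preserved

    # Minimal intervention: among actions certified safe by the critic,
    # choose the one nearest (Euclidean) to the operator's command.
    safe = [a for a in candidate_actions
            if q_safety(state, a) >= threshold]
    if not safe:
        return np.zeros_like(operator_action)  # fallback: bring robot to rest
    return min(safe, key=lambda a: np.linalg.norm(a - operator_action))
```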

Figure: Online safety critic filter (value-based shielding) information flow and block diagram.

Learning-Based Safety Filtering

The Safety Enforcer policy was trained using an adversarial reinforcement learning framework inspired by ISAACS (Hsu* et al., 2023), allowing the system to reason over worst-case disturbances while optimizing safe behavior.

Unlike rule-based safety layers, the learned policy captures nonlinear interactions between robot dynamics, terrain geometry, and control actions.
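The loop below is a schematic of this adversarial setup, loosely following the ISAACS recipe of alternating controller and adversary updates against a shared safety critic; every helper (rollout, update_critic, update_actor) is a placeholder supplied by the caller, not the published implementation.

```python
def train_adversarial_safety(env, ctrl, dstb, critic, n_iters,
                             rollout, update_critic, update_actor):
    """Schematic adversarial training loop in the spirit of ISAACS
    (Hsu* et al., 2023). `ctrl` is the safety policy; `dstb` is the
    learned adversary (e.g., external forces applied to the robot).
    All helpers are placeholders, not the published implementation."""
    for _ in range(n_iters):
        # Roll out the controller against the current worst-case adversary.
        batch = rollout(env, ctrl, dstb)
        # Fit the safety critic to worst-case safety outcomes.
        update_critic(critic, batch)
        # Zero-sum updates: the controller maximizes the safety value,
        # while the adversary minimizes it.
        update_actor(ctrl, critic, batch, maximize=True)
        update_actor(dstb, critic, batch, maximize=False)
    return ctrl, critic
```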

Figure: The Safety Enforcer trained in simulation with ISAACS. Adversarial agent is modeled as external force applied to the robot (left), and the robot learned to use flippers to dampen the fall on wedge terrain (right).

The wedge terrain represents a high-risk scenario involving large angle variations and potential slam events. With LINC enabled, the robot automatically modulates flipper angles and forward velocity, allowing the operator to command high-level intent (e.g., maximum forward speed) while maintaining safety.

Video: Learned Safety Enforcer enabling surprise-free shared autonomy during wedge traversal.
Figure: Sequence of snapshots showing our LINC Safety Enforcer’s automatic modulation of flippers on the wedge terrain (inline pass) when the operator is applying a maximum forward velocity command: (1) the robot moves towards the tipping point at the top of a wedge; (2) LINC slows down the robot at the tipping point and lowers its flippers; (3) LINC allows the robot to proceed forward, and the flippers smoothly make contact with the terrain; (4) LINC slows down the robot and brings the flippers up; (5) LINC allows the robot to continue moving forward; (6) LINC brings the flippers down in anticipation of a possible upcoming tipping point.
Figure: Evolution of the robot’s pitch angle (top) and pitch angular velocity (bottom) in two representative runs of the inline wedge terrain with LINC off and LINC on. By automatically controlling the flippers, LINC prevents the robot’s nose from dipping far below the horizon (pitch angle remains mostly positive) and keeps the angular velocity from reaching large values. Without LINC, the robot acquires an excessive angular velocity as it traverses ridges, culminating in slam events (red circles) in which this angular velocity drops abruptly to zero as the robot’s tracks impact the downward slope.
Figure: Automatic control of flippers by the LINC Safety Enforcer during a representative run of the wedge terrain. LINC lowers the flippers as the robot approaches and traverses a ridge, preventing slam events and keeping the nose from dipping significantly below the horizon, which results in smoother and faster traversal. As the robot transitions onto the next upward slope, LINC raises the flippers, preventing an unnecessary growth in the pitch angle. Note that the LINC controller has no map of the terrain and uses only the robot’s estimated state.

Additional Scenarios: Single-Track Incline

On steep inclines, the Safety Enforcer prevents toppling by modulating forward velocity and halting motion when safety thresholds are exceeded. Operators can adjust acceptable risk levels to continue traversal when appropriate.
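One simple way such velocity gating could work is sketched below: the commanded speed is scaled down until the critic certifies it at the operator-chosen risk setpoint, and the robot halts if no forward speed qualifies. The linear search and the setpoint-as-threshold interpretation are assumptions for illustration.

```python
import numpy as np

def gate_forward_speed(q_safety, state, commanded_speed, risk_setpoint):
    """Sketch of incline speed gating: shrink the commanded forward speed
    until the critic certifies it as safe, halting at zero otherwise.
    `risk_setpoint` acts as the safety threshold the operator may relax.
    The search scheme and names are illustrative assumptions."""
    for scale in np.linspace(1.0, 0.0, 21):  # 100%, 95%, ..., 0% of command
        speed = scale * commanded_speed
        if q_safety(state, speed) >= risk_setpoint:
            return speed  # largest fraction of the command deemed safe
    return 0.0  # halt: no forward motion certified at this risk level
```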

Video: Learned Safety Enforcer enabling surprise-free shared autonomy during single-track incline terrain.
Figure: Automatic modulation of forward speed by the LINC Safety Enforcer on the single-track incline during representative runs with high and low setpoint. In the high-setpoint run (top), LINC brings the robot to a halt before it tips over; eventually, the operator backs out and takes an alternative route. In the low-setpoint run (bottom), LINC gradually restricts the allowable speed as the robot approaches the top of the incline, and then allows it to speed up along the downslope. The green shading indicates times at which the Safety Enforcer is intervening.

Final Field Test: Unified Safety Enforcer

In the final evaluation, a single unified Safety Enforcer was deployed across all terrains without mode switching. The system remained robust even under unmodeled disturbances, including a pendulum payload attached to the robot.

Figure: Sequence of snapshots from the final field test demonstrating safe traversal across diverse terrains with a single learned safety policy.

Takeaways

This project demonstrates how learning-based safety mechanisms can be:

  • Effective: preventing failure modes such as toppling, slamming, and collisions
  • Human-centered: preserving operator intent with minimal, predictable intervention
  • Robust: generalizing across terrains and unmodeled disturbances

The techniques developed here inform my broader research on safe reinforcement learning and real-world deployment of autonomous systems.

References

2023

  1. ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
    Kai-Chieh Hsu*, Duy Phuong Nguyen*, and Jaime Fernández Fisac
    In Proceedings of the 5th Annual Learning for Dynamics and Control Conference, 15–16 Jun 2023