DARPA LINC Phase 0

Learning Introspective Control for safety-critical field robots

Figure: Safety Enforcer interface illustrating how operator commands are filtered through learned safety constraints before execution.

Overview

In DARPA Learning Introspective Control (LINC) Phase 0, we explore how learning-based Safety Enforcers can enable shared autonomy for safety-critical field robots: intervening only when necessary, preserving operator intent, and avoiding disruptive or surprising behaviors.

Specifically, we developed and deployed a learning-based safety filter for a hybrid tracked robot operating across challenging terrains (wedge, single-track incline, narrowing corridor, and chicane), culminating in a fully integrated final field test under unmodeled disturbances.

A central design objective of the program is robustness under adversarial and degraded conditions. During deployment, adversarial disturbances may corrupt or disable visual sensing and induce actuator-level degradation. As a result, all Safety Enforcers are trained without camera observations and must remain effective under partial actuation loss or altered actuator dynamics. The resulting safety policy relies exclusively on proprioceptive state (e.g., IMU, joint states, velocities), ensuring safe operation even when vision is unavailable and actuation is imperfect.
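To make the vision-free design concrete, the sketch below shows how a proprioception-only observation vector might be assembled; the field names and exact state layout are illustrative assumptions, not the deployed interface.

```python
import numpy as np

def proprioceptive_observation(imu, joints, base_velocity):
    """Assemble a vision-free observation for the safety policy.

    Hypothetical sketch: field names and layout are assumptions,
    not the deployed LINC state definition.
    """
    return np.concatenate([
        imu["orientation"],        # roll, pitch, yaw estimate from the IMU
        imu["angular_velocity"],   # body-frame angular rates
        joints["flipper_angles"],  # flipper joint positions
        joints["flipper_rates"],   # flipper joint velocities
        base_velocity,             # estimated base linear velocity
    ])
```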

Key contributions:

  • Minimal task disruption: maintained safety while preserving operator intent.
  • Improved traversal: far from impeding progress, the filter made challenging terrain easier and faster to cross.
  • Novice-driver friendly: operator can focus on the task, safety is automatic.

System Overview: Learning Introspective Control

The Safety Enforcer continuously monitors the robot state and operator commands using a learned safety critic $Q^\text{safety}$, intervening only when the proposed action would violate safety constraints. Rather than issuing aggressive overrides, the system computes the closest safe action to minimize disruption during shared autonomy.
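As a rough illustration of this least-restrictive filtering rule, the sketch below checks the operator's action against the learned critic and, only on a predicted violation, substitutes the nearest action the critic certifies as safe. The function names, the sign convention (larger $Q^\text{safety}$ is safer), and the finite candidate set are assumptions for illustration, not the deployed implementation.

```python
import numpy as np

def safety_filter(q_safety, state, operator_action, candidate_actions,
                  threshold=0.0):
    """Value-based shielding sketch: pass the operator's action through
    when the learned critic deems it safe; otherwise return the closest
    safe alternative. Names and sign convention are assumptions."""
    if q_safety(state, operator_action) >= threshold:
        return operator_action  # no intervention: operator intent preserved

    # Minimal intervention: among actions certified safe by the critic,
    # choose the one nearest (Euclidean) to the operator's command.
    safe = [a for a in candidate_actions
            if q_safety(state, a) >= threshold]
    if not safe:
        return np.zeros_like(operator_action)  # fallback: bring robot to rest
    return min(safe, key=lambda a: np.linalg.norm(a - operator_action))
```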

Figure: Online safety critic filter (value-based shielding) information flow and block diagram.

Learning-Based Safety Filtering

The Safety Enforcer policy was trained using an adversarial reinforcement learning framework inspired by ISAACS (Hsu* et al., 2023), allowing the system to reason over worst-case disturbances while optimizing safe behavior.

Unlike rule-based safety layers, the learned policy captures nonlinear interactions between robot dynamics, terrain geometry, and control actions.
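The loop below is a schematic of this adversarial setup, loosely following the ISAACS recipe of alternating controller and adversary updates against a shared safety critic; every helper (rollout, update_critic, update_actor) is a placeholder supplied by the caller, not the published implementation.

```python
def train_adversarial_safety(env, ctrl, dstb, critic, n_iters,
                             rollout, update_critic, update_actor):
    """Schematic adversarial training loop in the spirit of ISAACS
    (Hsu* et al., 2023). `ctrl` is the safety policy; `dstb` is the
    learned adversary (e.g., external forces applied to the robot).
    All helpers are placeholders, not the published implementation."""
    for _ in range(n_iters):
        # Roll out the controller against the current worst-case adversary.
        batch = rollout(env, ctrl, dstb)
        # Fit the safety critic to worst-case safety outcomes.
        update_critic(critic, batch)
        # Zero-sum updates: the controller maximizes the safety value,
        # while the adversary minimizes it.
        update_actor(ctrl, critic, batch, maximize=True)
        update_actor(dstb, critic, batch, maximize=False)
    return ctrl, critic
```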

Figure: The Safety Enforcer trained in simulation with ISAACS. Adversarial agent is modeled as external force applied to the robot (left), and the robot learned to use flippers to dampen the fall on wedge terrain (right).

The wedge terrain represents a high-risk scenario involving large angle variations and potential slam events. With LINC enabled, the robot automatically modulates flipper angles and forward velocity, allowing the operator to command high-level intent (e.g., maximum forward speed) while maintaining safety.

Video: Learned Safety Enforcer enabling surprise-free shared autonomy during wedge traversal.
Figure: Sequence of snapshots showing our LINC Safety Enforcer’s automatic modulation of flippers on the wedge terrain (inline pass) when the operator is applying a maximum forward velocity command: (1) the robot moves towards the tipping point at the top of a wedge; (2) LINC slows down the robot at the tipping point and lowers its flippers; (3) LINC allows the robot to proceed forward, and the flippers smoothly make contact with the terrain; (4) LINC slows down the robot and brings the flippers up; (5) LINC allows the robot to continue moving forward; (6) LINC brings the flippers down in anticipation of a possible upcoming tipping point.
Figure: Evolution of the robot’s pitch angle (top) and pitch angular velocity (bottom) in two representative runs of the inline wedge terrain with LINC off and LINC on. By automatically controlling the flippers, LINC prevents the robot’s nose from dipping far below the horizon (pitch angle remains mostly positive) and keeps the angular velocity from reaching large values. Without LINC, the robot acquires an excessive angular velocity as it traverses ridges, culminating in slam events (red circles) in which this angular velocity drops abruptly to zero as the robot’s tracks impact the downward slope.
Figure: Automatic control of flippers by the LINC Safety Enforcer during a representative run of the wedge terrain. LINC lowers the flippers as the robot approaches and traverses a ridge, preventing slam events and keeping the nose from dipping significantly below the horizon, which results in smoother and faster traversal. As the robot transitions onto the next upward slope, LINC raises the flippers, preventing an unnecessary growth in the pitch angle. Note that the LINC controller has no map of the terrain and uses only the robot’s estimated state.

Additional Scenarios: Single-Track Incline

On steep inclines, the Safety Enforcer prevents toppling by modulating forward velocity and halting motion when safety thresholds are exceeded. Operators can adjust acceptable risk levels to continue traversal when appropriate.
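One simple way such velocity gating could work is sketched below: the commanded speed is scaled down until the critic certifies it at the operator-chosen risk setpoint, and the robot halts if no forward speed qualifies. The linear search and the setpoint-as-threshold interpretation are assumptions for illustration.

```python
import numpy as np

def gate_forward_speed(q_safety, state, commanded_speed, risk_setpoint):
    """Sketch of incline speed gating: shrink the commanded forward speed
    until the critic certifies it as safe, halting at zero otherwise.
    `risk_setpoint` acts as the safety threshold the operator may relax.
    The search scheme and names are illustrative assumptions."""
    for scale in np.linspace(1.0, 0.0, 21):  # 100%, 95%, ..., 0% of command
        speed = scale * commanded_speed
        if q_safety(state, speed) >= risk_setpoint:
            return speed  # largest fraction of the command deemed safe
    return 0.0  # halt: no forward motion certified at this risk level
```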

Video: Learned Safety Enforcer enabling surprise-free shared autonomy during single-track incline terrain.
Figure: Automatic modulation of forward speed by the LINC Safety Enforcer on the single-track incline during representative runs with high and low setpoint. In the high-setpoint run (top), LINC brings the robot to a halt before it tips over; eventually, the operator backs out and takes an alternative route. In the low-setpoint run (bottom), LINC gradually restricts the allowable speed as the robot approaches the top of the incline, and then allows it to speed up along the downslope. The green shading indicates times at which the Safety Enforcer is intervening.

Final Field Test: Unified Safety Enforcer

In the final evaluation, a single unified Safety Enforcer was deployed across all terrains without mode switching. The system remained robust even under unmodeled disturbances, including a pendulum payload attached to the robot.

Figure: Sequence of snapshots from the final field test demonstrating safe traversal across diverse terrains with a single learned safety policy.

Takeaways

This project demonstrates how learning-based safety mechanisms can be:

  • Effective: preventing failure modes such as toppling, slamming, and collisions
  • Human-centered: preserving operator intent with minimal, predictable intervention
  • Robust: generalizing across terrains and unmodeled disturbances

The techniques developed here inform my broader research on safe reinforcement learning and real-world deployment of autonomous systems.

References

2023

  1. ISAACS: Iterative Soft Adversarial Actor-Critic for Safety
    Kai-Chieh Hsu*, Duy Phuong Nguyen*, and Jaime Fernández Fisac
    In Proceedings of the 5th Annual Learning for Dynamics and Control Conference, 15–16 Jun 2023