By Scaled Foundations / July 19, 2024

Enhancing Robotic Planning with Logically Constrained Transformers

We introduce Perception Action Signal Temporal Logic Transformer (PASTEL), a deep machine learning architecture to control robots safely.

Ensuring that robots and autonomous systems adhere to safety and reliability standards is paramount. The nature and complexity of deep machine learning make it difficult to guarantee safe behaviors. For example, encoding desired behaviors in robots is non-trivial, and roboticists and ML engineers often rely on cost and reward functions to guide robot learning.

Motivating Example
Consider a dynamical system (robot) with states \(x_t\) and actions \(a_t\) at discrete time steps \(t\). The objective is to predict a sequence of state-action pairs \(\{(x_t, a_t)\}\) such that the predicted trajectory satisfies a given Signal Temporal Logic (STL) specification \(φ\). For instance, \(φ\) might impose the condition that the robot stays clear of designated areas at all times.
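As a hedged illustration, the avoidance condition can be written with the STL "always" operator. The horizon \(T\), the set of areas to avoid \(\mathcal{O}\), and the signed distance \(d\) (positive outside \(\mathcal{O}\)) are notation assumed here for illustration, not taken from the paper:

\[
φ = \mathbf{G}_{[0,T]}\,\big(d(x_t, \mathcal{O}) > 0\big), \qquad ρ^{φ}(x) = \min_{t \in [0,T]} d(x_t, \mathcal{O}).
\]

Under the standard quantitative semantics of STL, the robustness \(ρ^{φ}\) is positive exactly when the trajectory stays clear of \(\mathcal{O}\) at every step, and its magnitude indicates the safety margin.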

In this work, we address how transformer-based trajectory generation can adhere to STL specifications.

One example is a conventional aircraft performing a landing pattern at a busy airport. The landing process involves several critical phases, including descent, approach, and touchdown, each with specific safety and timing requirements. These requirements can be represented using Signal Temporal Logic (STL) to ensure the aircraft adheres to stringent safety protocols.

For instance, the STL specification might include conditions such as:

  • \(φ_1\): The aircraft must descend at a controlled rate to avoid turbulence and ensure passenger comfort.
  • \(φ_2\): The aircraft must maintain a stable approach path and avoid deviations caused by wind shear or other disturbances.
  • \(φ_3\): The aircraft must touch down within a specified time window to maintain the airport’s schedule and efficiency.

By incorporating these STL specifications into the aircraft’s control system, we can constrain the aircraft to perform the landing pattern safely and efficiently. The trajectory generation mechanism, built on transformers, can be designed to predict state-action pairs \(\{(x_t, a_t)\}\) that satisfy these STL conditions.

Similarly, consider a navigation robot tasked with traversing multiple rooms in a dynamic environment. The robot must navigate through rooms while avoiding obstacles, ensuring it visits specified locations, and adhering to time constraints. The STL specifications for this task might include:

  • \(φ_4\): The robot must reach room B within 5 minutes of starting from room A.
  • \(φ_5\): The robot must avoid any obstacles that appear unexpectedly in its path.
  • \(φ_6\): The robot must wait in room C until it receives a signal to proceed to room D.

In both examples, STL provides a formal, mathematically rigorous way to specify and verify the desired behaviors, ensuring that the autonomous systems operate within the defined safety and reliability constraints.

PASTEL: Perception Action Signal Temporal Logic Transformer

PASTEL is a novel framework that integrates STL specifications with autoregressive transformer models to ensure safe and reliable robotic planning. The key innovation lies in incorporating STL specifications into the trajectory planning process, thereby enhancing the alignment of robotic behaviors with predefined safety constraints.

 

Figure 1. The PASTEL architecture uses both causal and cross-attention mechanisms to autoregressively predict trajectories that satisfy Signal Temporal Logic (STL) specifications, conditioned on state, action, and specification embeddings.

Concretely, PASTEL uses a cross-attention operation to mitigate overfitting by ensuring the model attends to both state-action data and logical specifications during trajectory prediction. By treating specification embeddings as queries and state-action embeddings as keys and values, cross-attention forces the model to balance the influence of both sources of information. This prevents the model from overfitting solely to the state-action trajectories, in which case it could ignore the specified constraints. Additionally, a specification relevance loss further aligns predictions with the logical specifications, enhancing the model’s ability to generate safe, specification-compliant trajectories. Cross-attention thus allows PASTEL to leverage large pretrained models while maintaining adherence to precise, safety-critical requirements.
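The query/key/value roles described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the PASTEL implementation: the dimensions, the random projection matrices, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def softmax(scores, axis=-1):
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=axis, keepdims=True)

def cross_attention(spec_emb, sa_emb, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative sketch).

    spec_emb: (n_spec, d_model) specification embeddings, used as queries.
    sa_emb:   (n_steps, d_model) state-action embeddings, used as keys/values.
    """
    q = spec_emb @ w_q                      # (n_spec, d_head)
    k = sa_emb @ w_k                        # (n_steps, d_head)
    v = sa_emb @ w_v                        # (n_steps, d_head)
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each spec token attends over timesteps
    return weights @ v, weights

# Hypothetical sizes: 4 specification tokens, 10 timesteps.
rng = np.random.default_rng(0)
d_model, d_head = 16, 8
spec_emb = rng.normal(size=(4, d_model))
sa_emb = rng.normal(size=(10, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = cross_attention(spec_emb, sa_emb, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

With this choice of roles, the output has one row per specification token, and each row is a weighted mixture of state-action values.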

Key Components of PASTEL

  • STL Tokenization: Leveraging state-of-the-art text tokenizers such as those from CLIP and BERT, the STL specifications are converted into richly grounded features that guide the trajectory predictions. The tokenization process captures the logical structure of STL, enabling precise behavior encoding.
  • Cross-Attention Mechanism: Inspired by Vision Language Action (VLA) models, PASTEL employs a cross-attention mechanism where specification embeddings act as queries, and state-action embeddings serve as keys and values. This ensures that the model “attends” to the specifications while making predictions, thereby enhancing the adherence to safety constraints.
  • Specification Conditioned Prediction: By appending the specification token to each state-action token at every timestep, the model emphasizes the importance of the specification throughout the trajectory prediction process.
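One possible reading of specification-conditioned prediction is sketched below: the same specification embedding is concatenated onto every state-action token along the feature axis. This is an assumption about what "appending" means here (interleaving the specification token into the sequence would be another reading), and the function name and shapes are hypothetical.

```python
import numpy as np

def condition_on_spec(sa_tokens, spec_token):
    """Attach one specification embedding to every timestep.

    sa_tokens:  (n_steps, d_sa) state-action tokens.
    spec_token: (d_spec,) specification embedding.
    Returns an array of shape (n_steps, d_sa + d_spec).
    """
    # Repeat the specification embedding once per timestep.
    tiled = np.tile(spec_token, (sa_tokens.shape[0], 1))
    # Concatenate along the feature axis so every token carries the spec.
    return np.concatenate([sa_tokens, tiled], axis=-1)

# Hypothetical example: 4 timesteps, 3 state-action features, 2 spec features.
sa_tokens = np.arange(12, dtype=float).reshape(4, 3)
spec_token = np.array([0.5, -0.5])
conditioned = condition_on_spec(sa_tokens, spec_token)
print(conditioned.shape)  # (4, 5)
```

Repeating the specification at every timestep means the model cannot "forget" it as the sequence grows, at the cost of a wider token.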

Training and Evaluation

The model is trained on a generated dataset that includes state-action trajectories across various specifications, capturing common motion planning patterns. The data generation methodology uses STLPy, which takes as input the system dynamics, a specification description, actuation constraints, and state cost functions, and generates trajectories that satisfy the given specifications. The training objective combines Mean Absolute Error (MAE), Mean Squared Error (MSE), and a specification relevance loss to ensure precise state-action predictions and adherence to the specifications.
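A combined objective of this kind might look like the sketch below. The post does not define the specification relevance loss, so the hinge penalty on negative STL robustness used here is a hypothetical stand-in, and the weights `w_mae`, `w_mse`, and `w_spec` are likewise assumptions.

```python
import numpy as np

def combined_loss(pred, target, robustness, w_mae=1.0, w_mse=1.0, w_spec=1.0):
    """Weighted sum of MAE, MSE, and a specification penalty (illustrative).

    pred, target: arrays of predicted and reference state-action values.
    robustness:   scalar STL robustness of the predicted trajectory
                  (positive means the specification is satisfied).
    """
    error = pred - target
    mae = np.mean(np.abs(error))
    mse = np.mean(error ** 2)
    # Hypothetical stand-in for the specification relevance loss:
    # penalize only trajectories that violate the specification.
    spec_penalty = max(0.0, -robustness)
    return w_mae * mae + w_mse * mse + w_spec * spec_penalty

pred = np.array([[0.0, 0.0], [1.0, 1.0]])
target = np.array([[0.0, 0.0], [1.0, 2.0]])

loss_violating = combined_loss(pred, target, robustness=-0.5)
loss_satisfying = combined_loss(pred, target, robustness=0.3)
print(loss_violating, loss_satisfying)  # 1.0 0.5
```

Here MAE and MSE are both 0.25, so the violating trajectory pays an extra 0.5 for its negative robustness while the satisfying one pays nothing.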

Qualitative Results: Figure 2 shows the trajectory predicted for the following STL specification:

Eventually[0:30](inside_purple_region and Eventually(inside_green_region)) and Always[0:30] (outside_grey_obstacle).

Figure 2. PASTEL autoregressively predicts a safe trajectory with goal conditioning.
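To make the specification above concrete, the sketch below scores a 2D trajectory against it using the standard quantitative (robustness) semantics of STL: min for "and"/"Always", max for "Eventually". The rectangle coordinates, the two trajectories, and the reading of the unbounded inner Eventually as "from the current step to the end of the trajectory" are all assumptions made for illustration.

```python
def inside(rect, point):
    """Signed margin of a point w.r.t. an axis-aligned rectangle.

    rect is (xmin, ymin, xmax, ymax); the result is positive inside.
    """
    xmin, ymin, xmax, ymax = rect
    x, y = point
    return min(x - xmin, xmax - x, y - ymin, ymax - y)

def robustness(traj, purple, green, grey, horizon=30):
    steps = traj[: horizon + 1]
    # Always[0:horizon](outside grey): worst-case margin over the window.
    avoid = min(-inside(grey, p) for p in steps)
    # Eventually[0:horizon](inside purple and Eventually(inside green)):
    # best step t at which purple holds and green holds at some later step.
    reach = max(
        min(inside(purple, p), max(inside(green, q) for q in traj[t:]))
        for t, p in enumerate(steps)
    )
    return min(reach, avoid)

# Hypothetical regions and trajectories.
purple = (2.0, 0.0, 4.0, 2.0)
green = (6.0, 0.0, 8.0, 2.0)
grey = (4.5, 2.0, 5.5, 4.0)

safe = [(float(x), 1.0) for x in range(9)]    # passes below the obstacle
unsafe = [(float(x), 3.0) for x in range(9)]  # cuts through the obstacle

print(robustness(safe, purple, green, grey))        # 1.0
print(robustness(unsafe, purple, green, grey) < 0)  # True
```

A positive value means the trajectory satisfies the specification with that margin; a negative value means at least one clause is violated.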

In evaluations, PASTEL outperformed the baseline Perception-Action Causal Transformer (PACT), achieving higher satisfaction rates across various STL specifications. The results indicate the model’s ability to generate smooth, constraint-satisfying trajectories, highlighting its potential for real-world applications. Please refer to the paper for additional details.

PASTEL’s ability to integrate logical constraints with data-driven planning models bridges the gap between precise behavior specifications and the flexibility of large pretrained models.

PASTEL is one step towards enhancing the safety and reliability of robots and autonomous systems, paving the way for their deployment in critical real-world applications. Future work aims to address the challenges associated with long-horizon and complex nested tasks by incorporating STL decomposition techniques and external feedback mechanisms.

This work appeared in the RSS 2024 Safe Autonomy Workshop and is joint work by Parv Kapoor, Sai Vemprala and Ashish Kapoor.