Research Highlights


StARformer: Transformer with State-Action-Reward Representations for Robot Learning

  • College of AI Convergence
  • 2022-10-18

The paper "StARformer: Transformer with State-Action-Reward Representations for Robot Learning," from Professor Yu-Cheol Lee's (이유철) Robotics and AI Navigation (RAIN) Lab, has been accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). The work, carried out in collaboration with the Electronics and Telecommunications Research Institute (ETRI) and Stony Brook University in the United States, concerns autonomous driving technology based on end-to-end imitation learning using reinforcement learning and Transformer models; Professor Lee served as the corresponding author. IEEE TPAMI is one of the world's most prestigious artificial intelligence journals, ranked in the top 0.54% by JCR with an impact factor of 24.314. The paper is scheduled for publication in December 2022.

[Figure 1] StARformer System Architecture

Reinforcement Learning (RL) can be considered as a sequence modeling task, where an agent employs a sequence of past state-action-reward experiences to predict a sequence of future actions. In this work, we propose State-Action-Reward Transformer (StARformer), a Transformer architecture for robot learning with image inputs, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. StARformer first extracts StAR-representations using self-attending patches of image states, action, and reward tokens within a short temporal window. These StAR-representations are combined with pure image state representations, extracted as convolutional features, to perform self-attention over the whole sequence. Our experimental results show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, under both offline-RL and imitation learning settings. We find that models can benefit from our combination of patch-wise and convolutional image embeddings. StARformer is also more compliant with longer sequences of inputs than the baseline method. Finally, we demonstrate how StARformer can be successfully applied to a real-world robot imitation learning setting via a human-following task.
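The abstract describes a two-stage design: local self-attention over image patches, action, and reward tokens to form StAR-representations, followed by sequence-level self-attention. As a rough illustration of the local stage, the PyTorch sketch below builds one StAR token from a single state-action-reward triple; the class name `StepEncoder`, the patch size, the embedding width, and the mean-pooling readout are illustrative assumptions rather than the paper's exact configuration, and the paper's version attends within a short temporal window rather than over a single step.

```python
import torch
import torch.nn as nn

class StepEncoder(nn.Module):
    """Sketch of the local (Step Transformer) idea: image patches, the
    action, and the reward self-attend within one step, and the result
    is pooled into a single StAR-representation token."""
    def __init__(self, patch=16, img=84, chans=4, dim=128, n_actions=18):
        super().__init__()
        # ViT-style patch embedding via a strided convolution
        self.patchify = nn.Conv2d(chans, dim, kernel_size=patch, stride=patch)
        self.act_emb = nn.Embedding(n_actions, dim)  # discrete actions (e.g. Atari)
        self.rew_emb = nn.Linear(1, dim)             # scalar reward
        n_patches = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, state, action, reward):
        # state: (B, C, H, W); action: (B,) long; reward: (B, 1) float
        patches = self.patchify(state).flatten(2).transpose(1, 2)  # (B, N, dim)
        a = self.act_emb(action).unsqueeze(1)                      # (B, 1, dim)
        r = self.rew_emb(reward).unsqueeze(1)                      # (B, 1, dim)
        tokens = torch.cat([patches, a, r], dim=1) + self.pos
        return self.encoder(tokens).mean(dim=1)  # pooled StAR token, (B, dim)
```

A call such as `StepEncoder()(torch.randn(2, 4, 84, 84), torch.randint(0, 18, (2,)), torch.randn(2, 1))` returns one pooled token per sample, which a long-range sequence module can then consume.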

[Figure 2] Imitation Learning based Scenario Experiments to Evaluate the StARformer Navigation Performance

This study introduces StARformer, which explicitly models strong local relations (Step Transformer) to facilitate long-term sequence modeling (Sequence Transformer) in visual RL. Our extensive empirical results show how the learned StAR-representations help our model outperform the baseline in both Atari and DMC environments, under both offline RL and imitation learning settings. We find that the fusion of learned StAR-representations and convolutional features benefits action prediction. We further demonstrate that our designed architecture and token embeddings are essential to successfully model trajectories, with an emphasis on long sequences. We also verify that our method can be applied to real-world robot learning settings via a human-following experiment.
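To make the fusion described above concrete, the sketch below shows one plausible way to interleave per-step StAR tokens with convolutional state features and apply causally masked self-attention over the combined sequence; `SequenceFusion`, the CNN stack, and the token ordering are assumptions made for illustration and are not taken from the paper.

```python
import torch
import torch.nn as nn

class SequenceFusion(nn.Module):
    """Sketch of the long-range (Sequence Transformer) fusion:
    per-step StAR tokens are interleaved with convolutional state
    features, then causal self-attention runs over the trajectory
    and an action is predicted at every step."""
    def __init__(self, dim=128, n_actions=18, max_len=64):
        super().__init__()
        self.conv = nn.Sequential(  # plain CNN state embedding
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(dim),
        )
        self.pos = nn.Parameter(torch.zeros(1, 2 * max_len, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.seq = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, n_actions)

    def forward(self, star_tokens, states):
        # star_tokens: (B, T, dim) from the step encoder; states: (B, T, 4, 84, 84)
        B, T = star_tokens.shape[:2]
        s = self.conv(states.flatten(0, 1)).view(B, T, -1)       # (B, T, dim)
        x = torch.stack([star_tokens, s], dim=2).flatten(1, 2)   # interleave -> (B, 2T, dim)
        x = x + self.pos[:, : 2 * T]
        # additive causal mask: -inf above the diagonal blocks future positions
        mask = torch.full((2 * T, 2 * T), float("-inf")).triu(1)
        x = self.seq(x, mask=mask)
        return self.head(x[:, 1::2])  # action logits at state positions, (B, T, n_actions)
```

Stacking the two token streams along a new axis and flattening yields the alternating order [StAR_0, state_0, StAR_1, state_1, ...], so each prediction can attend to both the local StAR summaries and the global convolutional state features of all earlier steps.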