Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping

Published in L4DC, 2026

Adversarial inverse reinforcement learning (IRL) methods that rely on reward shaping often break down in stochastic environments. This work introduces a maximum-causal-entropy, transition-aware reward shaping approach that embeds learned transition dynamics into the reward, yielding rewards that are invariant to environment stochasticity, together with theoretical bounds on reward error and performance differences. Experiments on MuJoCo locomotion and stochastic Atari tasks show stronger performance and better sample efficiency than existing baselines, while remaining competitive in deterministic settings.
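To illustrate the general idea of making a shaping term aware of stochastic transitions, here is a minimal sketch of standard potential-based reward shaping where the next-state potential is averaged under a learned transition model instead of a single sampled next state. This is an assumption-laden illustration, not the paper's exact formulation; the names `T_hat`, `phi`, and `shaped_reward` are hypothetical.

```python
# Illustrative sketch only: potential-based reward shaping with the
# expectation over next states taken under a *learned* transition model,
# so the shaping term accounts for environment stochasticity.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.99
rng = np.random.default_rng(0)

# T_hat[s, a] is a learned distribution over next states
# (here a random stand-in for a model fit from data).
T_hat = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

# Potential function over states (e.g., a learned value estimate).
phi = rng.normal(size=n_states)

def shaped_reward(r, s, a):
    """Return r plus a transition-aware, potential-based shaping term:

    F(s, a) = gamma * E_{s' ~ T_hat(.|s, a)}[phi(s')] - phi(s)

    Averaging over the model's next-state distribution (rather than using
    one sampled s') keeps the shaping term well-defined when the same
    (s, a) can lead to different next states.
    """
    expected_next_potential = T_hat[s, a] @ phi
    return r + gamma * expected_next_potential - phi[s]

print(shaped_reward(r=1.0, s=0, a=1))
```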

Authors: Simon Sinong Zhan, Philip Wang, Qingyuan Wu, Ruochen Jiao, Yixuan Wang, Chao Huang, Qi Zhu (equal contribution)

Citation

@article{zhan2024enhancing,
  title   = {Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping},
  author  = {Zhan, Simon Sinong and Wang, Philip and Wu, Qingyuan and Jiao, Ruochen and Wang, Yixuan and Huang, Chao and Zhu, Qi},
  journal = {arXiv preprint arXiv:2410.03847},
  year    = {2024}
}