Publications

For a full list of my articles, please visit my Google Scholar profile.

Conference Papers

Directly Forecasting Belief for Reinforcement Learning with Delays

2025
International Conference on Machine Learning (ICML)
Qingyuan Wu*, Yuhui Wang*, Simon Sinong Zhan*, Yixuan Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang (*equal contribution)
This paper presents a novel approach to directly forecast beliefs in reinforcement learning with observation delays, improving upon traditional methods by incorporating predictive capabilities into the learning process.

Variational Delayed Policy Optimization

2024
Conference on Neural Information Processing Systems (NeurIPS)
Qingyuan Wu*, Simon Sinong Zhan*, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang (*equal contribution)
Variational Delayed Policy Optimization (VDPO) reformulates delayed RL as a variational inference problem, which is further modelled as a two-step iterative optimization problem. The first step is TD learning in the delay-free environment with a small state space; the second is behaviour cloning, which can be solved much more efficiently than TD learning.

Kinematics-aware Trajectory Generation and Prediction with Latent SDE

2024
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Ruochen Jiao*, Yixuan Wang*, Xiangguo Liu, Simon Sinong Zhan, Chao Huang, Qi Zhu (*equal contribution)
This paper presents a novel approach to trajectory generation and prediction that incorporates kinematic constraints through latent stochastic differential equations, enabling more realistic and physically consistent motion planning.

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

2024
International Conference on Machine Learning (ICML)
Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang
Auxiliary-Delayed Reinforcement Learning (AD-RL) leverages an auxiliary short-delayed task to accelerate learning on a strongly delayed task without compromising performance in stochastic environments.

State-wise Safe Reinforcement Learning With Pixel Observations

2024
Learning for Dynamics and Control Conference (L4DC)
Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu
In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through a latent barrier function learning mechanism.

Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments

2023
International Conference on Machine Learning (ICML)
Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu
A safe RL approach that jointly learns a model of the environment and optimizes the control policy, while avoiding unsafe regions through safety probability optimization.

Joint Differentiable Optimization and Verification for Certified Reinforcement Learning

2023
ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS)
Yixuan Wang*, Simon Sinong Zhan*, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu (*equal contribution)
A framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem. The problem is end-to-end differentiable through gradients from the value function and from certificates formulated as linear programs and semidefinite programs.

MicroFluID: A Reconfigurable RFID Platform for Robust Interaction Sensing Based on Microfluidics

2022
ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp)
Wei Sun, Yuwen Chen, Yanjun Chen, Xiaopeng Zhang, Simon Zhan, Yixin Li, Jiecheng Wu, Teng Han, Haipeng Mi, Jingxian Wang, Feng Tian, Xing-Dong Yang
MicroFluID is a novel RFID artifact based on a multiple-chip structure and microfluidic switches, which determines the input state by directly reading variable ID information rather than retrieving primitive signals.

RElectrode: A Reconfigurable Electrode For Multi-Purpose Sensing Based on Microfluidics

2021
ACM Conference on Human Factors in Computing Systems (CHI)
Wei Sun, Yanjun Chen, Simon Zhan, Teng Han, Feng Tian, Hongan Wang, Xing-Dong Yang
RElectrode is a reconfigurable electrode that uses a microfluidic technique to change its geometry and material properties. This lets it sense a variety of user inputs, including touch and touchless gestures, pressure, and temperature, and distinguish between different types of objects or liquids.

Workshop Papers

Shop-R1: Rewarding LLMs to Simulate Human Behavior in Online Shopping via Reinforcement Learning

2025
Scaling Environments for Agents (SEA) Workshop at NeurIPS 2025
Yimeng Zhang, Tian Wang, Jiri Gesi, Ziyi Wang, Yuxuan Lu, Jiacheng Lin, Sinong Zhan, Vianne Gao, Ruochen Jiao, Junze Liu, Kun Qian, Yuxin Tang, Ran Xue, Houyu Zhang, Qingjun Cui, Yufan Guo, Dakuo Wang
This paper introduces Shop-R1, a novel reinforcement learning framework that enhances the reasoning ability of LLMs for simulating real human behavior in online shopping environments, using a two-stage approach with distinct reward signals.

Empowering Autonomous Driving with Large Language Models: A Safety Perspective

2024
LLMAgent Workshop at ICLR 2024
Yixuan Wang, Ruochen Jiao, Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu
This paper explores the integration of Large Language Models (LLMs) into autonomous driving systems, leveraging their robust common-sense knowledge and reasoning abilities to enhance driving performance and safety in unforeseen long-tail scenarios.