I am a second-year PhD student in the ECE department at Northwestern University, advised by Qi Zhu, and I also work closely with Zhaoran Wang and Chao Huang.
Before Northwestern, I did my undergrad in Applied Math and Computer Science at UC Berkeley, where I was advised by Sanjit A. Seshia. I also have experience in ubiquitous computing and novel sensing, where I was fortunate to be advised by Teng Han and Tian Feng.
I'm interested in combining techniques from machine learning, control theory, and formal methods to enforce the safety and robustness of cyber-physical systems applications. I'm also broadly interested in Generative Models, Human Factors, and AI4Math (theorem proving in Lean, autoformalization, etc.).
Our project introduces a model-based adversarial Inverse Reinforcement Learning framework that enhances performance in stochastic environments by incorporating transition dynamics into reward shaping, significantly improving sample efficiency and robustness compared to traditional approaches.
Variational Delayed Policy Optimization (VDPO) reformulates delayed RL as a variational inference problem, which is further modelled as a two-step iterative optimization problem, where the first step is TD learning in the delay-free environment with a small state space, and the second step is behaviour cloning, which can be addressed much more efficiently than TD learning.
Auxiliary-Delayed Reinforcement Learning (AD-RL) leverages an auxiliary short-delayed task to accelerate learning on a long-delayed task without compromising performance in stochastic environments.
In this paper, we propose a novel pixel-observation safe RL algorithm that efficiently encodes state-wise safety constraints with unknown hazard regions through the introduction of a latent barrier function learning mechanism.
A safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization.
A framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is end-to-end differentiable through gradients from the value function and from certificates formulated as linear programs and semidefinite programs.
Tools
MARS: a toolchain for Modeling, Analyzing and veRifying hybrid Systems