I am a 3rd year PhD student in the ECE department at Northwestern University, advised by Qi Zhu. Before Northwestern, I did my undergrad in Applied Math and Computer Science at UC Berkeley, where I was advised by Sanjit A. Seshia. I have also spent great time as research intern at Amazon SFAI.
Research Statement
My research bridges reinforcement learning and formal methods to make embodied cyber-physical systems safe, robust, and verifiable. I combine formal verification with learning-based control to obtain machine-checkable guarantees without sacrificing adaptability or performance. Current threads include constraint-satisfaction–based safe RL and runtime certification; delay-aware decision making under uncertainty; model-based, certificate-driven learning; and neuro-symbolic techniques that couple logical reasoning with deep models for interpretable policies. This agenda targets trustworthy autonomy at scale and has led to publications in venues such as ICML, NeurIPS, ICCPS, L4DC, FM, RV, and IROS.
News
Selected Publications
 Qingyuan Wu*, Simon Zhan*, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang (* equal contribution)
 Variational Delayed Policy Optimization (VDPO) reformulates delayed RL as a variational inference problem, solved through a two-step iterative optimization process with TD learning and behavior cloning.
 Yixuan Wang, Simon Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu
 A safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization using soft barriers to enforce hard safety constraints.
 Yixuan Wang*, Simon Zhan*, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu (* equal contribution)
 A framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem, which is end-to-end differentiable through gradients from the value function and certificates formulated by linear programs and semi-definite programs.