About

I am a third-year PhD student in the ECE department at Northwestern University, advised by Qi Zhu. Before Northwestern, I did my undergrad in Applied Math and Computer Science at UC Berkeley, where I was advised by Sanjit A. Seshia. I have also spent time as a research intern at Amazon SFAI.

Research Statement

My research bridges reinforcement learning and formal methods to make embodied cyber-physical systems safe, robust, and verifiable. I combine formal verification with learning-based control to obtain machine-checkable guarantees without sacrificing adaptability or performance. Current threads include constraint-satisfaction–based safe RL and runtime certification; delay-aware decision making under uncertainty; model-based, certificate-driven learning; and neuro-symbolic techniques that couple logical reasoning with deep models for interpretable policies. This agenda targets trustworthy autonomy at scale and has led to publications in venues such as ICML, NeurIPS, ICCPS, L4DC, FM, RV, and IROS.

News

[10/2025] Two papers have been accepted to SEA@NeurIPS 2025.
[05/2025] I will start my summer internship at Amazon as an Applied Scientist in the Palo Alto office. Happy to connect!
[10/2024] Our paper "Variational Delayed Policy Optimization" has been accepted to NeurIPS 2024 (Spotlight)!
[05/2024] I will attend SSFT 2024. Please feel free to reach out if you are interested in my research.
[04/2024] Our paper "State-wise Safe RL With Pixel Observations" has been accepted to L4DC 2024.
[03/2024] One paper has been accepted to LLMAgent@ICLR 2024.
[09/2023] I will start my PhD journey in the ECE department at Northwestern University.

Selected Publications

Qingyuan Wu*, Simon Zhan*, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang (* equal contribution)
Variational Delayed Policy Optimization (VDPO) reformulates delayed RL as a variational inference problem and solves it through a two-step iterative optimization process combining TD learning and behavior cloning.
Yixuan Wang, Simon Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu
A safe RL approach that jointly learns the environment and optimizes the control policy, using soft barrier functions to optimize safety probability, enforce hard safety constraints, and effectively avoid unsafe regions.
Yixuan Wang*, Simon Zhan*, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu (* equal contribution)
A framework that jointly conducts reinforcement learning and formal verification by formulating and solving a novel bilevel optimization problem; the problem is end-to-end differentiable, with gradients flowing through the value function and through certificates formulated as linear programs and semidefinite programs.