Wenhao Zhan
Publications
(* = equal contribution, + = equal contribution in random order, # = equal contribution in alphabetical order)
K. Brantley, M. Chen#, Z. Gao#, J. D. Lee, W. Sun, W. Zhan#, X. Zhang, "Accelerating RL for LLM Reasoning with Optimal Advantage Regression", NeurIPS 2025.
W. Zhan, S. Fujimoto, Z. Zhu, J. D. Lee, D. R. Jiang, Y. Efroni, "Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank", ICLR 2025.
A. Huang, W. Zhan, T. Xie, J. D. Lee, W. Sun, A. Krishnamurthy, D. J. Foster, "Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-squared Preference Optimization", ICLR 2025 Spotlight.
Z. Gao, W. Zhan, J. D. Chang, G. Swamy, K. Brantley, J. D. Lee, W. Sun, "Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF", ICLR 2025.
J. D. Chang*, W. Zhan*, O. Oertell, K. Brantley, D. Misra, J. D. Lee, W. Sun, "Dataset Reset Policy Optimization for RLHF", Preprint.
Z. Gao, J. D. Chang, W. Zhan, O. Oertell, G. Swamy, K. Brantley, T. Joachims, J. A. Bagnell, J. D. Lee, W. Sun, "REBEL: Reinforcement Learning via Regressing Relative Rewards", NeurIPS 2024.
Z. Zhang, W. Zhan, Y. Chen, S. S. Du, J. D. Lee, "Optimal Multi-Distribution Learning", COLT 2024.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "Provable Reward-Agnostic Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
W. Zhan*, M. Uehara*, N. Kallus, J. D. Lee, W. Sun, "Provable Offline Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
Y. Zhao+, W. Zhan+, X. Hu+, H. Leung, F. Farnia, W. Sun, J. D. Lee, "Provably Efficient CVaR RL in Low-rank MDPs", ICLR 2024.
G. Li*, W. Zhan*, J. D. Lee, Y. Chi, Y. Chen, "Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning", NeurIPS 2023.
W. Zhan*, S. Cen*, B. Huang, Y. Chen, J. D. Lee, Y. Chi, "Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence", SIAM Journal on Optimization, 2023.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "PAC Reinforcement Learning for Predictive State Representations", ICLR 2023.
W. Zhan, J. D. Lee, Z. Yang, "Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games", ICLR 2023.
W. Zhan, B. Huang, A. Huang, N. Jiang, J. D. Lee, "Offline Reinforcement Learning with Realizability and Single-policy Concentrability", COLT 2022.
C. Z. Lee, L. P. Barnes, W. Zhan, A. Özgür, "Over-the-Air Statistical Estimation of Sparse Models", GLOBECOM 2021.
W. Zhan, H. Tang, J. Wang, "Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint", BMSB 2020.
Work Experience
Mosaic AI, Databricks
Research Scientist
Jan 2026 –
Reinforcement Learning and Large Language Model Post-training
GenAI, Meta
Research Intern
Jun 2025 – Sep 2025
Reinforcement Learning for Tool-Integrated Reasoning Models
Ranking, Meta
Research Intern
May 2024 – Oct 2024
Efficient Multi-Agent Offline Reinforcement Learning
Teaching
Spring 2024: Foundations of Reinforcement Learning, as TA (Princeton, Instructor: Prof. Chi Jin).
Fall 2022: Theory of Weakly Supervised Learning, as TA (Princeton, Instructor: Prof. Jason D. Lee).
Honors
Talk