Wenhao Zhan
whzhan99@outlook.com
San Francisco, CA
Google Scholar and LinkedIn

I am a Research Scientist at Mosaic AI, Databricks. Previously, I was a Ph.D. student at Princeton University, where I was fortunate to be advised by Professors Jason D. Lee and Yuxin Chen. Before that, I received my bachelor's degree from Tsinghua University.

Research

Foundations and Applications of Reinforcement Learning
Large Language Model Post-training
Statistics and Optimization

Publications

(* = equal contribution, + = equal contribution in random order, # = equal contribution in alphabetical order)

K. Brantley, M. Chen#, Z. Gao#, J. D. Lee, W. Sun, W. Zhan#, X. Zhang, "Accelerating RL for LLM Reasoning with Optimal Advantage Regression", NeurIPS 2025.
W. Zhan, S. Fujimoto, Z. Zhu, J. D. Lee, D. R. Jiang, Y. Efroni, "Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank", ICLR 2025.
A. Huang, W. Zhan, T. Xie, J. D. Lee, W. Sun, A. Krishnamurthy, D. J. Foster, "Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-squared Preference Optimization", ICLR 2025 Spotlight.
Z. Gao, W. Zhan, J. D. Chang, G. Swamy, K. Brantley, J. D. Lee, W. Sun, "Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF", ICLR 2025.
J. D. Chang*, W. Zhan*, O. Oertell, K. Brantley, D. Misra, J. D. Lee, W. Sun, "Dataset Reset Policy Optimization for RLHF", Preprint.
Z. Gao, J. D. Chang, W. Zhan, O. Oertell, G. Swamy, K. Brantley, T. Joachims, J. A. Bagnell, J. D. Lee, W. Sun, "REBEL: Reinforcement Learning via Regressing Relative Rewards", NeurIPS 2024.
Z. Zhang, W. Zhan, Y. Chen, S. S. Du, J. D. Lee, "Optimal Multi-Distribution Learning", COLT 2024.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "Provable Reward-Agnostic Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
W. Zhan*, M. Uehara*, N. Kallus, J. D. Lee, W. Sun, "Provable Offline Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
Y. Zhao+, W. Zhan+, X. Hu+, H. Leung, F. Farnia, W. Sun, J. D. Lee, "Provably Efficient CVaR RL in Low-rank MDPs", ICLR 2024.
G. Li*, W. Zhan*, J. D. Lee, Y. Chi, Y. Chen, "Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning", NeurIPS 2023.
W. Zhan*, S. Cen*, B. Huang, Y. Chen, J. D. Lee, Y. Chi, "Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence", SIAM Journal on Optimization, 2023.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "PAC Reinforcement Learning for Predictive State Representations", ICLR 2023.
W. Zhan, J. D. Lee, Z. Yang, "Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games", ICLR 2023.
W. Zhan, B. Huang, A. Huang, N. Jiang, J. D. Lee, "Offline Reinforcement Learning with Realizability and Single-policy Concentrability", COLT 2022.
C. Z. Lee, L. P. Barnes, W. Zhan, A. Özgür, "Over-the-Air Statistical Estimation of Sparse Models", GLOBECOM 2021.
W. Zhan, H. Tang, J. Wang, "Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint", BMSB 2020.

Work Experience

Mosaic AI, Databricks

Research Scientist
Jan 2026 – Present
Reinforcement Learning and Large Language Model Post-training

GenAI, Meta

Research Intern
Jun 2025 – Sep 2025
Reinforcement Learning for Tool-Integrated Reasoning Models

Ranking, Meta

Research Intern
May 2024 – Oct 2024
Efficient Multi-Agent Offline Reinforcement Learning

Teaching

Spring 2024: Foundations of Reinforcement Learning, as TA (Princeton, Instructor: Prof. Chi Jin).
Fall 2022: Theory of Weakly Supervised Learning, as TA (Princeton, Instructor: Prof. Jason D. Lee).

Honors

2024 Award for Excellence, Princeton SEAS
Honorable mention for the 2023 Jane Street Graduate Research Fellowship

Talk

Optimal Multi-Distribution Learning
Adaptive Learning in Complex Environments, TTIC Chicago Summer Workshop 2024