Wenhao Zhan
Research
My research interests include
Reinforcement Learning
Statistics
Publications
(* = equal contribution, + = equal contribution and random order)
W. Zhan, S. Fujimoto, Z. Zhu, J. D. Lee, D. R. Jiang, Y. Efroni, "Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank", Preprint.
A. Huang, W. Zhan, T. Xie, J. D. Lee, W. Sun, A. Krishnamurthy, D. J. Foster, "Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-squared Preference Optimization", Preprint.
Z. Gao, W. Zhan, J. D. Chang, G. Swamy, K. Brantley, J. D. Lee, W. Sun, "Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF", Preprint.
J. D. Chang*, W. Zhan*, O. Oertell, K. Brantley, D. Misra, J. D. Lee, W. Sun, "Dataset Reset Policy Optimization for RLHF", Preprint.
Z. Gao, J. D. Chang, W. Zhan, O. Oertell, G. Swamy, K. Brantley, T. Joachims, J. A. Bagnell, J. D. Lee, W. Sun, "REBEL: Reinforcement Learning via Regressing Relative Rewards", Accepted to NeurIPS 2024.
Z. Zhang, W. Zhan, Y. Chen, S. S. Du, J. D. Lee, "Optimal Multi-Distribution Learning", COLT 2024.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "Provable Reward-Agnostic Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
W. Zhan*, M. Uehara*, N. Kallus, J. D. Lee, W. Sun, "Provable Offline Preference-Based Reinforcement Learning", ICLR 2024 Spotlight.
Y. Zhao+, W. Zhan+, X. Hu+, H. Leung, F. Farnia, W. Sun, J. D. Lee, "Provably Efficient CVaR RL in Low-rank MDPs", ICLR 2024.
G. Li*, W. Zhan*, J. D. Lee, Y. Chi, Y. Chen, "Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning", NeurIPS 2023.
W. Zhan*, S. Cen*, B. Huang, Y. Chen, J. D. Lee, Y. Chi, "Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence", SIAM Journal on Optimization, 2023.
W. Zhan, M. Uehara, W. Sun, J. D. Lee, "PAC Reinforcement Learning for Predictive State Representations", ICLR 2023.
W. Zhan, J. D. Lee, Z. Yang, "Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games", ICLR 2023.
W. Zhan, B. Huang, A. Huang, N. Jiang, J. D. Lee, "Offline Reinforcement Learning with Realizability and Single-policy Concentrability", COLT 2022.
C. Z. Lee, L. P. Barnes, W. Zhan, A. Özgür, "Over-the-Air Statistical Estimation of Sparse Models", GLOBECOM 2021.
W. Zhan, H. Tang, J. Wang, "Delay Optimal Cross-Layer Scheduling Over Markov Channels with Power Constraint", BMSB 2020.
Work Experience
Meta
Research Intern
Jun 2024 – Sep 2024
Efficient Multi-Agent Offline Reinforcement Learning
Teaching
Spring 2024: Foundations of Reinforcement Learning, as TA (Princeton, Instructor: Prof. Chi Jin).
Fall 2022: Theory of Weakly Supervised Learning, as TA (Princeton, Instructor: Prof. Jason D. Lee).
Honors