Publications

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Published in arXiv preprint, 2025

This paper introduces trajectory clustering for offline RL datasets where cluster centers represent generating policies, proposing Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE) with finite-step convergence guarantees.

Recommended citation: Xinqi Wang, Hao Hu, Simon S. Du. (2025). "Policy-Based Trajectory Clustering in Offline Reinforcement Learning." arXiv:2506.09202. https://arxiv.org/abs/2506.09202

Distributional Successor Features Enable Zero-Shot Policy Optimization

Published in Advances in Neural Information Processing Systems (NeurIPS 2024), 2024

This paper introduces DiSPOs, a novel approach that learns distributions of successor features from offline datasets to enable zero-shot policy optimization across different reward functions, avoiding compounding errors in model-based RL.

Recommended citation: Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta. (2024). "Distributional Successor Features Enable Zero-Shot Policy Optimization." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/e15ef893e137cd40e6c7313a04307437-Abstract-Conference.html

Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques

Published in arXiv preprint, 2024

This paper investigates preference-based multi-agent reinforcement learning, focusing on identifying Nash equilibria from offline datasets with sparse human feedback, and introduces temporal MSE regularization and pessimism mechanisms for improved reward modeling.

Recommended citation: Xinqi Wang, Natalia Zhang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du. (2024). "Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques." arXiv:2409.00717. https://arxiv.org/abs/2409.00717

On Gap-dependent Bounds for Offline Reinforcement Learning

Published in Advances in Neural Information Processing Systems (NeurIPS 2022), 2022

This paper presents a systematic study of gap-dependent sample complexity bounds in offline reinforcement learning.

Recommended citation: Xinqi Wang, Qiwen Cui, Simon S. Du. (2022). "On Gap-dependent Bounds for Offline Reinforcement Learning." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/hash/5f5f7b6080dcadced61cf5d96f7c6dde-Abstract-Conference.html

Elliot Wang
