Sitemap

A list of all the posts and pages found on the site. For robots, an XML version is available for digesting as well.

Pages

Posts

Portfolio

Publications

On Gap-Dependent Bounds for Offline Reinforcement Learning

Published in Advances in Neural Information Processing Systems (NeurIPS 2022), 2022

This paper presents a systematic study of gap-dependent sample complexity in offline reinforcement learning.

Recommended citation: Xinqi Wang, Qiwen Cui, Simon S. Du. (2022). "On Gap-dependent Bounds for Offline Reinforcement Learning." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/hash/5f5f7b6080dcadced61cf5d96f7c6dde-Abstract-Conference.html

Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques

Published in arXiv preprint, 2024

This paper investigates preference-based multi-agent reinforcement learning, focusing on identifying Nash equilibria from offline datasets with sparse human feedback, and introduces temporal MSE regularization and pessimism mechanisms for improved reward modeling.

Recommended citation: Xinqi Wang, Natalia Zhang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du. (2024). "Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques." arXiv:2409.00717. https://arxiv.org/abs/2409.00717

Distributional Successor Features Enable Zero-Shot Policy Optimization

Published in Advances in Neural Information Processing Systems (NeurIPS 2024), 2024

This paper introduces DiSPOs, a novel approach that learns distributions of successor features from offline datasets to enable zero-shot policy optimization across different reward functions, avoiding compounding errors in model-based RL.

Recommended citation: Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta. (2024). "Distributional Successor Features Enable Zero-Shot Policy Optimization." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/e15ef893e137cd40e6c7313a04307437-Abstract-Conference.html

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Published in arXiv preprint, 2025

This paper introduces trajectory clustering for offline RL datasets where cluster centers represent generating policies, proposing Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE) with finite-step convergence guarantees.

Recommended citation: Xinqi Wang, Hao Hu, Simon S. Du. (2025). "Policy-Based Trajectory Clustering in Offline Reinforcement Learning." arXiv:2506.09202. https://arxiv.org/abs/2506.09202

Talks

Teaching
