Portfolio item number 1
Short description of portfolio item number 1
Short description of portfolio item number 2
Published in Advances in Neural Information Processing Systems (NeurIPS 2022), 2022
This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning.
Recommended citation: Xinqi Wang, Qiwen Cui, Simon S. Du. (2022). "On Gap-dependent Bounds for Offline Reinforcement Learning." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/hash/5f5f7b6080dcadced61cf5d96f7c6dde-Abstract-Conference.html
Published in arXiv preprint, 2024
This paper investigates preference-based multi-agent reinforcement learning, focusing on identifying Nash equilibria from offline datasets with sparse human feedback, and introduces temporal MSE regularization and pessimism mechanisms for improved reward modeling.
Recommended citation: Xinqi Wang, Natalia Zhang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du. (2024). "Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques." arXiv:2409.00717. https://arxiv.org/abs/2409.00717
Published in Advances in Neural Information Processing Systems (NeurIPS 2024), 2024
This paper introduces DiSPOs, a novel approach that learns distributions of successor features from offline datasets to enable zero-shot policy optimization across different reward functions, avoiding compounding errors in model-based RL.
Recommended citation: Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta. (2024). "Distributional Successor Features Enable Zero-Shot Policy Optimization." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/e15ef893e137cd40e6c7313a04307437-Abstract-Conference.html
Published in arXiv preprint, 2025
This paper introduces trajectory clustering for offline RL datasets where cluster centers represent generating policies, proposing Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE) with finite-step convergence guarantees.
Recommended citation: Xinqi Wang, Hao Hu, Simon S. Du. (2025). "Policy-Based Trajectory Clustering in Offline Reinforcement Learning." arXiv:2506.09202. https://arxiv.org/abs/2506.09202
Published:
This is a description of your talk, which is a markdown file that can be markdown-ified like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different value in the type field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.