Publications

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

Published in arXiv preprint, 2026

A benchmark evaluating AI agents on property-based testing, a skill distinct from general code generation. It comprises 100 curated problems across 40 Python libraries with 365 injected bugs that are deliberately difficult to trigger with random inputs, so agents must read documentation, identify invariants, and specify targeted Hypothesis strategies.

Recommended citation: Xinqi Wang*, Lucas Jing*, Liao Zhang, Simon S. Du (*equal contribution). (2026). "PBT-Bench: Benchmarking AI Agents on Property-Based Testing." arXiv:2605.15229. https://arxiv.org/abs/2605.15229
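To make the setting concrete, here is a toy, standard-library-only illustration of why random inputs can miss such bugs. The run-length codec, its injected bug, and the generator names are hypothetical and not drawn from PBT-Bench. In the benchmark itself agents write Hypothesis strategies; the hand-rolled generators below merely stand in for them.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def rle_encode(s: str) -> str:
    """Run-length encode s as (character, count) pairs.

    Hypothetical injected bug: only the last digit of each count is kept,
    so any run of 10 or more identical characters is corrupted.
    """
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        out.append(s[i] + str(j - i)[-1])  # BUG: drops leading digits of the count
        i = j
    return "".join(out)


def rle_decode(t: str) -> str:
    """Decode fixed-width (character, single-digit count) pairs."""
    return "".join(t[k] * int(t[k + 1]) for k in range(0, len(t), 2))


def roundtrip_holds(s: str) -> bool:
    """The invariant under test: decoding an encoding returns the input."""
    return rle_decode(rle_encode(s)) == s


def uniform_strings(rng: random.Random) -> str:
    """Untargeted generator: short strings of uniformly random letters."""
    return "".join(rng.choice(ALPHABET) for _ in range(rng.randint(0, 20)))


def long_run_strings(rng: random.Random) -> str:
    """Targeted generator: a few runs over a tiny alphabet, often 10+ long."""
    return "".join(rng.choice("ab") * rng.randint(1, 15) for _ in range(rng.randint(1, 4)))


def find_counterexample(gen, trials: int = 1000, seed: int = 0):
    """Draw inputs from gen and return the first one that violates the invariant."""
    rng = random.Random(seed)
    for _ in range(trials):
        s = gen(rng)
        if not roundtrip_holds(s):
            return s
    return None


print(find_counterexample(uniform_strings))   # almost surely None
print(find_counterexample(long_run_strings))  # a string containing a long run
```

The property is identical in both searches; only the input distribution changes, which mirrors the entry's emphasis on targeted strategies over random inputs.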

Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications

Published in arXiv preprint, 2026

A neuro-symbolic approach for automatically generating and validating formal specifications of memory-manipulating C functions: LLMs propose candidate specifications, which are then iteratively refined and filtered using feedback from symbolic provers and proof-based refutation. The paper also introduces the LeetCode-C-Spec benchmark.

Recommended citation: Liao Zhang, Tong Chen, Xiwei Wu, Qi Liu, Xiyu Zhai, Xinqi Wang, Qinxiang Cao. (2026). "Neuro-Symbolic Generation and Validation of Memory-Aware Formal Function Specifications." arXiv:2603.13414. https://arxiv.org/abs/2603.13414

Policy-Based Trajectory Clustering in Offline Reinforcement Learning

Published in Conference on Uncertainty in Artificial Intelligence (UAI 2026), 2026

This paper introduces trajectory clustering for offline RL datasets, where each cluster center represents the policy that generated the cluster's trajectories, and proposes Policy-Guided K-means (PG-Kmeans) and the Centroid-Attracted Autoencoder (CAAE), with finite-step convergence guarantees.

Recommended citation: Xinqi Wang, Hao Hu, Simon S. Du. (2026). "Policy-Based Trajectory Clustering in Offline Reinforcement Learning." Conference on Uncertainty in Artificial Intelligence (UAI). https://arxiv.org/abs/2506.09202
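A minimal sketch of policy-guided clustering in the K-means style, assuming discrete states and actions, tabular behavior-cloning policies with add-one smoothing, and hard assignment of each trajectory to the cluster policy under which it has the highest log-likelihood. All names and the fixed initialization are hypothetical; this is not the paper's PG-Kmeans implementation and carries none of its convergence guarantees.

```python
import math
from collections import Counter, defaultdict

# A trajectory is a sequence of (state, action) pairs with integer states/actions.
Trajectory = list[tuple[int, int]]


def fit_policy(trajs: list[Trajectory], n_actions: int, alpha: float = 1.0):
    """Tabular behavior cloning: smoothed per-state action frequencies.

    Returns a function (state, action) -> log pi(action | state).
    """
    counts = defaultdict(Counter)
    for traj in trajs:
        for s, a in traj:
            counts[s][a] += 1

    def log_prob(s: int, a: int) -> float:
        c = counts[s]
        return math.log((c[a] + alpha) / (sum(c.values()) + alpha * n_actions))

    return log_prob


def log_likelihood(log_prob, traj: Trajectory) -> float:
    """Log-probability that a policy produced the actions in traj."""
    return sum(log_prob(s, a) for s, a in traj)


def policy_guided_kmeans(trajs, k, n_actions, init_labels, max_iters=50):
    """Alternate between fitting one policy per cluster and reassigning trajectories."""
    labels = list(init_labels)
    for _ in range(max_iters):
        # Cluster "centers" are policies cloned from each cluster's current members.
        policies = [
            fit_policy([t for t, lab in zip(trajs, labels) if lab == j], n_actions)
            for j in range(k)
        ]
        # Reassign each trajectory to the policy most likely to have generated it.
        new_labels = [
            max(range(k), key=lambda j: log_likelihood(policies[j], t)) for t in trajs
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels


# Two hypothetical behavior policies: one always picks action 0, the other action 1.
a_traj = [(0, 0), (1, 0), (0, 0), (1, 0)]
b_traj = [(0, 1), (1, 1), (0, 1), (1, 1)]
data = [a_traj, a_traj, a_traj, b_traj, b_traj, b_traj]
labels = policy_guided_kmeans(data, k=2, n_actions=2, init_labels=[0, 1, 0, 1, 0, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Starting from a deliberately mixed assignment, one reassignment step separates the two behavior policies, and the next iteration confirms the labels are stable.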

Distributional Successor Features Enable Zero-Shot Policy Optimization

Published in Advances in Neural Information Processing Systems (NeurIPS 2024), 2024

This paper introduces DiSPOs, a novel approach that learns distributions of successor features from offline datasets to enable zero-shot policy optimization across different reward functions, avoiding compounding errors in model-based RL.

Recommended citation: Chuning Zhu, Xinqi Wang, Tyler Han, Simon S. Du, Abhishek Gupta. (2024). "Distributional Successor Features Enable Zero-Shot Policy Optimization." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2024/hash/e15ef893e137cd40e6c7313a04307437-Abstract-Conference.html
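The zero-shot transfer described above builds on the standard successor-feature identity: if the reward is linear in a feature map, r_t = w · φ(s_t), then a trajectory's discounted return equals w · ψ, where ψ is the discounted sum of its features. The sketch below checks that identity on one hand-written trajectory. The feature values and weights are made up, and the sketch does not model the distributions over successor features that DiSPOs learns.

```python
import math


def successor_features(features: list[list[float]], gamma: float) -> list[float]:
    """Monte Carlo successor features: psi = sum_t gamma^t * phi(s_t)."""
    psi = [0.0] * len(features[0])
    for t, phi in enumerate(features):
        for i, x in enumerate(phi):
            psi[i] += gamma**t * x
    return psi


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def discounted_return(rewards: list[float], gamma: float) -> float:
    return sum(gamma**t * r for t, r in enumerate(rewards))


gamma = 0.9
phis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical phi(s_0), phi(s_1), phi(s_2)
psi = successor_features(phis, gamma)          # computed once, independent of any reward

# Any new linear reward is scored from psi alone, without revisiting the trajectory.
for w in ([2.0, -1.0], [0.5, 0.5]):
    direct = discounted_return([dot(w, phi) for phi in phis], gamma)
    assert math.isclose(dot(w, psi), direct)
```

Because ψ is reward-agnostic, a stored collection of ψ values can be re-scored for each new weight vector w with a single dot product.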

Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques

Published in arXiv preprint, 2024

This paper investigates preference-based multi-agent reinforcement learning, focusing on identifying Nash equilibria from offline datasets with sparse human feedback, and introduces temporal MSE regularization and pessimism mechanisms for improved reward modeling.

Recommended citation: Xinqi Wang, Natalia Zhang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du. (2024). "Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques." arXiv:2409.00717. https://arxiv.org/abs/2409.00717
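Preference-based RL commonly models a labeller's choice between two trajectories with the Bradley-Terry model, P(τ_a ≻ τ_b) = σ(R(τ_a) − R(τ_b)), where R sums a learned per-step reward. Assuming that model, the sketch below computes the negative log-likelihood a reward model would minimize. In the multi-agent setting each step would carry a joint action; here a step is simply whatever reward_fn accepts. The paper's temporal MSE regularization and pessimism are not reproduced, and all names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_nll(reward_fn, pairs) -> float:
    """Mean Bradley-Terry negative log-likelihood over labelled pairs.

    pairs: (traj_a, traj_b, label) with label 1 if traj_a was preferred, else 0.
    A trajectory's return is the sum of reward_fn over its steps.
    """
    total = 0.0
    for traj_a, traj_b, label in pairs:
        margin = sum(map(reward_fn, traj_a)) - sum(map(reward_fn, traj_b))
        # P(traj_b preferred) = 1 - sigmoid(margin) = sigmoid(-margin)
        total -= log_sigmoid(margin if label == 1 else -margin)
    return total / len(pairs)


# Hypothetical data: steps are scalars, and the labeller prefers higher sums.
pairs = [([1, 1], [0, 0], 1), ([0], [2], 0)]
good = preference_nll(lambda step: float(step), pairs)
bad = preference_nll(lambda step: -float(step), pairs)
print(good < bad)  # True: the reward consistent with the labels fits better
```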

On Gap-dependent Bounds for Offline Reinforcement Learning

Published in Advances in Neural Information Processing Systems (NeurIPS 2022), 2022

This paper presents a systematic study of gap-dependent sample complexity in offline reinforcement learning.

Recommended citation: Xinqi Wang, Qiwen Cui, Simon S. Du. (2022). "On Gap-dependent Bounds for Offline Reinforcement Learning." Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/hash/5f5f7b6080dcadced61cf5d96f7c6dde-Abstract-Conference.html
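For readers new to the term, gap-dependent analyses of episodic RL typically use the following definitions (notation assumed here, not quoted from the paper), with $V_h^*$ and $Q_h^*$ the optimal value and action-value functions at step $h$:

$$
\mathrm{gap}_h(s,a) = V_h^*(s) - Q_h^*(s,a),
\qquad
\mathrm{gap}_{\min} = \min_{h,s,a}\,\{\,\mathrm{gap}_h(s,a) : \mathrm{gap}_h(s,a) > 0\,\}.
$$

Since $V_h^*(s) = \max_a Q_h^*(s,a)$, every gap is non-negative and optimal actions have zero gap; a bound is gap-dependent when its sample complexity is expressed through quantities such as $\mathrm{gap}_{\min}$ rather than only through worst-case terms.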

Elliot Wang