Policy-Based Trajectory Clustering in Offline Reinforcement Learning
Published as an arXiv preprint, 2025
This paper introduces trajectory clustering for offline RL datasets, in which each cluster center represents the policy that generated the cluster's trajectories. It proposes two methods, Policy-Guided K-means (PG-Kmeans) and Centroid-Attracted Autoencoder (CAAE), with finite-step convergence guarantees.
Recommended citation: Xinqi Wang, Hao Hu, Simon S. Du. (2025). "Policy-Based Trajectory Clustering in Offline Reinforcement Learning." arXiv:2506.09202. https://arxiv.org/abs/2506.09202
