Redundancy Reduction for Environment Representations
- Redundancy reduction between token sets: The first type reduces a variable-size set of local road environment tokens to a fixed-size global embedding.
- Redundancy reduction between embeddings: The second type learns augmentation-invariant features from embeddings generated from augmented views of the same road environment.
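
As a rough illustration of the two types above, the following NumPy sketch pools a variable-size token set into a fixed-size embedding and then scores two augmented views with a redundancy reduction loss. It makes several assumptions that the text does not confirm: the pooling is done by cross-attention against a fixed number of query vectors, the loss is a Barlow Twins-style cross-correlation objective, and the names `pool_tokens`, `redundancy_reduction_loss`, and `off_diag_weight` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def pool_tokens(tokens, queries):
    """Attention-pool a variable-size token set into a fixed-size output.

    tokens:  (n_tokens, d) local road environment tokens, n_tokens may vary.
    queries: (n_queries, d) query vectors (learned in a real model; assumed here).
    Returns: (n_queries, d), independent of n_tokens.
    """
    scores = queries @ tokens.T / np.sqrt(tokens.shape[1])  # (n_queries, n_tokens)
    scores = scores - scores.max(axis=1, keepdims=True)     # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ tokens


def redundancy_reduction_loss(z_a, z_b, off_diag_weight=5e-3, eps=1e-6):
    """Barlow Twins-style loss between two batches of embeddings.

    z_a, z_b: (batch, d) embeddings of two augmented views of the same scenes.
    The diagonal of the cross-correlation matrix is pushed towards 1
    (invariance to augmentation), the off-diagonal towards 0 (less redundancy
    between feature dimensions).
    """
    # Standardize each feature dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / z_a.shape[0]                  # (d, d) cross-correlation
    diag = np.diag(c)
    on_diag = ((diag - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (diag ** 2).sum()
    return on_diag + off_diag_weight * off_diag
```

Under these assumptions, scenes with different token counts map to outputs of the same shape, and the loss is small when the two views yield matching, decorrelated features and large when the views are unrelated.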
Similarity Maximization from Different Modalities
We evaluate JointMotion by pre-training and fine-tuning several models, including the Scene Transformer, Wayformer, and HPTR, on the Waymo Open Motion Dataset and the Argoverse 2 Motion Forecasting dataset. We show that our pre-training approach significantly improves the accuracy of these models and enhances their generalizability across datasets and environmental conditions.