Summary: We present Memory-and-Anticipation Transformer (MAT), a novel memory-anticipation-based paradigm for online action understanding, designed to overcome a weakness of most existing methods: they can only model temporal dependencies within a limited historical context. Through extensive experiments on four challenging benchmarks across two tasks, we show its applicability to predicting present and future actions, obtaining state-of-the-art results and demonstrating the importance of circular interaction between memory and anticipation across the entire temporal structure.
Abstract: Most existing forecasting systems are memory-based methods, which attempt to mimic human forecasting ability by employing various memory mechanisms, and they have made progress in temporal modeling of memory dependency. Nevertheless, an obvious weakness of this paradigm is that it can only model limited historical dependence and cannot transcend the past. In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model the entire temporal structure, including the past, present, and future. Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach to online action understanding. In addition, owing to the inherent advantages of this design, MAT can process online action understanding tasks in a unified manner. The proposed MAT model is evaluated on four challenging benchmarks (TVSeries, THUMOS'14, HDD, and EPIC-Kitchens-100) for online action detection and action anticipation, and it significantly outperforms all existing methods.
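To make the circular memory-anticipation idea concrete, below is a minimal sketch (not the authors' released code; all module and tensor names are illustrative assumptions): learnable "future" tokens attend to a memory of past frame features, and the present-frame prediction then attends to both the memory and the anticipated future, so the model is not bounded by historical context alone.

```python
# Hypothetical sketch of circular memory-anticipation interaction.
# Not MAT's actual implementation; names and sizes are illustrative.
import torch
import torch.nn as nn


class MemoryAnticipationBlock(nn.Module):
    def __init__(self, dim=256, heads=4, num_future=8, num_classes=22):
        super().__init__()
        # Learnable queries standing in for anticipated future frames.
        self.future_tokens = nn.Parameter(torch.randn(num_future, dim))
        self.mem_to_future = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ctx_to_present = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls_head = nn.Linear(dim, num_classes)

    def forward(self, memory, present):
        # memory:  (B, T, dim) encoded past frame features
        # present: (B, 1, dim) current frame feature
        B = memory.size(0)
        future = self.future_tokens.unsqueeze(0).expand(B, -1, -1)
        # 1) Anticipation: future tokens read from the historical memory.
        future, _ = self.mem_to_future(future, memory, memory)
        # 2) Circular interaction: the present attends to memory *and* the
        #    anticipated future, rather than to the past alone.
        context = torch.cat([memory, future], dim=1)
        present, _ = self.ctx_to_present(present, context, context)
        return self.cls_head(present.squeeze(1))  # per-frame action logits


# Toy usage: batch of 2, 32 past frames, 256-d features.
block = MemoryAnticipationBlock()
logits = block(torch.randn(2, 32, 256), torch.randn(2, 1, 256))
print(logits.shape)  # torch.Size([2, 22])
```

The same block serves detection (classify the present frame) and anticipation (read out the future tokens), which is one plausible reading of how MAT handles both tasks in a unified manner.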
[GitHub]
Citation
@misc{wang2023memoryandanticipation,
  title={Memory-and-Anticipation Transformer for Online Action Understanding},
  author={Jiahao Wang and Guo Chen and Yifei Huang and Limin Wang and Tong Lu},
  year={2023},
  eprint={2308.07893},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}