Summary: We present Memory-and-Anticipation Transformer (MAT), a novel memory-anticipation-based paradigm for online action understanding, designed to overcome a weakness of most existing methods: they can only model temporal dependencies within a limited historical context. Through extensive experiments on four challenging benchmarks across two tasks, we show its applicability to predicting present and future actions, obtaining state-of-the-art results and demonstrating the importance of circular interaction between memory and anticipation across the entire temporal structure.
Abstract: Most existing forecasting systems are memory-based methods, which attempt to mimic human forecasting ability by employing various memory mechanisms, and they have made progress in temporal modeling of memory dependency. Nevertheless, an obvious weakness of this paradigm is that it can only model limited historical dependence and cannot transcend the past. In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model the entire temporal structure, including the past, present, and future. Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach to online action understanding. In addition, owing to the inherent advantages of this design, MAT can process online action understanding tasks in a unified manner. The proposed MAT model is evaluated on four challenging benchmarks (TVSeries, THUMOS'14, HDD, and EPIC-Kitchens-100) for online action detection and action anticipation, and it significantly outperforms all existing methods.
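To make the circular memory-anticipation idea concrete, below is a minimal sketch (not the authors' released code; all module and tensor names are illustrative assumptions): learnable "future" tokens attend to a memory of past frame features, and the present-frame prediction then attends to both the memory and the anticipated future, so the model is not bounded by historical context alone.

```python
# Hypothetical sketch of circular memory-anticipation interaction.
# Not MAT's actual implementation; names and sizes are illustrative.
import torch
import torch.nn as nn


class MemoryAnticipationBlock(nn.Module):
    def __init__(self, dim=256, heads=4, num_future=8, num_classes=22):
        super().__init__()
        # Learnable queries standing in for anticipated future frames.
        self.future_tokens = nn.Parameter(torch.randn(num_future, dim))
        self.mem_to_future = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ctx_to_present = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls_head = nn.Linear(dim, num_classes)

    def forward(self, memory, present):
        # memory:  (B, T, dim) encoded past frame features
        # present: (B, 1, dim) current frame feature
        B = memory.size(0)
        future = self.future_tokens.unsqueeze(0).expand(B, -1, -1)
        # 1) Anticipation: future tokens read from the historical memory.
        future, _ = self.mem_to_future(future, memory, memory)
        # 2) Circular interaction: the present attends to memory *and* the
        #    anticipated future, rather than to the past alone.
        context = torch.cat([memory, future], dim=1)
        present, _ = self.ctx_to_present(present, context, context)
        return self.cls_head(present.squeeze(1))  # per-frame action logits


# Toy usage: batch of 2, 32 past frames, 256-d features.
block = MemoryAnticipationBlock()
logits = block(torch.randn(2, 32, 256), torch.randn(2, 1, 256))
print(logits.shape)  # torch.Size([2, 22])
```

The same block serves detection (classify the present frame) and anticipation (read out the future tokens), which is one plausible reading of how MAT handles both tasks in a unified manner.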
[GitHub]
Citation
@misc{wang2023memoryandanticipation,
  title={Memory-and-Anticipation Transformer for Online Action Understanding},
  author={Jiahao Wang and Guo Chen and Yifei Huang and Limin Wang and Tong Lu},
  year={2023},
  eprint={2308.07893},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}