16 Nov 2017

Object Interactions for Fine-grained Video Understanding

Attend and Interact: Higher-Order ObjectInteractions for Video Understanding

Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Computer Vision and Pattern Recognition (CVPR), 2018
[arXiv] [ML@GT Blog] [Poster]


Human actions often involve complex interactions across several inter-related objects in the scene. However, existing approaches to fine-grained video understanding or visual relationship detection often rely on single object representation or pairwise object relationships.

In this paper, we propose to efficiently learn higher-order interactions between arbitrary subgroups of objects for fine-grained video understanding. We demonstrate the impact of modeling object interactions towards significantly improving accuracy for both action recognition and video captioning.

To the best of our knowledge, this is the first work modeling object interactions on open domain large-scale video datasets.

Fine-grained Human Activity Recognition

Relationship Grounded Video Captioning


If you find this work useful, please cite our paper:

  title={Attend and Interact: Higher-Order Object Interactions for Video Understanding},
  author={Ma, Chih-Yao and Kadav, Asim and Melvin, Iain and Kira, Zsolt and AlRegib, Ghassan and Graf, Hans Peter},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},