CVPR 2018 Tutorial on Human Activity Recognition


Tutorial Date

June 18th, 2018 (Location: Room 255 EF)


Michael Ryoo

(Indiana Univ.)

Greg Mori

(SFU)

Kris Kitani



In recent years, the field of human activity recognition has grown dramatically, reflecting its importance in many high-impact societal applications, including smart surveillance, web-video search and retrieval, quality-of-life devices for elderly people, and robot perception. Following the initial success of convolutional network models in learning video representations, the field is gradually moving toward detecting and forecasting more complex human activities that involve multiple people, objects, and sub-events in diverse, realistic scenarios. As a consequence, important new research topics and problems are arising, including (i) reliable spatio-temporal localization of activities, (ii) end-to-end modeling of activities’ temporal structure and hierarchy, (iii) group activity recognition, (iv) activity forecasting, and (v) construction of large-scale datasets and convolutional models. The objective of this tutorial is to introduce and review recent progress on these emerging topics, and to discuss, motivate, and encourage future research in the diverse subfields of activity recognition.


Joao Carreira is a research scientist at DeepMind. Previously, he was a postdoc at UC Berkeley, and before that he did his PhD at the University of Bonn. His interests now lie mainly in video modeling and representation; before moving to video, he did some of the early work on object proposals and also ventured into class-specific reconstruction, human pose estimation, and semantic segmentation. Joao is one of the creators of the Kinetics dataset and the first author of the I3D video CNN model.

Gunnar A. Sigurdsson is a PhD candidate in the Robotics Institute at Carnegie Mellon University, advised by Abhinav Gupta. He is the lead author of the Charades dataset, released in 2016 and designed for the classification and detection of daily human activities in continuous videos.


1330   Introduction

1350   Spatio-Temporal Activity Detection, Greg Mori (SFU)

1410   Learning Temporal Hierarchy, Michael Ryoo (Indiana Univ.)

1440   Invited Talk: Kinetics 600, Joao Carreira (DeepMind)

1505   Invited Talk: Observing Humans in their Natural Habitat: Datasets & Models, Gunnar Sigurdsson (CMU)

1530   Coffee Break

1600   Group Activity Recognition, Greg Mori (SFU)

1620   Activity Forecasting, Nick Rhinehart (CMU)

1650   Emerging Topics (Including Privacy-Preservation), Michael Ryoo (Indiana Univ.)