CVPR 2014 Tutorial on Emerging Topics in Human Activity Recognition


Tutorial Date

June 23rd, 2014 (Location: Room C211)


Michael Ryoo


Ivan Laptev


Greg Mori


Sangmin Oh



In the past five years, the field of human activity recognition has grown dramatically, reflecting its importance in many high-impact societal applications, including smart surveillance, web-video search and retrieval, quality-of-life devices for elderly people, and human-computer interfaces. Building on the initial success of bag-of-words methods for action classification, the field is gradually moving towards a more structured interpretation of complex human activities. These activities involve multiple people and objects, as well as the interactions among them, in a variety of realistic scenarios. As a consequence, important new research topics and problems are emerging, including (i) modeling the temporal structure of activities, (ii) learning relations between actions and objects, scenes, and social roles, (iii) group activity recognition, and (iv) first-person activity recognition. The objective of this tutorial is to introduce and review recent progress on these emerging topics, and to discuss, motivate, and encourage future research in diverse subfields of action recognition.

SCHEDULE (tentative)

The organizers of the tutorial will offer a sequence of lectures on active and emerging topics in activity recognition. Starting with the general motivation, a historical overview, and basic bag-of-words techniques, we will then present advances in several subproblems of action recognition. In particular, we will cover (i) modeling the spatio-temporal structure of actions (I. Laptev), (ii) group activity recognition (G. Mori), (iii) activity recognition from the first-person view (M. Ryoo), and (iv) real-world applications of activity recognition (S. Oh).



Relevant publications of the organizers


8:30 am

1. Introduction

Speaker: Michael Ryoo, ...

  • Introduction to human activity recognition
  • Applications and challenges
  • History of activity recognition
  • Dimensions in human activity recognition: types of videos, levels of human activities, and structure complexity



8:45 am

1.1 Action recognition with bag-of-features

Speaker: Ivan Laptev

  • Spatio-temporal features
  • Bag-of-words action recognition
  • Recent results and benchmarks


9:00 am

2. Beyond bag-of-features

Speaker: Ivan Laptev

  • Spatio-temporal structure of simple actions
  • Temporal structure of composite activities
  • Weakly-supervised learning



9:30 am

3. Group activity recognition

Speaker: Greg Mori

  • Human-human interactions
  • Human-object interactions
  • Social role analysis in video
  • Person context for activity recognition



10:15 am

Coffee break

10:30 am

4. First-person activity recognition

Speaker: Michael Ryoo

  • 3rd-person vs. 1st-person videos
  • Ego-action recognition and objects in first-person videos
  • First-person interaction recognition
  • Features for first-person activity recognition
  • ‘Ego’centric videos?



11:15 am

5. Real-world applications of activity recognition

Speaker: Sangmin Oh

  • Large-scale unconstrained video analysis and retrieval
  • Sports video analysis
  • Action recognition for interactive systems (games, etc.)
  • Traffic analysis and surveillance



12:00 pm

6. Discussion and future directions

  • Human activity prediction (i.e., early recognition)
  • Action vocabulary
  • Summary and closing





  1. J. K. Aggarwal and M. S. Ryoo, "Human Activity Analysis: A Review", ACM Computing Surveys, 43(3), April 2011.
  2. M. S. Ryoo and J. K. Aggarwal, "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities", ICCV 2009.
  3. M. S. Ryoo and J. K. Aggarwal, "Stochastic Representation and Recognition of High-level Group Activities", International Journal of Computer Vision, 93(2):183-200, June 2011.
  4. M. S. Ryoo and L. Matthies, "First-Person Activity Recognition: What Are They Doing to Me?", CVPR 2013.
  5. M. S. Ryoo, S. Choi+, J. H. Joung+, J.-Y. Lee+, W. Yu, "Personal Driving Diary: Automated Recognition of Driving Events from First-Person Videos", Computer Vision and Image Understanding (CVIU), 2013 (+indicates equal contribution).
  6. M. S. Ryoo, "Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos", ICCV 2011.
  7. T. Lan, L. Sigal, and G. Mori, "Social Roles in Hierarchical Models for Human Activity Recognition", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
  8. T. Lan, Y. Wang, W. Yang, S. Robinovitch, and G. Mori, "Discriminative Latent Models for Recognizing Contextual Group Activities", IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 34(8):1549-1562, 2012.
  9. A. Vahdat, B. Gao, M. Ranjbar, and G. Mori, "A Discriminative Key Pose Sequence Model for Recognizing Human Interactions", Eleventh IEEE International Workshop on Visual Surveillance, 2011.
  10. Arash Vahdat, Kevin Cannons, Greg Mori, Sangmin Oh, and Ilseo Kim, "Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach", ICCV 2013.
  11. Ilseo Kim, Sangmin Oh, Arash Vahdat, Kevin Cannons, Amitha Perera, and Greg Mori, "Segmental Multi-way Local Pooling for Video Recognition", ACM Multimedia 2013.
  12. Sangmin Oh, Scott McCloskey, Ilseo Kim, Arash Vahdat, Kevin Cannons, Hossein Hajimirsadeghi, Greg Mori, Amitha A.G. Perera, Megha Pandey, and Jason Corso, "Multimedia Event Detection with Multimodal Feature Fusion and Temporal Concept Localization", Machine Vision and Applications, Special Issue on Multimedia Event Detection, 2013.
  13. Yiliang Xu, Sangmin Oh, Fan Yang, Zhuolin Jiang, Naresh Cuntoor, Anthony Hoogs, and Larry Davis, "System and Algorithms on Detection of Objects Embedded in Perspective Geometry using Monocular Cameras", AVSS 2013.
  14. Sangmin Oh*, Kang Li*, A.G. Amitha Perera, and Yun Fu, "A Videography Analysis Framework for Video Retrieval and Summarization", BMVC 2012.
  15. Sangmin Oh et al., "A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video", CVPR 2011.
  16. Sangmin Oh, Anthony Hoogs, Matt Turek, and Roderic Collins, "Content-based Retrieval of Functional Objects in Video using Scene Context", ECCV 2010.
  17. Karthir Prabhakar, Sangmin Oh, Ping Wang, Gregory D. Abowd, and James M. Rehg, "Temporal Causality for the Analysis of Visual Events", CVPR 2010.
  18. P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid and J. Sivic, "Finding Actors and Actions in Movies", in Proc. ICCV'13, Sydney, Australia.
  19. V. Delaitre, D.F. Fouhey, I. Laptev, J. Sivic, A. Gupta and A.A. Efros, "Scene semantics from long-term observation of people", in Proc. ECCV'12, Florence, Italy.
  20. D.F. Fouhey, V. Delaitre, A. Gupta, A.A. Efros, I. Laptev and J. Sivic, "People Watching: Human Actions as a Cue for Single-View Geometry", in Proc. ECCV'12, Florence, Italy.
  21. M.M. Ullah, S.N. Parizi and I. Laptev, "Improving Bag-of-Features Action Recognition with Non-local Cues", in Proc. BMVC'10, Aberystwyth, UK.
  22. O. Duchenne, I. Laptev, J. Sivic, F. Bach and J. Ponce, "Automatic Annotation of Human Actions in Video", in Proc. ICCV'09, Kyoto, Japan.
  23. M. Marszałek, I. Laptev and C. Schmid, "Actions in Context", in Proc. CVPR'09, Miami, US.
  24. I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld, "Learning realistic human actions from movies", in Proc. CVPR'08, Anchorage, US.
  25. I. Laptev and P. Pérez, "Retrieving actions in movies", in Proc. ICCV'07, Rio de Janeiro, Brazil.
  26. I. Laptev, "On Space-Time Interest Points", in IJCV, vol. 64, no. 2/3, pp. 107-123, 2005.
  27. Y. Iwashita, A. Takamine, R. Kurazume, and M. S. Ryoo, "First-Person Animal Activity Recognition from Egocentric Videos", ICPR 2014.