Michael S. Ryoo

SUNY Empire Innovation Associate Professor
AI Institute; Department of Computer Science
Stony Brook University

Robotics at Google
Google Brain

Instructions for students

In September 2019, I joined the Department of Computer Science (CS) at Stony Brook University as an associate professor. I am also a research scientist with Robotics at Google (Google Brain). Previously, I was an assistant professor at Indiana University Bloomington and a staff researcher in the Robotics Section of NASA's Jet Propulsion Laboratory (JPL). I received my Ph.D. from the University of Texas at Austin in 2008 and my B.S. from the Korea Advanced Institute of Science and Technology (KAIST) in 2004.

Recent News
2022/11: Token Turing Machines
2022/11: Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
2022/10: A sim2real paper to appear at CoRL 2022:
Neural Neural Textures Make Sim2Real Consistent
2022/10: Two papers to appear at NeurIPS 2022:
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
2022/08: StARformer: Transformer with State-Action-Reward Representations and Video Question Answering with Iterative Video-Text Co-Tokenization were presented at ECCV 2022.
2022/03: Two papers at CVPR 2022, one oral and one poster:
Self-supervised Video Transformer
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
2021/06: Check out TokenLearner for images and videos! It learns to adaptively generate a small number of tokens for Transformers, improving accuracy while also being faster. The paper also appeared at NeurIPS 2021.
2021/04: Two papers at CVPR 2021: Recognizing Actions in Videos from Unseen Viewpoints and Coarse-Fine Networks for Temporal Activity Detection in Videos.
2021/04: Neural architecture search for robot reinforcement learning at ICRA 2021: Visionary: Vision Architecture Discovery for Robot Learning
2020/11: Published a new large-scale video dataset respecting diversity, privacy, and licenses at NeurIPS 2020:
AViD Dataset: Anonymized Videos from Diverse Countries. Dataset URL: [link].
2020/08: Four papers at ECCV 2020: on Adversarial Grammar, on Password-conditioned Face Anonymization, on AttentionNAS, and on AssembleNet++.
2020/03: Self-supervised video representation learning at CVPR 2020: Evolving Losses for Unsupervised Video Representation Learning
2019/10: Google AI blog article on Video Architecture Search. It summarizes our recent efforts on neural architecture search for videos, including AssembleNet at ICLR 2020 and EvaNet at ICCV 2019.

Curriculum Vitae pdf

Publications [by type] [by year]

List of selected recent publications

  • M. S. Ryoo, A. Piergiovanni, A. Arnab, M. Dehghani, and A. Angelova, "TokenLearner: Adaptive Space-Time Tokenization for Videos", NeurIPS 2021. [arXiv]
  • K. Kahatapitiya and M. S. Ryoo, "Coarse-Fine Networks for Temporal Activity Detection in Videos", CVPR 2021. [arXiv]
  • A. Piergiovanni and M. S. Ryoo, "Recognizing Actions in Videos from Unseen Viewpoints", CVPR 2021. [arXiv]
  • I. Akinola, A. Angelova, Y. Lu, Y. Chebotar, D. Kalashnikov, J. Varley, J. Ibarz, and M. S. Ryoo, "Visionary: Vision Architecture Discovery for Robot Learning", ICRA 2021. [arXiv]
  • A. Piergiovanni, A. Angelova, and M. S. Ryoo, "Evolving Losses for Unsupervised Video Representation Learning", CVPR 2020. [arXiv]
  • M. S. Ryoo, A. Piergiovanni, M. Tan, and A. Angelova, "AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures", ICLR 2020. [arXiv]
  • A. Piergiovanni, A. Angelova, and M. S. Ryoo, "Differentiable Grammars for Videos", AAAI 2020. [arXiv]
  • A. Piergiovanni, A. Wu, and M. S. Ryoo, "Learning Real-World Robot Policies by Dreaming", IROS 2019. [arXiv] [project]
  • A. Piergiovanni and M. S. Ryoo, "Temporal Gaussian Mixture Layer for Videos", ICML 2019. [arXiv] [github_code]
  • A. Piergiovanni and M. S. Ryoo, "Representation Flow for Action Recognition", CVPR 2019. [arXiv] [github_code]
  • Z. Ren, Y. J. Lee, and M. S. Ryoo, "Learning to Anonymize Faces for Privacy Preserving Action Detection", ECCV 2018. [arXiv] [project]
Google Scholar page: Michael S. Ryoo

Datasets
AViD dataset: Anonymized Videos from Diverse Countries.
MLB-YouTube dataset: an activity recognition dataset with over 42 hours of 2017 MLB post-season baseball videos.
JPL-Interaction dataset: a robot-centric first-person video dataset.
DogCentric Activity dataset: a first-person video dataset recorded with cameras mounted on dogs.
UT-Interaction dataset: a dataset containing continuous/segmented videos of human-human interactions.

Lab members

Alan Wu (Indiana University ISE)
Cristina Mata (Stony Brook University CS)
Kumara Kahatapitiya (Stony Brook University CS)
Jinghuan Shang (Stony Brook University CS)
Xiang Li (Stony Brook University CS)
Jongwoo Park (Stony Brook University CS)
Ryan Burgert (Stony Brook University CS)
Kanchana Ranasinghe (Stony Brook University CS)

Alumni
Srijan Das (Postdoc 2022; joined UNC Charlotte)
AJ Piergiovanni (PhD 2020; joined Google Brain)

Teaching
CSE525: Intro to Robotics (Spring 2022)
CSE527: Intro to Computer Vision (Fall 2021)
B457/I400: Intro to Computer Vision (Spring 2018)
B659/I590: Vision for Intelligent Robotics (Fall 2017)

Updated 10/2022