Michael S. Ryoo

SUNY Empire Innovation Associate Professor
Department of Computer Science; AI Institute
Stony Brook University

Principal Research Scientist
Salesforce AI Research


I recently joined the AI research team at Salesforce. Prior to that, I was with the robotics team at Google DeepMind (and formerly Google Brain) for 5.5 years. I also hold a tenured position in the Department of Computer Science (CS) at Stony Brook University as an associate professor. Previously, I was an assistant professor at Indiana University Bloomington, and was a staff researcher within the Robotics Section of NASA's Jet Propulsion Laboratory (JPL). I received my Ph.D. from the University of Texas at Austin in 2008 and B.S. from Korea Advanced Institute of Science and Technology (KAIST) in 2004.

Recent News
2023/11: Mirasol3B is out! It's a new *video-based* multimodal language model across audio, video, and text modalities. It will be presented at CVPR 2024.
2023/09: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control at CoRL 2023!
2023/07: Diffusion Illusions: Hiding Images in Plain Sight received the CVPR 2023 Outstanding Demo Award.
It is based on the same losses used in Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
2023/07: RT-1: Robotics Transformer for Real-World Control at Scale at RSS 2023!
2023/06: Token Turing Machines, a new sequential model modernizing Neural Turing Machines, was presented at CVPR 2023. [video]
2022/10: Neural Neural Textures Make Sim2Real Consistent at CoRL 2022
2022/10: Two papers at NeurIPS 2022:
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space
Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
2022/08: StARformer: Transformer with State-Action-Reward Representations and Video Question Answering with Iterative Video-Text Co-Tokenization were presented at ECCV 2022.
2022/03: Two papers at CVPR 2022, one as an oral and the other as a poster:
Self-supervised Video Transformer
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
2021/06: Check out TokenLearner for images and videos! It learns to adaptively generate a small number of tokens for Transformers, providing better accuracy while also being faster. The paper also appeared at NeurIPS 2021.
2021/04: Recognizing Actions in Videos from Unseen Viewpoints and Coarse-Fine Networks for Temporal Activity Detection in Videos at CVPR 2021.
2021/04: Neural architecture search for robot reinforcement learning at ICRA 2021: Visionary: Vision Architecture Discovery for Robot Learning
2020/11: Published a new large-scale video dataset respecting diversity, privacy, and licenses at NeurIPS 2020:
AViD Dataset: Anonymized Videos from Diverse Countries. Dataset URL: [link].
2020/03: Self-supervised video representation learning at CVPR 2020: Evolving Losses for Unsupervised Video Representation Learning

Curriculum Vitae pdf

Publications [by type] [by year]

List of selected publications

Google Scholar: Michael S. Ryoo


Datasets

AViD dataset: Anonymized Videos from Diverse Countries.
MLB-YouTube dataset: an activity recognition dataset with over 42 hours of 2017 MLB post-season baseball videos.
JPL-Interaction dataset: a robot-centric first-person video dataset.
DogCentric Activity dataset: a first-person video dataset taken with dogs.
UT-Interaction dataset: a dataset containing continuous/segmented videos of human-human interactions.

Lab members

Cristina Mata (Stony Brook University CS)
Kumara Kahatapitiya (Stony Brook University CS)
Jinghuan Shang (Stony Brook University CS)
Xiang Li (Stony Brook University CS)
Jongwoo Park (Stony Brook University CS)
Ryan Burgert (Stony Brook University CS)
Kanchana Ranasinghe (Stony Brook University CS)
Abe Leite (Stony Brook University CS)


Alumni

Alan Wu (PhD 2023; joined MIT Lincoln Lab)
Srijan Das (PostDoc 2022; joined UNC Charlotte)
AJ Piergiovanni (PhD 2020; joined Google Brain)


Teaching

CSE378: Intro to Robotics (Fall 2023)
CSE525: Robotics (Spring 2023)
CSE527: Intro to Computer Vision (Fall 2021)
B457/I400: Intro to Computer Vision (Spring 2018)
B659/I590: Vision for Intelligent Robotics (Fall 2017)

Updated 02/2024