Michael Ryoo : M. S. Ryoo

Michael S. Ryoo: Publications by type

arXiv

K. Ranasinghe, X. Li, E. Nguyen, C. Mata, J. Park, M. S. Ryoo, "Pixel Motion as Universal Representation for Robot Control", arXiv:2505.07817. arXiv:2505 project
A. Piergiovanni, D. Kim, M. S. Ryoo, I. Noble, A. Angelova, "What’s in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning", arXiv:2411.14688. arXiv:2411
M. S. Ryoo, H. Zhou, S. Kendre, C. Qin, L. Xue, M. Shu, S. Savarese, R. Xu, C. Xiong, J. C. Niebles, "xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs", arXiv:2410.16267. arXiv:2410 project
R. Burgert, K. Ranasinghe, X. Li, M. S. Ryoo, "Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors", arXiv:2211.13224. arXiv:2211

Conference publications

E. Nguyen, Y. Zhang, K. Ranasinghe, X. Li, M. S. Ryoo, "Pixel Motion Diffusion is What We Need for Robot Control", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2026. arXiv:2509 project
R. D. Burgert, C. Herrmann, F. Cole, M. S. Ryoo, N. Wadhwa, A. Voynov, N. Ruiz, "MotionV2V: Editing Motion in a Video", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2026. arXiv:2511 project
K. Ranasinghe, H. Zhou, Y. Fang, L. Yang, L. Xue, R. Xu, C. Xiong, S. Savarese, M. S. Ryoo, J. C. Niebles, "Future Optical Flow Prediction Improves Robot Control and Video Generation", Findings of CVPR, June 2026. arXiv:2601
Z. Wang, H. Zhou, S. Wang, J. Li, C. Xiong, S. Savarese, M. Bansal, M. S. Ryoo, J. C. Niebles, "Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding", Findings of CVPR, June 2026. arXiv:2512
J. Park, K. Ranasinghe, J. Jang, C. Mata, Y. S. Jang, M. S. Ryoo, "IVRA: Improving Visual-Token Relations for Robot Action Policy with Training-Free Hint-Based Guidance", IEEE International Conference on Robotics and Automation (ICRA), May 2026. arXiv:2601
J. Park, K. Ranasinghe, K. Kahatapitiya, W. Ryoo, D. Kim, M. S. Ryoo, "Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA", the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL), March 2026. arXiv:2406
K. Kahatapitiya, H. Liu, S. He, D. Liu, M. Jia, C. Zhang, M. S. Ryoo, T. Xie, "Adaptive Caching for Faster Video Generation with Diffusion Transformers", IEEE/CVF International Conference on Computer Vision (ICCV), October 2025. arXiv:2411 project
L. Xue, M. Shu, A. Awadalla, J. Wang, A. Yan, S. Purushwalkam, H. Zhou, V. Prabhu, Y. Dai, M. S. Ryoo, S. Kendre, J. Zhang, C. Qin, S. Zhang, C.-C. Chen, N. Yu, J. Tan, T. M. Awalgaonkar, S. Heinecke, H. Wang, Y. Choi, L. Schmidt, Z. Chen, S. Savarese, J. C. Niebles, C. Xiong, R. Xu, "BLIP-3: A Family of Open Large Multimodal Models", Workshop on Findings of ICCV 2025. arXiv:2408
K. Kahatapitiya, K. Ranasinghe, J. Park, M. S. Ryoo, "Language Repository for Long Video Understanding", Findings of the Association for Computational Linguistics (ACL), July 2025. arXiv:2403 project
R. Burgert, Y. Xu, W. Xian, O. Pilarski, P. Clausen, M. He, L. Ma, Y. Deng, L. Li, M. Mousavi, M. S. Ryoo, P. Debevec, N. Yu, "Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2025. arXiv:2501 project
X. Li, C. Mata, J. Park, K. Kahatapitiya, Y. S. Jang, J. Shang, K. Ranasinghe, R. Burgert, M. Cai, Y. J. Lee, M. S. Ryoo, "LLaRA: Supercharging Robot Learning Data for Vision-Language Policy", International Conference on Learning Representations (ICLR), April 2025. arXiv:2406 project
K. Ranasinghe, X. Li, K. Kahatapitiya, M. S. Ryoo, "Understanding Long Videos in One Multimodal Language Model Pass", International Conference on Learning Representations (ICLR), April 2025. arXiv:2403 project
C. Mata, K. Ranasinghe, and M. S. Ryoo, "CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings", European Conference on Computer Vision (ECCV), September 2024. pdf
C. Qin, C. Xia, K. Ramakrishnan, M. Ryoo, L. Tu, Y. Feng, M. Shu, H. Zhou, A. Awadalla, J. Wang, S. Purushwalkam, L. Xue, Y. Zhou, H. Wang, S. Savarese, J. C. Niebles, Z. Chen, R. Xu, C. Xiong, "xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations", ECCV 2024 Workshop on AI4VA, September 2024. arXiv:2408
R. Burgert, X. Li, A. Leite, K. Ranasinghe, M. S. Ryoo, "Diffusion Illusions: Hiding Images in Plain Sight", SIGGRAPH 2024. arXiv:2312 project
[Outstanding Demo Award from CVPR 2023]
A. Piergiovanni, I. Noble, D. Kim, M. S. Ryoo, V. Gomes, A. Angelova, "Mirasol3B: A Multimodal Autoregressive Model for Time-aligned and Contextual Modalities", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. arXiv:2311 article
K. Kahatapitiya, A. Arnab, A. Nagrani, M. S. Ryoo, "VicTR: Video-conditioned Text Representations for Activity Recognition", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024. arXiv:2304
K. Ranasinghe, S. N. Shukla, O. Poursaeed, M. S. Ryoo, T.-Y. Lin, "Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024.
R. D. Burgert, B. L. Price, J. Kuen, Y. Li, M. S. Ryoo, "MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2024.
X. Li, V. Belagali, J. Shang, M. S. Ryoo, "Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning", IEEE International Conference on Robotics and Automation (ICRA), May 2024. arXiv:2307 project
I. Leal, K. Choromanski, D. Jain, A. Dubey, J. Varley, M. Ryoo, Y. Lu, F. Liu, V. Sindhwani, Q. Vuong, T. Sarlos, K. Oslund, K. Hausman, K. Rao, "SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention", IEEE International Conference on Robotics and Automation (ICRA), May 2024. arXiv:2312 article
[Best Paper Award in Robot Manipulation]
S. Das, T. Jain, D. Reilly, P. Balaji, S. Karmakar, S. Marjit, X. Li, A. Das, M. S. Ryoo, "Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders", IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2024. arXiv:2310
J. Park, K. Kahatapitiya, D. Kim, S. Sudalairaj, Q. Fan, M. S. Ryoo, "Grafting Vision Transformers", IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2024. arXiv:2210
K. Ranasinghe and M. S. Ryoo, "Language-based Action Concept Spaces Improve Video Self-Supervised Learning", Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), December 2023. arXiv:2307
J. Shang and M. S. Ryoo, "Active Vision Reinforcement Learning under Limited Visual Observability", Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), December 2023. project
R. Dai, S. Das, M. S. Ryoo, F. Bremond, "AAN: Attributes-Aware Network for Temporal Action Detection", British Machine Vision Conference (BMVC), November 2023. arXiv:2309
A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, D. Driess, A. Dubey, C. Finn, P. Florence, C. Fu, M. Gonzalez Arenas, K. Gopalakrishnan, K. Han, K. Hausman, A. Herzog, J. Hsu, B. Ichter, A. Irpan, N. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, L. Lee, T. E. Lee, S. Levine, Y. Lu, H. Michalewski, I. Mordatch, K. Pertsch, K. Rao, K. Reymann, M. Ryoo, G. Salazar, P. Sanketi, P. Sermanet, J. Singh, A. Singh, R. Soricut, H. Tran, V. Vanhoucke, Q. Vuong, A. Wahid, S. Welker, P. Wohlhart, J. Wu, F. Xia, T. Xiao, P. Xu, S. Xu, T. Yu, B. Zitkovich, "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control", Conference on Robot Learning (CoRL), November 2023. arXiv:2307 project
K. Kahatapitiya and M. S. Ryoo, "SWAT: Spatial Structure Within and Among Tokens", the 32nd International Joint Conference on Artificial Intelligence (IJCAI), August 2023. arXiv:2111
R. Hadidi, J. Cao, M. S. Ryoo, H. Kim, "Reducing Inference Latency with Concurrent Architectures for Image Recognition at Edge", IEEE International Conference on Edge Computing and Communications (EDGE), July 2023. arXiv:2011
A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, J. Dabis, C. Finn, K. Gopalakrishnan, K. Hausman, A. Herzog, J. Hsu, J. Ibarz, B. Ichter, A. Irpan, T. Jackson, S. Jesmonth, N. J. Joshi, R. Julian, D. Kalashnikov, Y. Kuang, I. Leal, K. Lee, S. Levine, Y. Lu, U. Malla, D. Manjunath, I. Mordatch, O. Nachum, C. Parada, J. Peralta, E. Perez, K. Pertsch, J. Quiambao, K. Rao, M. Ryoo, G. Salazar, P. Sanketi, K. Sayed, J. Singh, S. Sontakke, A. Stone, C. Tan, H. Tran, V. Vanhoucke, S. Vega, Q. Vuong, F. Xia, T. Xiao, P. Xu, S. Xu, T. Yu, B. Zitkovich, "RT-1: Robotics Transformer for Real-World Control at Scale", Robotics: Science and Systems (RSS), July 2023. arXiv:2212 project
M. S. Ryoo, K. Gopalakrishnan, K. Kahatapitiya, T. Xiao, K. Rao, A. Stone, Y. Lu, J. Ibarz, A. Arnab, "Token Turing Machines", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023. arXiv:2211
B. Chen, F. Xia, B. Ichter, K. Rao, K. Gopalakrishnan, M. S. Ryoo, A. Stone, D. Kappler, "Open-vocabulary Queryable Scene Representations for Real World Planning", IEEE International Conference on Robotics and Automation (ICRA), May 2023. arXiv:2209
A. Wu and M. S. Ryoo, "Energy-Based Models for Cross-Modal Localization using Convolutional Transformers", IEEE International Conference on Robotics and Automation (ICRA), May 2023.
A. Zeng, M. Attarian, B. Ichter, K. Choromanski, A. Wong, S. Welker, F. Tombari, A. Purohit, M. Ryoo, V. Sindhwani, J. Lee, V. Vanhoucke, P. Florence, "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", International Conference on Learning Representations (ICLR), May 2023. arXiv:2204 project
K. Kahatapitiya, Z. Ren, H. Li, Z. Wu, M. S. Ryoo, G. Hua, "Weakly-Guided Self-Supervised Pretraining for Temporal Activity Detection", AAAI Conference on Artificial Intelligence (AAAI), February 2023. arXiv:2111
S. Das and M. S. Ryoo, "ViewCLR: Learning Self-Supervised Video Representation for Unseen Viewpoints", IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2023. arXiv:2112
R. Burgert, J. Shang, X. Li, and M. S. Ryoo, "Neural Neural Textures Make Sim2Real Consistent", Conference on Robot Learning (CoRL), December 2022. arXiv:2206 project
J. Shang, S. Das, and M. S. Ryoo, "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space", Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), December 2022. arXiv:2206 project
X. Li, J. Shang, S. Das, and M. S. Ryoo, "Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?", Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS), December 2022. arXiv:2206
J. Shang, K. Kahatapitiya, X. Li, and M. S. Ryoo, "StARformer: Transformer with State-Action-Reward Representations", European Conference on Computer Vision (ECCV), October 2022. arXiv:2110
A. Piergiovanni, K. Morton, W. Kuo, M. S. Ryoo, and A. Angelova, "Video Question Answering with Iterative Video-Text Co-Tokenization", European Conference on Computer Vision (ECCV), October 2022. arXiv:2208
K. Ranasinghe, M. Naseer, S. Khan, F. S. Khan, and M. S. Ryoo, "Self-supervised Video Transformer", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022. arXiv:2112
R. Dai, S. Das, K. Kahatapitiya, M. S. Ryoo, and F. Bremond, "MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022. arXiv:2112
M. S. Ryoo, A. Piergiovanni, A. Arnab, M. Dehghani, and A. Angelova, "TokenLearner: Adaptive Space-Time Tokenization for Videos", Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), December 2021. arXiv:2106
A. Piergiovanni, V. Casser, M. S. Ryoo, and A. Angelova, "4D-Net for Learned Multi-Modal Alignment", International Conference on Computer Vision (ICCV), October 2021. arXiv:2109
J. Shang and M. S. Ryoo, "Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2021. arXiv:2108
K. Kahatapitiya and M. S. Ryoo, "Coarse-Fine Networks for Temporal Activity Detection in Videos", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. arXiv:2103
A. Piergiovanni and M. S. Ryoo, "Recognizing Actions in Videos from Unseen Viewpoints", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2021. arXiv:2103
I. Akinola, A. Angelova, Y. Lu, Y. Chebotar, D. Kalashnikov, J. Varley, J. Ibarz, and M. S. Ryoo, "Visionary: Vision Architecture Discovery for Robot Learning", IEEE International Conference on Robotics and Automation (ICRA), May 2021. arXiv:2103
A. Piergiovanni and M. S. Ryoo, "AViD Dataset: Anonymized Videos from Diverse Countries", Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), December 2020. arXiv:2007 [dataset]
A. Piergiovanni, A. Angelova, A. Toshev, and M. S. Ryoo, "Adversarial Generative Grammars for Human Activity Prediction", European Conference on Computer Vision (ECCV), August 2020. arXiv:2008
M. S. Ryoo, A. Piergiovanni, J. Kangaspunta, and A. Angelova, "AssembleNet++: Assembling Modality Representations via Attention Connections", European Conference on Computer Vision (ECCV), August 2020. arXiv:2008 [code]
X. Gu, W. Luo, M. S. Ryoo, and Y. J. Lee, "Password-conditioned Anonymization and Deanonymization with Face Identity Transformers", European Conference on Computer Vision (ECCV), August 2020. arXiv:1911
X. Wang, X. Xiong, M. Neumann, A. Piergiovanni, M. S. Ryoo, A. Angelova, K. M. Kitani, and W. Hua, "AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification", European Conference on Computer Vision (ECCV), August 2020. arXiv:2007
A. Piergiovanni, A. Angelova, M. S. Ryoo, "Evolving Losses for Unsupervised Video Representation Learning", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020. arXiv:2002 4-page-version:arXiv:1906
M. S. Ryoo, A. Piergiovanni, M. Tan, and A. Angelova, "AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures", International Conference on Learning Representations (ICLR), April 2020. arXiv:1905 [code]
A. Piergiovanni and M. S. Ryoo, "Learning Multimodal Representations for Unseen Activities", IEEE Winter Conference on Applications of Computer Vision (WACV), March 2020. arXiv:1806
A. Piergiovanni, A. Angelova, and M. S. Ryoo, "Differentiable Grammars for Videos", the 34th AAAI Conference on Artificial Intelligence (AAAI), February 2020. arXiv:1902
A. Piergiovanni, A. Wu, and M. S. Ryoo, "Learning Real-World Robot Policies by Dreaming", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019. arXiv:1805 [project]
A. Wu, A. Piergiovanni, and M. S. Ryoo, "Model-based Behavioral Cloning with Future Image Similarity Learning", Conference on Robot Learning (CoRL), October 2019. arXiv:1910 [project/code]
M. U. Kim, H. Lee, H. J. Jang, and M. S. Ryoo, "Privacy-Preserving Robot Vision with Anonymized Faces by Extreme Low Resolution", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019.
A. Piergiovanni, A. Angelova, A. Toshev, and M. S. Ryoo, "Evolving Space-Time Neural Architectures for Videos", International Conference on Computer Vision (ICCV), October 2019. arXiv:1811
A. Piergiovanni and M. S. Ryoo, "Temporal Gaussian Mixture Layer for Videos", International Conference on Machine Learning (ICML), June 2019. arXiv:1803 [code]
A. Piergiovanni and M. S. Ryoo, "Representation Flow for Action Recognition", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019. arXiv:1810 [code]
A. Piergiovanni and M. S. Ryoo, "Early Detection of Injuries in MLB Pitchers from Video", CVPR Workshop on Computer Vision in Sports, June 2019. arXiv:1904
Z. Ren, Y. J. Lee, and M. S. Ryoo, "Learning to Anonymize Faces for Privacy Preserving Action Detection", European Conference on Computer Vision (ECCV), September 2018. arXiv [project]
M. Xu, C. Fan, Y. Wang, M. S. Ryoo, and D. J. Crandall, "Joint Person Segmentation and Identification in Synchronized First- and Third-person Videos", European Conference on Computer Vision (ECCV), September 2018. arXiv
C. Fan, J. Lee, and M. S. Ryoo, "Forecasting Hands and Objects in Future Frames", ECCV Workshop on Anticipating Human Behavior, September 2018. arXiv
A. Piergiovanni and M. S. Ryoo, "Learning Latent Super-Events to Detect Multiple Activities in Videos", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2018. arXiv [code]
A. Piergiovanni and M. S. Ryoo, "Fine-grained Activity Recognition in Baseball Videos", CVPR Workshop on Computer Vision in Sports, June 2018. arXiv [dataset/code]
M. S. Ryoo, K. Kim, and H. J. Yang, "Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning", the 32nd AAAI Conference on Artificial Intelligence (AAAI), February 2018. arXiv
J. Lee and M. S. Ryoo, "Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017. arXiv video
I. Gori, J. K. Aggarwal, L. Matthies, and M. S. Ryoo, "Multi-Type Activity Recognition from a Robot's Viewpoint", the 26th International Joint Conference on Artificial Intelligence (IJCAI), August 2017 (invited).
C. Fan, J. Lee, M. Xu, K. K. Singh, Y. J. Lee, D. J. Crandall, and M. S. Ryoo, "Identifying First-person Camera Wearers in Third-person Videos", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017. arXiv
T. Shu, X. Gao, M. S. Ryoo, and S.-C. Zhu, "Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions", IEEE International Conference on Robotics and Automation (ICRA), May 2017. arXiv
A. Piergiovanni+, C. Fan+, and M. S. Ryoo, "Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters", the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017 (+indicates equal contribution). arXiv [code]
M. S. Ryoo, B. Rothrock, C. Fleming, and H. J. Yang, "Privacy-Preserving Human Activity Recognition from Extreme Low Resolution", the 31st AAAI Conference on Artificial Intelligence (AAAI), February 2017. arXiv
T. Shu, M. S. Ryoo, and S.-C. Zhu, "Learning Social Affordance for Human-Robot Interaction", the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016. arXiv
M. S. Ryoo, B. Rothrock, and L. Matthies, "Pooled Motion Features for First-Person Videos", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015. arXiv
M. S. Ryoo, T. J. Fuchs, L. Xia, J. K. Aggarwal, and L. Matthies, "Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?", ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 2015 (full paper). pdf dataset
[Best Paper Award Nominee]
L. Xia, I. Gori, J. K. Aggarwal, and M. S. Ryoo, "Robot-Centric Activity Recognition from First-Person RGB-D Videos", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2015. pdf
Y. Iwashita, A. Takamine, R. Kurazume, and M. S. Ryoo, "First-Person Animal Activity Recognition from Egocentric Videos", International Conference on Pattern Recognition (ICPR), August 2014. pdf dataset
S. Mann, K. Kitani, Y. J. Lee, M. S. Ryoo, and A. Fathi, "An Introduction to the 3rd Workshop on Egocentric (First-Person) Vision", IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 2014.
Y. Iwashita+, M. S. Ryoo+, T. J. Fuchs, and C. Padgett, "Recognizing Humans in Motion: Trajectory-based Aerial Video Analysis", the 24th British Machine Vision Conference (BMVC), September 2013 (+indicates equal contribution). pdf video
M. S. Ryoo and L. Matthies, "First-Person Activity Recognition: What Are They Doing to Me?", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2013. pdf video dataset
J. H. Joung, M. S. Ryoo, S. Choi, and S. R. Kim, "Reliable Object Detection and Segmentation Using Inpainting", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Algarve, Portugal, October 2012. pdf
M. S. Ryoo, "Human Activity Prediction: Early Recognition of Ongoing Activities from Streaming Videos", International Conference on Computer Vision (ICCV), Barcelona, Spain, November 2011. pdf results
M. S. Ryoo, "Interactive Learning of Human Activities Using Active Video Composition", International Workshop on Stochastic Image Grammars (SIG), in Proceedings of International Conference on Computer Vision Workshops (ICCVW), Barcelona, Spain, November 2011.
J. H. Joung, M. S. Ryoo, S. Choi, W. Yu, and H. Chae, "Background-aware Pedestrian/Vehicle Detection System for Driving Environments", IEEE Conference on Intelligent Transportation Systems (ITSC), Washington, D.C., October 2011. pdf
M. S. Ryoo and W. Yu, "One Video is Sufficient? Human Activity Recognition Using Active Video Composition", IEEE Workshop on Applications of Computer Vision (WACV), January 2011. pdf
M. S. Ryoo, J. Lee, J. Joung, S. Choi, and W. Yu, "Personal Driving Diary: Constructing a Video Archive of Everyday Driving Events", IEEE Workshop on Applications of Computer Vision (WACV), January 2011. pdf video
M. S. Ryoo, J. Joung, S. Choi, and W. Yu, "Incremental Learning of Novel Activity Categories from Videos", 16th International Conference on Virtual Systems and Multimedia (VSMM), Seoul, Korea, October 2010 (invited). pdf
M. S. Ryoo, C.-C. Chen, J. K. Aggarwal, and A. Roy-Chowdhury, "An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010", International Conference on Pattern Recognition (ICPR) Contests, August 2010. pdf slides website
M. S. Ryoo+, J. T. Lee+, and J. K. Aggarwal, "Video Scene Analysis of Interactions between Humans and Vehicles Using Event Context", ACM International Conference on Image and Video Retrieval (CIVR), Xian, China, July 2010 (invited) (+indicates equal contribution). pdf
J. T. Lee, M. S. Ryoo, and J. K. Aggarwal, "View Independent Recognition of Human-vehicle Interactions using 3-D Models", IEEE Workshop on Motion and Video Computing (WMVC), Snowbird, Utah, December 2009. pdf
M. S. Ryoo and J. K. Aggarwal, "Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities", International Conference on Computer Vision (ICCV), Kyoto, Japan, October 2009. pdf
M. S. Ryoo and J. K. Aggarwal, "Stochastic Representation and Recognition of High-level Group Activities: Describing Structural Uncertainties in Human Activities", 1st International Workshop on Stochastic Image Grammars (SIG), in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Miami, FL, June 2009 (invited). extended abstract.
M. S. Ryoo and J. K. Aggarwal, "Human Activities: Handling Uncertainties Using Fuzzy Time Intervals", Proceedings of 19th International Conference on Pattern Recognition (ICPR), Tampa, FL, December 2008. pdf
M. S. Ryoo and J. K. Aggarwal, "Observe-and-Explain: A New Approach for Multiple Hypotheses Tracking of Humans and Objects", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, June 2008. pdf i-Lids_example_video CAVIAR_example_video
M. S. Ryoo and J. K. Aggarwal, "Recognition of High-level Group Activities Based on Activities of Individual Members", Proceedings of IEEE Workshop on Motion and Video Computing (WMVC), Copper Mountain, CO, January 2008. pdf IEEE_link example_video1 example_video2
J. T. Lee, M. S. Ryoo, M. Riley, and J. K. Aggarwal, "Real-time Detection of Illegally Parked Vehicles using 1-D Transformation", Proceedings of IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), London, UK, September 2007. pdf or IEEE_link
M. Bhargava, C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal, "Detection of Abandoned Objects in Crowded Environments", Proceedings of IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS), London, UK, September 2007. pdf or IEEE_link
M. S. Ryoo and J. K. Aggarwal, "Hierarchical Recognition of Human Activities Interacting with Objects", 2nd International Workshop on Semantic Learning Applications in Multimedia (SLAM), in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, June 2007. pdf or IEEE_link
M. S. Ryoo and J. K. Aggarwal, "Robust Human-Computer Interaction System Guiding a User by Providing Feedback", Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Hyderabad, India, January 2007. pdf
M. S. Ryoo and J. K. Aggarwal, "Semantic Understanding of Continued and Recursive Human Activities", Proceedings of 18th International Conference on Pattern Recognition (ICPR), Vol. 1, pp. 379-382, Hong Kong, August 2006. pdf or IEEE_link
M. S. Ryoo and J. K. Aggarwal, "Recognition of Composite Human Activities through Context-Free Grammar based Representation", Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp. 1709-1719, New York, NY, June 2006. pdf or IEEE_link
H. S. Yang, Y. Seo, M. S. Ryoo, and H. Jung, "Affective Communication System with Emotional Memories for Multimodal Interaction with Humanoids", Proceedings of the 11th International Conference on Virtual Systems and Multimedia (VSMM), October 2005.
M. S. Ryoo, Y. Seo, H. Jung, and H. S. Yang, "Affective Dialogue Communication System with Emotional Memories for Humanoid Robots", Proceedings of the First International Conference on Affective Computing and Intelligent Interaction (ACII), LNCS 3784, pp. 819-827, October 2005. pdf
D. Pardoe, M. Ryoo, and R. Miikkulainen, "Evolving Neural Network Ensembles for Control Problems", Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), June 2005. link
H. Jung, Y. Seo, M. S. Ryoo, and H. S. Yang, "Affective Communication System with Multimodality for Humanoid Robot AMI", IEEE-RAS/RSJ International Conference on Humanoid Robots (Humanoids), November 2004. link

Journal publications

J. Shang, X. Li, K. Kahatapitiya, Y. Lee, M. S. Ryoo, "StARformer: Transformer with State-Action-Reward Representations for Robot Learning", IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), Early Access, 2022. ieeexplore
A. Piergiovanni, A. Angelova, M. S. Ryoo, "Tiny Video Networks", Applied AI Letters, October 2021. arXiv:1910
R. Hadidi, J. Cao, M. Woodward, M. S. Ryoo, and H. Kim, "Distributed Perception by Collaborative Robots", IEEE Robotics and Automation Letters (RA-L), 2018. [IROS 2018 presentation]
M. S. Ryoo and L. Matthies, "First-Person Activity Recognition: Feature, Temporal Structure, and Prediction", International Journal of Computer Vision (IJCV), 119(3):307-328, 2016. link
I. Gori, J. K. Aggarwal, L. Matthies, and M. S. Ryoo, "Multi-Type Activity Recognition in Robot-Centric Scenarios", IEEE Robotics and Automation Letters (RA-L), 1(1):593-600, February 2016. [ICRA 2016 presentation] arXiv link
[Best Paper Award in Robot Vision from ICRA 2016]
M. S. Ryoo, S. Choi+, J. H. Joung+, J.-Y. Lee+, and W. Yu, "Personal Driving Diary: Automated Recognition of Driving Events from First-Person Videos", Computer Vision and Image Understanding (CVIU), 117(10):1299-1312, October 2013 (+indicates equal contribution). pdf link
J. K. Aggarwal and M. S. Ryoo, "Toward a Unified Framework of Motion Understanding", Image and Vision Computing, 30(8):465-466, August 2012. link
M. S. Ryoo and J. K. Aggarwal, "Stochastic Representation and Recognition of High-level Group Activities", International Journal of Computer Vision (IJCV), 93(2):183-200, June 2011. pdf link
J. K. Aggarwal and M. S. Ryoo, "Human Activity Analysis: A Review", ACM Computing Surveys (CSUR), 43(3), April 2011. pdf link
M. S. Ryoo, K. Grauman, and J. K. Aggarwal, "A Task-Driven Intelligent Workspace System to Provide Guidance Feedback", Computer Vision and Image Understanding (CVIU), 114(5):520-534, May 2010. link
J. T. Lee, M. S. Ryoo, M. Riley, and J. K. Aggarwal, "Real-Time Illegal Parking Detection in Outdoor Environments Using 1-D Transformation", IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 19(7):1014-1024, July 2009. link
M. Bhargava, C.-C. Chen, M. S. Ryoo, and J. K. Aggarwal, "Detection of Object Abandonment Using Temporal Logic", Machine Vision and Applications (MVA), 20(5):271-281, June 2009. link
M. S. Ryoo and J. K. Aggarwal, "Semantic Representation and Recognition of Continued and Recursive Human Activities", International Journal of Computer Vision (IJCV), 82(1):1-24, April 2009. pdf link

Datasets

M. S. Ryoo and J. K. Aggarwal, UT-Interaction Dataset: ICPR Contest on Semantic Description of Human Activities (SDHA), 2010. website
Chia-Chih Chen, M. S. Ryoo, and J. K. Aggarwal, UT-Tower Dataset: Aerial View Activity Classification Challenge, 2010. website

Thesis papers

"Semantic Representation and Recognition of Human Activities", Ph. D. Thesis, Track of Computer Engineering, Department of ECE, The University of Texas at Austin, August 2008.
[Outstanding Dissertation Award Nominee]
"Semantic Understanding of Continued and Recursive Activities using Context-Free Grammar", M. S. Thesis, Track of Computer Engineering, Department of ECE, The University of Texas at Austin, August 2006.
[Outstanding Thesis Award Nominee]
"Affective Dialogue Communication System with Emotional Memories for Humanoid Robots", B. S. Thesis, Division of Computer Science, Department of EECS, KAIST, June 2004.
