About Me

Kevin Chih-Yao Ma

My name is Kevin Chih-Yao Ma (馬志堯). I am a Staff Research Scientist working on generative foundation models in Meta's GenAI org. Previously, I primarily focused on data-efficient learning, such as semi-supervised learning, self-supervised learning, and federated learning. During my PhD, I conducted research on large-scale video classification, fine-grained human action recognition, relational reasoning for video understanding, visually grounded image/video captioning, and vision-and-language navigation agents.

Career

Meta

Lead IC in Meta's Movie Gen.
Built a Llama-like DiT, scaled the model with parallelization, and co-led post-training.

Core contributor to Emu, which powers Emu Video, Emu Edit, and Imagine.

with Peter Vajda and Zijian He (GenAI Media Foundation team)

Aug. 2023 - Present
Staff Research Scientist

Meta

Generative Models, Federated Learning, Semi-Supervised Learning

with Peter Vajda and Zijian He (Mobile Vision & GenAI Media Foundation team)

Jan. 2022 - July 2023
Senior Research Scientist

Meta

Data-efficient learning

with Peter Vajda and Zijian He (Mobile Vision team)

Aug. 2020 - Dec. 2021
Research Scientist

Meta

Self-Supervised Learning

with Marcus Rohrbach (FAIR), Yannis Kalantidis (AML), Kan Chen (Mobile Vision), and Peter Vajda (Mobile Vision)

Summer & Fall 2019
Research Intern

Salesforce Research

Vision-and-Language Navigation

with Caiming Xiong and Richard Socher

Summer 2018
Research Intern

NEC-Labs Machine Learning

Relational reasoning for human action recognition and video captioning

with Asim Kadav

Summer & Fall 2017
Research Intern

Georgia Tech

Electrical and Computer Engineering

with Ghassan AlRegib (advisor) and Zsolt Kira

Fall 2014 - Spring 2020
Ph.D. student

National Chiao Tung University

Electrical and Computer Engineering

with Hsueh-Ming Hang

Aug. 2012 - May 2014
Research Assistant

National Chiao Tung University

Electrical and Computer Engineering

Aug. 2006 - May 2011
B.S./M.S.

Selected Publications

Please see my Google Scholar profile for the complete publication list.

Movie Gen: A Cast of Media Foundation Models
Meta's Movie Gen team
[Webpage] / [arXiv] / [MovieGenBench (GitHub)] / [bibtex]

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Xiaoliang Dai*, Ji Hou*, Chih-Yao Ma*, Sam Tsai*, Jialiang Wang*, Rui Wang*, Peizhao Zhang*, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly+, Vignesh Ramanathan+, Zijian He+, Peter Vajda+, Devi Parikh+
(*: Core contributors: equal contribution, alphabetical order.)
(+: Equal last authors.)
[arXiv] / [bibtex]

RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data
Sangwoo Mo, Jong-Chyi Su, Chih-Yao Ma, Mido Assran, Ishan Misra, Licheng Yu, Sean Bell
International Conference on Learning Representations (ICLR), 2023
[arXiv] / [GitHub] / [bibtex]

Trainable Projected Gradient Method for Robust Fine-tuning
Junjiao Tian, Zecheng He, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Zsolt Kira
Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] / [GitHub] / [bibtex]

Structure-Encoding Auxiliary Tasks for Improved Visual Representation in Vision-and-Language Navigation
Chia-Wen Kuo, Chih-Yao Ma, Judy Hoffman, Zsolt Kira
Winter Conference on Applications of Computer Vision (WACV), 2022
[arXiv] / [Project] / [bibtex]

Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks
Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira
Conference on Neural Information Processing Systems (NeurIPS), 2022
[arXiv] / [Project] / [GitHub] (coming soon) / [bibtex]

Open-Set Semi-Supervised Object Detection
Yen-Cheng Liu, Chih-Yao Ma, Xiaoliang Dai, Junjiao Tian, Peter Vajda, Zijian He, Zsolt Kira
European Conference on Computer Vision (ECCV), 2022 (Oral)
[arXiv] / [Project] / [GitHub] (coming soon) / [bibtex]

Cross-Domain Adaptive Teacher for Object Detection
Yu-Jhe Li, Xiaoliang Dai, Chih-Yao Ma, Yen-Cheng Liu, Kan Chen, Bichen Wu, Zijian He, Kris Kitani, Peter Vajda
Computer Vision and Pattern Recognition (CVPR), 2022
[PDF] / [GitHub] / [Project] / [bibtex]

Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors
Yen-Cheng Liu, Chih-Yao Ma, Zsolt Kira
Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv] / [PDF] / [GitHub] / [Project] / [bibtex]

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira
IEEE International Conference on Robotics and Automation (ICRA), 2021
[arXiv] / [GitHub] / [Project] / [bibtex]

Unbiased Teacher for Semi-Supervised Object Detection
Yen-Cheng Liu, Chih-Yao Ma, Zijian He, Chia-Wen Kuo, Kan Chen, Peizhao Zhang, Bichen Wu, Zsolt Kira, Peter Vajda
International Conference on Learning Representations (ICLR), 2021
[arXiv] / [GitHub] / [Project] / [OpenReview] / [bibtex]

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020
[arXiv] / [GitHub] / [Project] / [ML@GT] / [bibtex]

FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning
Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020
[arXiv] / [Project] / [GitHub] / [bibtex]

Who2com: Collaborative Perception Via Learnable Handshake Communication
Yen-Cheng Liu, Junjiao Tian, Chih-Yao Ma, Nathaniel Glaser, Chia-Wen Kuo, Zsolt Kira
International Conference on Robotics and Automation (ICRA), 2020
[arXiv] / [GitHub] / [Project] / [bibtex]

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification
Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
Technical Report, 2019
[arXiv] / [Project] / [bibtex]

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
[arXiv] / [GitHub] / [Project] / [Poster] / [bibtex]

AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S Davis
Computer Vision and Pattern Recognition (CVPR), 2019
[arXiv] / [Poster] / [bibtex]

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
International Conference on Learning Representations (ICLR), 2019
(Top 7% of reviews)
[arXiv] / [OpenReview] / [GitHub] / [Project] / [Poster] / [ML@GT] / [bibtex]

Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Computer Vision and Pattern Recognition (CVPR), 2018
[arXiv] / [Project] / [Poster] / [ML@GT] / [bibtex]

TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition
Chih-Yao Ma*, Min-Hung Chen*, Zsolt Kira, and Ghassan AlRegib
Signal Processing: Image Communication, 2018
(*: equal contribution)
[arXiv] / [GitHub] / [Project] / [bibtex]

Grounded Objects and Interactions for Video Captioning
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Neural Information Processing Systems (NeurIPS) Workshop on Visually-Grounded Interaction and Language, 2017
[arXiv] / [bibtex]

Learning-based saliency model with depth information
Chih-Yao Ma and Hsueh-Ming Hang
Journal of Vision, 2015
[Paper] / [bibtex]



Services

  • Reviewer for NeurIPS, ICLR, ICML
  • Reviewer for CVPR, ICCV, ECCV
  • Reviewer for NAACL
  • Reviewer for T-PAMI, T-IP, T-CSVT