About Me

Chih-Yao Ma

My name is Chih-Yao Ma. I am a Ph.D. student at Georgia Tech. In recent years, I have focused primarily on research at the intersection of computer vision, natural language processing, and temporal reasoning. I have conducted research on large-scale video classification, fine-grained human action recognition, relational reasoning for video understanding, visually grounded image/video captioning, vision-and-language navigation agents, and self-supervised visual representation learning.

Resume (PDF)


[New] I am on the job market for a full-time position in Artificial Intelligence and Machine Learning.

Please drop me an email if you are interested!

Career

Facebook Research

Self-Supervised Learning

with Marcus Rohrbach (FAIR), Yannis Kalantidis (AML), Kan Chen (Mobile Vision), and Peter Vajda (Mobile Vision)

Summer & Fall 2019
Research Intern

Salesforce Research

Vision-and-Language Navigation

with Caiming Xiong and Richard Socher

Summer 2018
Research Intern

NEC Labs, Machine Learning

Relational reasoning for human action recognition and video captioning

with Asim Kadav

Summer & Fall 2017
Research Intern

Georgia Tech

Electrical and Computer Engineering

with Ghassan AlRegib (advisor) and Zsolt Kira

Fall 2014 - Spring 2020 (expected)
Ph.D. student

National Chiao Tung University

Electrical and Computer Engineering

with Hsueh-Ming Hang

Aug. 2012 - May 2014
Research Assistant

National Chiao Tung University

Electrical and Computer Engineering

Aug. 2006 - May 2011
B.S./M.S.

Selected Publications

Please see my Google Scholar for a complete publication list.

Learning to Generate Grounded Image Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
Technical Report, 2019
[arXiv] / [GitHub (coming soon)] / [Project] / [bibtex]

Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification
Chia-Wen Kuo, Chih-Yao Ma, Jia-Bin Huang, Zsolt Kira
Technical Report, 2019
[arXiv] / [GitHub (coming soon)] / [Project] / [bibtex]

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
[arXiv] / [GitHub] / [Project] / [Poster] / [ML@GT] / [bibtex]

AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S Davis
Computer Vision and Pattern Recognition (CVPR), 2019
[arXiv] / [Poster] / [bibtex]

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
International Conference on Learning Representations (ICLR), 2019
(Top 7% of reviews)
[arXiv] / [OpenReview] / [GitHub] / [Project] / [Poster] / [ML@GT] / [bibtex]

Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Computer Vision and Pattern Recognition (CVPR), 2018
[arXiv] / [Project] / [Poster] / [ML@GT] / [bibtex]

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Chih-Yao Ma*, Min-Hung Chen*, Zsolt Kira, and Ghassan AlRegib
Signal Processing: Image Communication, 2018
(* equal contribution)
[arXiv] / [GitHub] / [Project] / [bibtex]

Grounded Objects and Interactions for Video Captioning
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Neural Information Processing Systems (NeurIPS) Workshop on Visually-Grounded Interaction and Language, 2017
[arXiv] / [bibtex]

Learning-Based Saliency Model with Depth Information
Chih-Yao Ma and Hsueh-Ming Hang
Journal of Vision, 2015
[Paper] / [bibtex]


Research Interests