About Me

Chih-Yao Ma

My name is Chih-Yao Ma. I am a Ph.D. student at Georgia Tech. In recent years, my research has focused on the intersection of computer vision, natural language processing, and temporal reasoning. I have conducted research on large-scale video classification, fine-grained human action recognition, relational reasoning for video understanding, visually grounded video captioning, and vision-and-language navigation agents.

Resume (PDF)


[New] I am on the job market for a full-time position in Artificial Intelligence and Machine Learning.

Please drop me an email if you are interested!

Career

Facebook Research

Vision and Language

with Marcus Rohrbach (FAIR), Yannis Kalantidis (AML), and Peter Vajda (Mobile Vision)

Summer 2019
Research Intern

Salesforce Research

Vision-and-Language Navigation

with Caiming Xiong and Richard Socher

Summer 2018
Research Intern

NEC-Labs Machine Learning

Relational reasoning for human action recognition and video captioning

with Asim Kadav

Summer & Fall 2017
Research Intern

Georgia Tech

Electrical and Computer Engineering

Fall 2014 - Fall 2019 (expected)
Ph.D. student

National Chiao Tung University

Electrical and Computer Engineering

Aug. 2012 - May 2014
Research Assistant

National Chiao Tung University

Electrical and Computer Engineering

Aug. 2006 - May 2011
B.S./M.S.

Selected Publications

Please see my Google Scholar for the complete publication list.

The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation
Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira
Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)
[arXiv] / [GitHub] / [Project] / [ML@GT] / [bibtex]

AdaFrame: Adaptive Frame Selection for Fast Video Recognition
Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S Davis
Computer Vision and Pattern Recognition (CVPR), 2019
[arXiv] / [bibtex]

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
International Conference on Learning Representations (ICLR), 2019
(Top 7% of reviews)
[arXiv] / [OpenReview] / [GitHub] / [Project] / [Poster] / [ML@GT] / [bibtex]

Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Computer Vision and Pattern Recognition (CVPR), 2018
[arXiv] / [Project] / [Poster] / [ML@GT] / [bibtex]

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Chih-Yao Ma*, Min-Hung Chen*, Zsolt Kira, Ghassan AlRegib
Signal Processing: Image Communication, 2018
(* equal contribution)
[arXiv] / [GitHub] / [Project] / [bibtex]

Grounded Objects and Interactions for Video Captioning
Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, Hans Peter Graf
Neural Information Processing Systems (NeurIPS) Workshop on Visually-Grounded Interaction and Language, 2017
[arXiv] / [bibtex]

Learning-based Saliency Model with Depth Information
Chih-Yao Ma and Hsueh-Ming Hang
Journal of Vision, 2015
[Paper] / [bibtex]


Research Interests