Alireza Fathi

I am a research scientist / TLM at Google DeepMind. Before joining Google, I spent a couple of great years at Apple working on 3D computer vision. Before that, I was a Postdoctoral Fellow in Fei-Fei Li's lab at Stanford. I received my Ph.D. from Georgia Institute of Technology and my B.Sc. from Sharif University of Technology.


My areas of interest:


Contact: alireza.fathi@gmail.com

AVIS: we introduce a Large Language Model Agent that achieves state-of-the-art results on visual information seeking tasks.

Google blog post

REVEAL: we introduce a visual-language model that learns to utilize a multi-source multi-modal “memory” to answer knowledge-intensive queries.

Google blog post

Publications (Google Scholar)

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David Ross, Cordelia Schmid, Alireza Fathi

ICML, 2024

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen

NeurIPS, 2024

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

CVPR, 2024

Retrieval-Enhanced Contrastive Vision-Text Models

Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

ICLR, 2024

AVIS: Autonomous Visual Information Seeking with Large Language Models

Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David Ross, Cordelia Schmid, Alireza Fathi

NeurIPS, 2023

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David Ross, Alireza Fathi

CVPR, 2023

A Memory Transformer Network for Incremental Learning

Ahmet Iscen, Tom Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid

BMVC, 2022

im2nerf: Image to Neural Radiance Field in the Wild

Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi

arXiv:2209.04061, 2022

Pre-Tram: Self-supervised Pre-training via Connecting Trajectory and Map

Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan

ECCV, 2022

Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

CVPR, 2022

Object-Centric Neural Scene Rendering

Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

arXiv:2012.08503

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

Rui Huang, Wanyue Zhang, Thomas Funkhouser, Abhijit Kundu, Caroline Pantofaru, David A Ross, Alireza Fathi

ECCV, 2020 PDF

Virtual Multi-view Fusion for 3D Semantic Segmentation

Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David A Ross, Brian E Brewington, Thomas Funkhouser, Caroline Pantofaru

ECCV, 2020 PDF

Pillar-based Object Detection for Autonomous Driving

Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Tom Funkhouser, Justin Solomon

ECCV, 2020 PDF

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Tom Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

CVPR, 2020 PDF

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation

Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner

CVPR, 2020 PDF

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

arXiv:1906.06792, 2019 PDF

Tracking emerges by colorizing video

Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

ECCV 2018 PDF

Instance embedding transfer to unsupervised video object segmentation

Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo

CVPR 2018 PDF

The devil is in the decoder

Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

BMVC 2017 PDF

Semantic instance segmentation via deep metric learning

Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin Murphy

arXiv:1703.10277, 2017 PDF

Speed/accuracy trade-offs for modern convolutional object detectors

Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

CVPR 2017 (Winner of the COCO Object Detection Challenge in 2016) PDF

VideoSET: Video Summary Evaluation through Text

Serena Yeung, Alireza Fathi, Li Fei-Fei

arXiv:1406.5824 [cs.CV] PDF Project Page

Learning to Predict Gaze in Egocentric Video

Yin Li, Alireza Fathi, James M. Rehg

ICCV 2013 PDF

Learning Descriptive Models of Objects and Activities from Egocentric Video

Alireza Fathi

Ph.D. Thesis, Georgia Institute of Technology PDF

Modeling Actions through State Changes

Alireza Fathi, James M. Rehg

CVPR 2013 PDF

Learning to Recognize Daily Actions using Gaze

Alireza Fathi, Yin Li, James M. Rehg

ECCV 2012 PDF, Project Page

Detecting Eye Contact using Wearable Eye-Tracking Glasses

Zhefan Ye, Yin Li, Alireza Fathi, Yi Han, Agata Rozga, Gregory D. Abowd, James M. Rehg

2nd Workshop on Pervasive Eye Tracking and Mobile Eye-based Interaction (in conjunction with UbiComp), 2012 PDF

Social Interactions: A First-Person Perspective

Alireza Fathi, Jessica K. Hodgins, James M. Rehg

CVPR 2012 PDF, Dataset

Understanding Egocentric Activities

Alireza Fathi, Ali Farhadi, James M. Rehg

ICCV 2011 PDF, Dataset

Combining Self Training and Active Learning for Video Segmentation

Alireza Fathi, Maria Florina Balcan, Xiaofeng Ren, James M. Rehg

BMVC 2011 PDF, Abstract, Software

Learning to Recognize Objects in Egocentric Activities

Alireza Fathi, Xiaofeng Ren, James M. Rehg

CVPR 2011 PDF, Dataset

Detecting Road Intersections from GPS Traces

Alireza Fathi, John Krumm

GIScience 2010 PDF

Human Pose Estimation using Motion Exemplars

Alireza Fathi, Greg Mori

ICCV 2007 PDF

Voice Synthesis using the Generalized Pressure-Controlled Valve

Tamara Smyth, Alireza Fathi

International Computer Music Conference (ICMC), 2008 PDF

A Standard Workflow for Illumination-Invariant Image Extraction

Mark S. Drew, Muntaseer Salahuddin, Alireza Fathi

15th Color and Imaging Conference, 2007 PDF