Alireza Fathi

I am a senior staff research scientist / manager at Google DeepMind. Before joining Google in 2016, I spent a couple of great years at Apple working on 3D computer vision. Before that, I was a Postdoctoral Fellow in Fei-Fei Li's lab at Stanford in 2013-2014. I received my Ph.D. degree from the Georgia Institute of Technology, and my B.Sc. degree from Sharif University of Technology.

My areas of interest:

LLM Reasoning in Visual Space
Multi-Modal Large Language Models
Neural Rendering
Egocentric Vision
3D Scene Understanding

Contact: alireza.fathi@gmail.com

Serving as an area chair for NeurIPS 2025, ICCV 2025, CVPR 2025, NeurIPS 2024, ECCV 2024, 3DV 2024, CVPR 2024, CVPR 2023, ECCV 2022, CVPR 2022.

Publications (Google Scholar)

Temporal Chain of Thought: Long Video Understanding by Thinking in Frames

Anurag Arnab, Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

NeurIPS, 2025

FirePlace: Geometric Refinement of LLM Common Sense Reasoning for 3D Object Placement

Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi

CVPR, 2025

Visual Lexicon: Rich Image Features in Language Space

XuDong Wang, Xingyi Zhou, Alireza Fathi, Trevor Darrell, Cordelia Schmid

CVPR, 2025

Language-Guided Image Tokenization for Generation

Kaiwen Zha, Lijun Yu, Alireza Fathi, David Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu

CVPR, 2025

SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David Ross, Cordelia Schmid, Alireza Fathi

ICML, 2024

Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach

Mathilde Caron, Alireza Fathi, Cordelia Schmid, Ahmet Iscen

NeurIPS, 2024

A Generative Approach for Wikipedia-Scale Visual Entity Recognition

Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid

CVPR, 2024

Retrieval-Enhanced Contrastive Vision-Text Models

Ahmet Iscen, Mathilde Caron, Alireza Fathi, Cordelia Schmid

ICLR, 2024

AVIS: Autonomous Visual Information Seeking with Large Language Models

Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David Ross, Cordelia Schmid, Alireza Fathi

NeurIPS, 2023 (Google blog post)

Learning Object-centric Neural Scattering Functions for Free-viewpoint Relighting and Scene Composition

Hong-Xing Yu*, Michelle Guo*, Alireza Fathi, Yen-Yu Chang, Eric Ryan Chan, Ruohan Gao, Thomas Funkhouser, Jiajun Wu

TMLR, 2023

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David Ross, Alireza Fathi

CVPR, 2023 (Google blog post)

Improving Image Recognition by Retrieving from Web-Scale Image-Text Data

Ahmet Iscen, Alireza Fathi, Cordelia Schmid

CVPR, 2023

A Memory Transformer Network for Incremental Learning

Ahmet Iscen, Tom Bird, Mathilde Caron, Alireza Fathi, Cordelia Schmid

BMVC, 2022

im2nerf: Image to Neural Radiance Field in the Wild

Lu Mi, Abhijit Kundu, David Ross, Frank Dellaert, Noah Snavely, Alireza Fathi

arXiv:2209.04061, 2022

Pre-Tram: Self-supervised Pre-training via Connecting Trajectory and Map

Chenfeng Xu, Tian Li, Chen Tang, Lingfeng Sun, Kurt Keutzer, Masayoshi Tomizuka, Alireza Fathi, Wei Zhan

ECCV, 2022

Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation

Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, Thomas Funkhouser

CVPR, 2022

Object-Centric Neural Scene Rendering

Michelle Guo, Alireza Fathi, Jiajun Wu, Thomas Funkhouser

arXiv:2012.08503

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds

Rui Huang, Wanyue Zhang, Thomas Funkhouser, Abhijit Kundu, Caroline Pantofaru, David A Ross, Alireza Fathi

ECCV, 2020 PDF

Virtual Multi-view Fusion for 3D Semantic Segmentation

Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David A Ross, Brian E Brewington, Thomas Funkhouser, Caroline Pantofaru

ECCV, 2020 PDF

Pillar-based Object Detection for Autonomous Driving

Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Tom Funkhouser, Justin Solomon

ECCV, 2020 PDF

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes

Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Tom Funkhouser, Caroline Pantofaru, David Ross, Larry S. Davis, Alireza Fathi

CVPR, 2020 PDF (Google blog post)

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation

Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Nießner

CVPR, 2020 PDF

Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction

Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa

arXiv:1906.06792, 2019 PDF

Tracking emerges by colorizing video

Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

ECCV 2018 PDF

Instance embedding transfer to unsupervised video object segmentation

Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo

CVPR 2018 PDF

The devil is in the decoder

Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

BMVC 2017 PDF

Semantic instance segmentation via deep metric learning

Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin Murphy

arXiv:1703.10277, 2017 PDF

Speed/accuracy trade-offs for modern convolutional object detectors

Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

CVPR 2017 (Winner of the COCO Object Detection Challenge in 2016) PDF