Xianzheng Ma (马宪政)

Greetings! I am currently a DPhil student (since 10.2023) at the Department of Engineering Science, University of Oxford, supervised by Prof. Victor Prisacariu and Prof. Iro Laina. I am part of both the Visual Geometry Group and the Active Vision Group. Previously, I obtained my Bachelor's and Master's degrees from Wuhan University in 2018 and 2021, respectively.

My research interests include LLMs, 3D computer vision, and robotics, especially using the world knowledge of LLMs to empower 3D world understanding and interaction.

For any suggestions or collaborations, please reach out to me at xianzheng@robots.ox.ac.uk.

News & Updates

  • [2024-05-16]  📢 Check out our survey papers for 3D-related tasks empowered by LLMs and other foundation models.
  • [2023-12-16]  We curated an Awesome-LLM-3D paper list for 3D-related tasks empowered by LLMs.
  • [2023-12-15]  Two papers were accepted to AAAI 2024.
  • [2023-10-02]  Started a new journey as a DPhil (PhD) student in the Visual Geometry Group and Active Vision Group at the University of Oxford!

Selected Research

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

Xianzheng Ma Yash Bhalgat Brandon Smart Shuai Chen Xinghui Li Jian Ding Jindong Gu Dave Zhenyu Chen Songyou Peng Jia-Wang Bian Philip H Torr Marc Pollefeys Matthias Nießner Ian D Reid Angel X. Chang Iro Laina Victor Adrian Prisacariu

TPAMI under review

We survey papers on 3D understanding, generation, and embodied agent tasks empowered by LLMs and other foundation models (CLIP, SAM).

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Ziyu Guo Renrui Zhang Xiangyang Zhu Yiwen Tang Xianzheng Ma Jiaming Han Kexin Chen Peng Gao Xianzhi Li Hongsheng Li Pheng-Ann Heng

ICLR under review

We propose Point-Bind, a 3D multi-modal model for general 3D learning, and Point-LLM, the first 3D large language model.

Matching Is Not Enough: A Two-Stage Framework for Category-Agnostic Pose Estimation

Min Shi Zihao Huang Xianzheng Ma Xiaowei Hu Zhiguo Cao

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023 Highlight

We propose CapeFormer, a two-stage framework for category-agnostic pose estimation.

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation

Yulu Gan Xianzheng Ma Yan Bai Yihang Lou Renrui Zhang Nian Shi Lin Luo

AAAI Conference on Artificial Intelligence (AAAI) 2023 Outstanding Student Paper

We offer a new alternative solution for continual test-time adaptation by learning visual domain prompts.

Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding

Xianzheng Ma Zhixiang Wang Yacheng Zhan Yinqiang Zheng Zheng Wang Dengxin Dai Chia-Wen Lin

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 Oral

We alleviate the domain gap caused by mixed fog influence and style variation without labels.

REMOTE: Reinforced Motion Transformation Network for Semi-supervised 2D Pose Estimation in Videos

Xianzheng Ma Hossein Rahmani Zhipeng Fan Bin Yang Jun Chen Jun Liu

AAAI Conference on Artificial Intelligence (AAAI) 2022

We offer a reinforcement-learning-based method that utilizes the temporal information in videos to train a robust pose estimator.

Rainy WCity: A Real Rainfall Dataset with Diverse Conditions for Semantic Driving Scene Understanding

Xian Zhong* Xianzheng Ma* Shidong Tu Kui Jiang Wenxin Huang Zheng Wang (* means equal contribution)

International Joint Conference on Artificial Intelligence (IJCAI) 2022

We propose a real-world rainy driving dataset for semantic segmentation and devise an unsupervised joint optimization framework based on contrastive learning.

All publications

Academic Services

  • Conference Reviewer: CVPR'2021-2023, AAAI'2022-2025, ICLR'2023, NeurIPS'2024.
  • Journal Reviewer: IJCV, TMM
  • Area Chair: IJCAI'2022 (China branch)

Misc.

  • Sports: Swimming, Badminton, Table Tennis, etc.