Weirong Chen

News

05/2026 DynaTok (Google SR project) has been accepted to ICML 2026!
01/2026 NOVA3R has been accepted to ICLR 2026!
10/2025 Back on Track has been selected as one of the best paper award candidates at ICCV 2025!
09/2025 CoProU-VO received the Best Paper Award at GCPR 2025. Congrats to Jingchao and Oussema!
07/2025 I started as a student researcher at Google XR in Zurich, working on dynamic scene modeling!
06/2025 Back on Track has been accepted to ICCV 2025 (Oral)!

02/2025 AnyCam has been accepted to CVPR 2025!
02/2024 LEAP-VO was accepted to CVPR 2024!
02/2024 NeRF-SCR was accepted to ICRA 2024!
10/2023 I joined Technical University of Munich as an ELLIS PhD student!

Research

I build 3D-native systems that perceive, reconstruct, and generate dynamic worlds — the spatial foundation for world models and agents to simulate, predict, and act across space and time.

3D Reconstruction & Generation

Reconstructing and generating complete, physically plausible 3D structure from unposed views.

Dynamic World Modeling

Learning 4D representations and scene motion to capture how dynamic worlds evolve over time.

Spatial Perception for Agents

Localization, tracking, and motion estimation that ground agents and world models in space and time.

DynaTok: Token-Based 4D Reconstruction from Partial Point Clouds

Weirong Chen, Keisuke Tateno, Hidenobu Matsuki, Michael Niemeyer, Daniel Cremers, Federico Tombari

International Conference on Machine Learning (ICML), 2026

Preprint / Project Page (under construction)

A token-based framework that fuses partial point-cloud sequences over time into complete, temporally coherent 4D geometry.

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers

International Conference on Learning Representations (ICLR), 2026

arXiv / Paper / Project Page / Code

A feed-forward transformer that builds a global latent representation to generate complete, amodal 3D structure from a set of unposed images.

Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction

Weirong Chen, Ganlin Zhang, Felix Wimbauer, Rui Wang, Nikita Araslanov, Andrea Vedaldi, Daniel Cremers

International Conference on Computer Vision (ICCV), 2025 (Oral, Best Paper Candidate)

arXiv / Paper / Project Page / Code

A method for consistent dynamic scene reconstruction via motion decoupling, bundle adjustment, and global refinement.

CoProU-VO: Combining Projected Uncertainty for End-to-End Unsupervised Monocular Visual Odometry

Jingchao Xie*, Oussema Dhaouadi*, Weirong Chen, Johannes Meier, Jacques Kaiser, Daniel Cremers

German Conference on Pattern Recognition (GCPR), 2025 (Oral, Best Paper Award)

arXiv / Project Page / Code

An unsupervised visual odometry method that improves pose estimation in dynamic scenes.

AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos

Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers

Computer Vision and Pattern Recognition Conference (CVPR), 2025

arXiv / Paper / Project Page / Code

A method for learning camera poses and intrinsics from dynamic casual videos.

DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair

Weihang Li*, Weirong Chen*, Shenhan Qian, Benjamin Busam, Daniel Cremers, Haoang Li

IEEE Transactions on Image Processing (TIP), 2026

arXiv / Project Page

Dynamic radiance field reconstruction from only two images, enabled by object-level bundle adjustment.

LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry

Weirong Chen, Le Chen, Rui Wang, Marc Pollefeys

Computer Vision and Pattern Recognition Conference (CVPR), 2024

arXiv / Project Page / Code / Video

A robust visual odometry system leveraging temporal context with long-term point tracking to tackle occlusions and dynamic environments.

Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization

Le Chen, Weirong Chen, Rui Wang, Marc Pollefeys

International Conference on Robotics and Automation (ICRA), 2024

arXiv / Video

A visual localization pipeline using rendered data from NeRF, uncertainty-guided novel view selection, and evidential scene coordinate regression.

Uncertainty-Driven Dense Two-View Structure from Motion

Weirong Chen, Suryansh Kumar, Fisher Yu

International Conference on Intelligent Robots and Systems (IROS), 2023
IEEE Robotics and Automation Letters (RA-L), 2023

arXiv / Project Page / Video

An accurate and reliable pipeline for dense two-view SfM using weighted bundle adjustment with robust outlier filtering and learning-based confidence modeling.

Webly Supervised Image Classification with Metadata: Automatic Noisy Label Correction via Visual-Semantic Graph

Jingkang Yang*, Weirong Chen*, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang

ACM International Conference on Multimedia (ACM MM), 2020 (Oral)

arXiv / Slides

Webly supervised learning that addresses semantic label confusion using a visual-semantic graph with metadata-aware anchor selection and GNN-based label propagation.

Webly Supervised Image Classification with Self-Contained Confidence

Jingkang Yang, Litong Feng, Weirong Chen, Xiaopeng Yan, Huabin Zheng, Ping Luo, Wayne Zhang

European Conference on Computer Vision (ECCV), 2020

arXiv / Code

Webly supervised learning for noisy label classification via sample-wise web label correction with model confidence and pseudo machine labels.

Other Projects

An Efficient and Accurate Offline Python SLAM using COLMAP

with Yifei Liu, Kexin Shi, Yidan Gao

Supervised by Paul-Edouard Sarlin and Marc Pollefeys

Demo (KITTI) / Demo (Zurich) / Report

A robust and highly extensible Python SLAM system built on pycolmap; it achieved better pose accuracy and a significant speed improvement compared to COLMAP.

Real-time Photorealistic Neural Rendering in VR

with Shengqu Cai, Mingyang Song, Tianfu Wang

Supervised by Sergey Prokudin

Demo / Report / Code

A general neural rendering pipeline for real-time photorealistic synthesis on VR devices; the demo included human neural rendering and scene style transfer.

Talks

June 2, 2026 / TUMVision / Slides / NOVA3R: From Pixel-Aligned Reconstruction to Non-Pixel-Aligned World

Academic Services

Conference Reviewer: CVPR, ECCV, ICCV, ICRA, IROS, ICLR, ICML, NeurIPS
Journal Reviewer: RA-L

Teaching

Summer 2026 / Practical Course: Deep Learning for Spatial AI / TU Munich
Winter 2025 / Seminar Course: 3D Vision Foundation Models / TU Munich
Summer 2025 / Practical Course: Deep Learning for Spatial AI / TU Munich
Winter 2024 / Seminar Course: Visual-based 3D/4D Reconstruction / TU Munich
Summer 2024 / Practical Course: Geometric Scene Understanding / TU Munich

Experiences

Google Research

Microsoft Spatial AI Lab Zurich

ETH Computer Vision and Geometry Group

SenseTime Research