Expanding the Viewpoint of Dynamic Scenes beyond Constrained Camera Motions

Shaotong Zhu, Le Jiang and ACLab Associates

📄 Paper 💻 Code

Abstract

In the domain of dynamic Neural Radiance Fields (NeRF) for novel view synthesis, current state-of-the-art (SOTA) techniques struggle when the camera's pose deviates significantly from the primary viewpoint, resulting in unstable and unrealistic outcomes. This paper introduces Expanded Dynamic NeRF (ExpanDyNeRF), a monocular NeRF method that integrates a Gaussian splatting prior to tackle novel view synthesis with large-angle rotations. ExpanDyNeRF employs a pseudo ground truth technique to optimize density and color features, which enables the generation of realistic scene reconstructions from challenging viewpoints. Additionally, we present the Synthetic Dynamic Multiview (SynDM) dataset, the first GTA V-based dynamic multiview dataset designed specifically for evaluating robust dynamic reconstruction from significantly shifted views. We evaluate our method quantitatively and qualitatively on both the SynDM dataset and the widely recognized NVIDIA dataset, comparing it against other SOTA methods for dynamic scene reconstruction. Our evaluation results demonstrate that our method achieves superior performance.

Challenge and Motivation
Existing dynamic NeRF methods are trained on monocular videos whose cameras stay close to a primary viewpoint; once the rendering camera rotates far from that trajectory, their reconstructions become unstable and unrealistic. Our goal is to expand the range of viewpoints from which a dynamic scene can be rendered faithfully.

Pipeline

Foreground-Background Decomposition
In video sequences, the background is largely static while the foreground is dynamic. The model therefore decomposes the scene into two parts: a static background branch and a dynamic foreground branch.

Rendered outputs from both branches are blended to reconstruct the full dynamic scene, supervised by the super-resolution loss (L_sr).
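
As a minimal sketch, assuming each branch renders an RGB image and the foreground branch also predicts an opacity mask, the blending could look like the following (the function names and the L1 form of L_sr are illustrative, not the paper's exact implementation):

```python
import torch
import torch.nn.functional as F

def blend_branches(fg_rgb, fg_alpha, bg_rgb):
    """Composite the dynamic foreground render over the static background.

    fg_rgb, bg_rgb: (H, W, 3) renders from the two branches.
    fg_alpha:       (H, W, 1) foreground opacity from the dynamic branch.
    """
    return fg_alpha * fg_rgb + (1.0 - fg_alpha) * bg_rgb

def super_resolution_loss(blended, sr_target):
    """Illustrative L_sr: compare the blended render against a
    super-resolved reference frame (L1 distance assumed here)."""
    return F.l1_loss(blended, sr_target)
```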

Novel View Feature Optimization
For viewpoints that deviate strongly from the input trajectory, the Gaussian splatting prior supplies pseudo ground truth renders, which are used to optimize the NeRF's density and color features so that large-angle views remain realistic.
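
A schematic of one such supervision step, assuming a callable that renders the NeRF at a given pose and a pseudo ground truth image precomputed by the Gaussian prior; the names and the simple L2 photometric loss are assumptions for illustration:

```python
import torch

def pseudo_gt_step(render_nerf, optimizer, pose, pseudo_gt):
    """One optimization step against a pseudo ground truth render.

    render_nerf: pose -> (H, W, 3) render of the dynamic NeRF (assumed API).
    pose:        camera pose of a deviated novel view.
    pseudo_gt:   (H, W, 3) image produced by the Gaussian splatting prior.
    """
    optimizer.zero_grad()
    pred = render_nerf(pose)
    loss = torch.mean((pred - pseudo_gt) ** 2)  # photometric L2 (assumed)
    loss.backward()  # gradients flow into density and color features
    optimizer.step()
    return loss.item()
```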

Pipeline Diagram

SynDM Dataset

Motivation
Existing dynamic video datasets lack ground truth for side views, making it impossible to quantitatively evaluate novel view synthesis at strongly deviated angles. This limitation arises because recording dynamic multi-view videos in real-world settings is extremely difficult, if not infeasible.

SynDM fills this gap by providing dynamic multi-view videos with side-view ground truth, enabling systematic evaluation of novel view rendering performance.

Dataset Overview

Qualitative and Quantitative Results

We conduct a comprehensive comparison between our ExpanDyNeRF and four SOTA novel view synthesis methods: RoDynRF (Liu et al., 2023), MonoNeRF (Fu et al., 2022), D3DGS (Yang et al., 2024), and D4NeRF (Zhang et al., 2023a), on the SynDM and NVIDIA datasets. Qualitative results are shown in the video below, with novel views deviating from -30° to +30°; quantitative results are reported in Table 1 using FID, PSNR, and LPIPS. Our method achieves the best performance on both datasets.

Table 1: Quantitative comparison results on the SynDM dataset
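
For reference, a minimal sketch of how these three metrics are commonly computed with the lpips and torchmetrics packages (not necessarily the paper's exact evaluation code):

```python
import torch
import lpips
from torchmetrics.image.fid import FrechetInceptionDistance

lpips_fn = lpips.LPIPS(net='alex')            # expects inputs in [-1, 1]
fid = FrechetInceptionDistance(feature=2048)  # expects uint8 images in [0, 255]

def psnr(pred, gt):
    """PSNR for image batches in [0, 1], shape (N, 3, H, W)."""
    mse = torch.mean((pred - gt) ** 2)
    return -10.0 * torch.log10(mse)

def accumulate(pred, gt):
    """Feed one batch of rendered / ground-truth views into the metrics."""
    fid.update((gt * 255).to(torch.uint8), real=True)
    fid.update((pred * 255).to(torch.uint8), real=False)
    return psnr(pred, gt).item(), lpips_fn(pred * 2 - 1, gt * 2 - 1).mean().item()

# After all test views have been accumulated:
# fid_score = fid.compute().item()
```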

Challenges in ExpanDyNeRF and Improvements via ExpanDyGauss

Limitations of NeRF-based Methods
NeRF-based pipelines such as ExpanDyNeRF rely on per-ray volumetric sampling of an implicit field, which makes training and rendering slow and makes it costly to supervise many deviated viewpoints.

Advantages of Gaussian Splatting
Gaussian splatting represents the scene as an explicit set of 3D Gaussians that can be rasterized in real time, making dense novel-view supervision far cheaper.

Expanded Dynamic Gaussian Splatting (ExpanDyGauss)
To address these issues, we propose ExpanDyGauss, a monocular Gaussian splatting framework for large-angle novel view synthesis. ExpanDyGauss leverages a video-to-video diffusion model to perform spatial-temporal inpainting, generating consistent pseudo ground truth across 360° of viewpoints and providing effective supervision for both static and dynamic components without significant overhead.
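
A high-level sketch of this pseudo ground truth loop; render_fn and diffusion_inpaint stand in for the Gaussian renderer and the video-to-video diffusion model (assumed APIs), and the L1 photometric loss is an assumption:

```python
import torch

def generate_pseudo_gt(render_fn, diffusion_inpaint, poses_360, input_video):
    """Render coarse views along a 360° orbit, then let a video-to-video
    diffusion model inpaint unseen regions into consistent pseudo GT.

    render_fn:         pose -> (H, W, 3) render of the current Gaussians.
    diffusion_inpaint: (frames, condition) -> inpainted frames (assumed API).
    """
    coarse = torch.stack([render_fn(p) for p in poses_360])
    return diffusion_inpaint(coarse, input_video)

def pseudo_gt_supervision(render_fn, params, poses_360, pseudo_gt, lr=1e-3):
    """Optimize Gaussian parameters against the pseudo ground truth views."""
    optimizer = torch.optim.Adam(params, lr=lr)
    for pose, target in zip(poses_360, pseudo_gt):
        optimizer.zero_grad()
        loss = torch.mean(torch.abs(render_fn(pose) - target))  # assumed L1
        loss.backward()
        optimizer.step()
```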

Overall Pipeline

Dense Initialization and Segmentation
Easi3R predicts dense 3D point clouds and camera poses from monocular videos, even when the camera motion is small.

SAM segmentation separates each frame into foreground and background regions, which initialize the foreground Gaussians and background Gaussians.
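
A sketch of this split, assuming per-frame SAM masks aligned with Easi3R's per-pixel point predictions; the array shapes and dictionary layout are illustrative:

```python
import numpy as np

def split_gaussians(points, colors, fg_mask):
    """Partition dense per-pixel points into foreground / background sets.

    points:  (H, W, 3) per-pixel 3D points from Easi3R.
    colors:  (H, W, 3) corresponding RGB values.
    fg_mask: (H, W) boolean SAM mask of the dynamic foreground.
    """
    fg = dict(xyz=points[fg_mask], rgb=colors[fg_mask])    # dynamic Gaussians
    bg = dict(xyz=points[~fg_mask], rgb=colors[~fg_mask])  # static Gaussians
    return fg, bg
```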

Gaussian Reconstruction and Enhancement

Gaussian Diagram
Real Scene Results

Demo Results

Our method generates dynamic Gaussian Splatting models for novel view synthesis on both synthetic datasets and real-world captured videos, demonstrating that the approach handles both controlled and unconstrained scenarios.

Real-world Data
To demonstrate the effectiveness of our method in real-world applications, we apply it to a casually captured monocular video. Our approach generates a dynamic Gaussian Splatting model that supports reasonable novel view synthesis in real-world scenarios.

Synthetic Data
A demo of the results on the SynDM dataset. Given a monocular input video of a dynamic scene, our method generates a dynamic Gaussian Splatting model and synthesizes novel views.

Application

Robotic Perception and Navigation

Human Motion Capture and Sports Analysis

Application Diagram