Reconstructing complete and interactive 3D scenes remains a fundamental challenge in computer vision and robotics, particularly due to persistent object occlusions and limited sensor coverage. Even multi-view observations from a single scene scan often fail to capture the full structural details. Existing approaches typically rely on multi-stage pipelines, such as segmentation, background completion, and inpainting, or require per-object dense scanning, both of which are error-prone and difficult to scale. We propose IGFuse, a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans, where natural object rearrangement between captures reveals previously occluded regions. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. To handle spatial misalignments, we introduce a pseudo-intermediate scene state for symmetric alignment, together with a collaborative co-pruning strategy that refines geometry. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines. Extensive experiments validate the framework’s strong generalization to novel scene configurations, demonstrating its effectiveness for real-world 3D reconstruction and real-to-simulation transfer.
Overview of our dual-state Gaussian alignment pipeline. Given two input scans (scan $i$ and scan $j$), the Gaussians in state $i$ are initially constrained by corresponding image observations. After transferring to state $j$ (i.e., $G_i \rightarrow G_{i \rightarrow j}$), the Gaussians are further supervised by state $j$’s image via an alignment loss $\mathcal{L}_{\text{align}}$, and regularized through a co-pruning strategy that enforces 3D consistency by removing mismatched or redundant components. The reverse transfer ($G_j \rightarrow G_{j \rightarrow i}$) is performed symmetrically. Additionally, both states are transferred into a shared pseudo-state space ($G_{i \rightarrow p}$, $G_{j \rightarrow p}$), where a pseudo-state loss $\mathcal{L}_{\text{pseudo}}$ encourages tighter cross-state alignment.
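As a concrete illustration of the pipeline described above, the following is a minimal PyTorch-style sketch of how the bi-directional alignment and pseudo-state supervision could be wired together. It is a sketch under assumptions, not the paper's implementation: Gaussians are reduced to their centers plus per-Gaussian object IDs, render_fn stands in for any differentiable Gaussian rasterizer supplied by the caller, transfer applies per-object rigid transforms, and realizing the pseudo-state loss as agreement between the two transferred renderings is our assumption.

# Minimal sketch (assumptions: Gaussians reduced to centers + per-Gaussian object
# IDs; `render_fn` is any differentiable Gaussian rasterizer supplied by the
# caller; helper names are ours, not from the paper).
import torch
import torch.nn.functional as F

def transfer(means, obj_ids, rigid_transforms):
    """Apply each moved object's rigid transform to its Gaussians.

    means:            (N, 3) Gaussian centers
    obj_ids:          (N,)   integer object labels
    rigid_transforms: dict {obj_id: (R (3,3), t (3,))}
    """
    out = means.clone()
    for oid, (R, t) in rigid_transforms.items():
        mask = obj_ids == oid
        out[mask] = means[mask] @ R.T + t
    return out

def dual_state_loss(means_i, ids_i, means_j, ids_j,
                    T_ij, T_ji, T_ip, T_jp,
                    views_i, views_j, render_fn, w_pseudo=0.1):
    """Bi-directional alignment plus pseudo-state consistency (schematic)."""
    # State i Gaussians transferred to state j must explain state j's images.
    g_ij = transfer(means_i, ids_i, T_ij)
    loss_align = sum(F.l1_loss(render_fn(g_ij, cam), img) for cam, img in views_j)

    # Symmetric direction: state j Gaussians transferred to state i.
    g_ji = transfer(means_j, ids_j, T_ji)
    loss_align = loss_align + sum(F.l1_loss(render_fn(g_ji, cam), img)
                                  for cam, img in views_i)

    # Both states mapped into the shared pseudo-intermediate state; their
    # renderings should agree even though no real images exist there
    # (state i's cameras are reused here purely for illustration).
    g_ip = transfer(means_i, ids_i, T_ip)
    g_jp = transfer(means_j, ids_j, T_jp)
    loss_pseudo = sum(F.l1_loss(render_fn(g_ip, cam), render_fn(g_jp, cam))
                      for cam, _ in views_i)

    return loss_align + w_pseudo * loss_pseudo

Co-pruning would sit alongside this objective: after each transfer, Gaussians whose renderings contradict the target state's observations in both directions would be removed, but that bookkeeping is omitted from the sketch.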
Our method fuses foreground information from multiple scans, thereby enabling more realistic simulations across diverse states.
Our method integrates information from multiple scans to reconstruct a complete background.
Compared with the Gaussian Grouping baseline in the novel state, our method produces cleaner segmentation results. Gaussian Grouping often yields holes or extraneous regions along object boundaries in 2D segmentation. In terms of depth, its feature-based segmentation fails to propagate to all 3D points, leaving numerous residual points after object movement and depth holes in the regions the objects were moved from.
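To make this failure mode concrete, here is a toy, hypothetical sketch (the variable names and threshold are ours, not from either method): moving an object via hard per-Gaussian labels relocates every Gaussian of that object, whereas a thresholded feature-similarity mask can miss boundary Gaussians, which then stay behind as residual points.

# Toy illustration (hypothetical data): residual Gaussians when the selection
# mask is incomplete.
import torch
import torch.nn.functional as F

N = 1000
means = torch.randn(N, 3)
labels = torch.randint(0, 5, (N,))          # hard per-Gaussian object IDs
feats = torch.randn(N, 16)                  # per-Gaussian identity features
proto = feats[labels == 3].mean(dim=0)      # feature prototype of object 3
t = torch.tensor([1.0, 0.0, 0.0])           # translation applied to object 3

# Label-based selection: every Gaussian of object 3 moves, none are left behind.
mask_label = labels == 3
moved = means.clone()
moved[mask_label] += t

# Feature-based selection: Gaussians below the similarity threshold are not
# selected, so they remain at the old location as residual points.
sim = F.cosine_similarity(feats, proto.unsqueeze(0), dim=1)
mask_feat = mask_label & (sim > 0.2)
residual = mask_label & ~mask_feat
print(f"residual Gaussians left behind: {residual.sum().item()}")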
@misc{hu2025igfuseinteractive3dgaussian,
  title={IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion},
  author={Wenhao Hu and Zesheng Li and Haonan Zhou and Liu Liu and Xuexiang Wen and Zhizhong Su and Xi Li and Gaoang Wang},
  year={2025},
  eprint={2508.13153},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.13153},
}