¹Beijing Institute of Technology    †Corresponding Author
Neural Radiance Fields (NeRF) employ multi-view images for 3D scene representation and have shown remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single, globally shared camera and seldom consider scenarios with multiple cameras. Moreover, some pose-robust methods remain susceptible to suboptimal solutions when poses are poorly initialized. In this paper, we propose MC-NeRF, a method that jointly optimizes both intrinsic and extrinsic parameters for bundle-adjusting Neural Radiance Fields. Firstly, we conduct a theoretical analysis to tackle the degenerate case and coupling issue that arise from the joint optimization of intrinsic and extrinsic parameters. Secondly, based on the proposed solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of the calibration object. Lastly, we present a global end-to-end network and training sequence that enable the regression of intrinsic and extrinsic parameters along with the rendering network. Moreover, since most existing datasets are designed for a single camera, we create a new dataset that includes four different styles of multi-camera acquisition systems and allows readers to generate custom datasets. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we adopt up to 110 images, each with different intrinsic and extrinsic parameters, to achieve 3D scene representation without providing initial poses.
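To make the joint optimization concrete, here is a minimal PyTorch sketch of per-image learnable camera parameters (the 110-image count comes from the abstract; the axis-angle pose parameterization, tensor names, and learning rate are our illustrative assumptions, not the released implementation):

```python
import torch

# Hypothetical setup: one learnable focal length and one learnable 6-DoF pose
# (axis-angle rotation + translation) per image, optimized jointly with the
# rendering MLP instead of assuming a single shared camera.
num_images = 110
focals = torch.nn.Parameter(torch.full((num_images,), 500.0))  # per-image intrinsics (pixels)
poses = torch.nn.Parameter(torch.zeros(num_images, 6))         # per-image [rotation | translation]

# The NeRF MLP's parameters would be added to the same optimizer in practice.
optimizer = torch.optim.Adam([focals, poses], lr=1e-3)
```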
Overview of the proposed method. First, following the designed acquisition process (as shown in Fig. 5), we obtain two sets of auxiliary images containing AprilTags, named Pack1 and Pack2. These image sets provide 3D coordinates in the world coordinate system and the corresponding feature-point coordinates in the images, which are used to establish reprojection loss functions. Second, the camera intrinsic and extrinsic parameters, constrained by the reprojection loss, are used to generate sampling points. These points are then fed into a Multilayer Perceptron (MLP) that employs BARF-based progressive encoding for subsequent training. Both ingredients are sketched below.
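A hedged sketch of the two pieces above, assuming a simple pinhole model and an axis-angle pose (the function names and parameterization are our illustrative choices; only the BARF coarse-to-fine schedule follows the published formula):

```python
import math
import torch

def skew(r: torch.Tensor) -> torch.Tensor:
    """Skew-symmetric matrix of a 3-vector, built differentiably."""
    zero = torch.zeros((), dtype=r.dtype)
    return torch.stack([
        torch.stack([zero, -r[2], r[1]]),
        torch.stack([r[2], zero, -r[0]]),
        torch.stack([-r[1], r[0], zero]),
    ])

def reprojection_loss(points_3d, points_2d, focal, cx, cy, rotvec, t):
    """Mean squared pixel error between detected AprilTag corners and the
    projection of their known 3D world coordinates.

    points_3d: (N, 3) corner coordinates in the world frame (from the calibration object).
    points_2d: (N, 2) corner coordinates detected in the image.
    """
    R = torch.matrix_exp(skew(rotvec))           # world-to-camera rotation
    p_cam = points_3d @ R.T + t                  # world points in the camera frame
    u = focal * p_cam[:, 0] / p_cam[:, 2] + cx   # pinhole projection
    v = focal * p_cam[:, 1] / p_cam[:, 2] + cy
    return ((torch.stack([u, v], dim=-1) - points_2d) ** 2).sum(-1).mean()

def barf_weight(alpha: float, k: int) -> float:
    """BARF coarse-to-fine weight for positional-encoding frequency band k.

    alpha ramps from 0 to the number of bands L over training, so low
    frequencies are enabled first and high frequencies blend in smoothly.
    """
    x = min(max(alpha - k, 0.0), 1.0)
    return (1.0 - math.cos(x * math.pi)) / 2.0
```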
Rendering comparison GIFs: BARF | L2G-NeRF | MC-NeRF (ours) | MC-NeRF (w/o) | Ground Truth
Pose registration GIFs against L2G-NeRF, one per capture style: HalfBall | Ball | Room | Array
Pose registration GIFs against BARF, one per capture style: HalfBall | Ball | Room | Array
Novel view synthesis GIFs: NeRF | BARF | L2G-NeRF | MC-NeRF (ours)
The GIFs below illustrate changes in camera poses and field of view (FOV) across four different scenarios in the test set. Each animation shows the variation of camera poses from the top, right, and front views, while FOV changes are demonstrated through adjustments of the camera focal length (see the focal-length-to-FOV sketch after the labels below). The fixed coordinate system at the image center represents the world frame, whereas the moving coordinate systems denote the cameras, with the z-axis in blue, the x-axis in red, and the y-axis in green. The color of the moving sphere and of the zoom-lens camera model changes synchronously with the FOV.
Pose and FOV animation GIFs, one per capture style: Ball | HalfBall | Room | Array
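For reference, the FOV shown in the animations relates to focal length by the standard pinhole relation FOV = 2·atan(w / 2f); a small sketch (the image width and focal values are made-up examples):

```python
import math

def fov_degrees(focal_px: float, width_px: float) -> float:
    """Horizontal field of view of a pinhole camera: FOV = 2 * atan(w / (2f))."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))

# Example: a 1000 px-wide image with a 500 px focal length gives a 90-degree FOV.
print(fov_degrees(500.0, 1000.0))  # 90.0
```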
@misc{gao2023mcnerf,
title={MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems},
author={Yu Gao and Lutong Su and Hao Liang and Yufeng Yue and Yi Yang and Mengyin Fu},
year={2023},
eprint={2309.07846},
archivePrefix={arXiv},
primaryClass={cs.CV}
}