1Beijing Institute of Technology †Corresponding Author
Neural Radiance Fields (NeRF) use multi-view images for 3D scene representation and have demonstrated remarkable performance. As one of the primary sources of multi-view images, multi-camera systems encounter challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single camera and rarely consider multi-camera scenarios. Moreover, NeRF methods that can optimize intrinsic and extrinsic parameters remain susceptible to suboptimal solutions when these parameters are poorly initialized. In this paper, we propose MC-NeRF, a method that jointly optimizes both intrinsic and extrinsic parameters alongside NeRF, and supports assigning independent camera parameters to each image. First, we tackle the coupling issue and the degenerate cases that arise from the joint optimization of intrinsic and extrinsic parameters. Second, building on these solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of the calibration object. Finally, we present an end-to-end network with a training sequence that estimates the intrinsic and extrinsic parameters together with the rendering network. Furthermore, since most existing datasets are designed for a single camera, we construct a real multi-camera image acquisition system and create a corresponding new dataset, which includes both simulated data and real-world captured images. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we use multiple cameras, each with different intrinsic and extrinsic parameters, in a real-world system to achieve 3D scene representation without providing initial poses.
Overview of the proposed method. First, following the designed acquisition process (as shown in Fig. 5), we obtain two sets of auxiliary images containing AprilTags, named Pack1 and Pack2. These image sets provide 3D coordinates in the world coordinate system and the corresponding feature point coordinates in the images, which are used to establish reprojection loss functions. Second, the camera's intrinsic and extrinsic parameters, constrained by the reprojection loss, are used to generate sampling points. These points are then fed into a Multilayer Perceptron (MLP) that employs BARF-based progressive encoding for subsequent training.
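The reprojection constraint above can be illustrated with a minimal pinhole-camera sketch: given known 3D AprilTag corner positions in the world frame and their detected pixel locations, the residual between projected and detected corners constrains the intrinsics K and extrinsics (R, t). This is a simplified sketch (no lens distortion, NumPy instead of a differentiable framework); the function names `project` and `reprojection_loss` are illustrative, not from the paper's code.

```python
import numpy as np

def project(points_w, K, R, t):
    """Project 3D world points into pixel coordinates with a pinhole model.

    points_w: (N, 3) world-frame points; K: (3, 3) intrinsics;
    R: (3, 3) rotation, t: (3,) translation (world -> camera).
    """
    p_cam = (R @ points_w.T + t[:, None]).T   # world frame -> camera frame
    p_img = (K @ p_cam.T).T                   # camera frame -> homogeneous pixels
    return p_img[:, :2] / p_img[:, 2:3]       # perspective divide

def reprojection_loss(points_w, uv_observed, K, R, t):
    """Mean squared reprojection error between projected and detected corners."""
    uv_pred = project(points_w, K, R, t)
    return np.mean(np.sum((uv_pred - uv_observed) ** 2, axis=1))
```

In the actual method the parameters would be optimized jointly with the radiance field in a differentiable framework; here the loss is only evaluated, not minimized.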
HalfBall_Style_L2G | Ball_Style_L2G | Room_Style_L2G | Array_Style_L2G |
HalfBall_Style_BARF | Ball_Style_BARF | Room_Style_BARF | Array_Style_BARF |
BARF | L2G-NeRF | MC-NeRF(ours) | MC-NeRF(w/o) | Ground_truth |
NeRF | BARF | L2G_NeRF | MC-NeRF(ours) |
The GIFs below illustrate changes in camera poses and field of view (FOV) across four different scenarios in the test set. Each animation showcases variations in camera pose from the top, right, and front views, while FOV changes are demonstrated through adjustments of the camera focal length. The fixed coordinate system at the image center represents the world frame, whereas the mobile coordinate system denotes the camera, with the z-axis in blue, the x-axis in red, and the y-axis in green. The color of the moving sphere and the zoom-lens camera model changes synchronously with the FOV.
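The focal-length/FOV relationship used in these animations follows the standard pinhole formula: for an image dimension of s pixels and a focal length of f pixels, FOV = 2·arctan(s / 2f). A small sketch (the function name is illustrative):

```python
import math

def fov_from_focal(focal_px, image_size_px):
    """Field of view in degrees along one image axis of a pinhole camera.

    focal_px: focal length in pixels; image_size_px: width (or height) in pixels.
    """
    return math.degrees(2.0 * math.atan(image_size_px / (2.0 * focal_px)))
```

For example, a 640-pixel-wide image with a 320-pixel focal length yields a 90-degree horizontal FOV; increasing the focal length (zooming in) narrows the FOV.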
Ball Style | HalfBall Style |
Room Style | Array Style |
@misc{gao2024mcnerf,
title={MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems},
author={Yu Gao and Lutong Su and Hao Liang and Yufeng Yue and Yi Yang and Mengyin Fu},
year={2024},
eprint={2309.07846},
archivePrefix={arXiv},
primaryClass={cs.CV}
}