
I am extending a 3D web app that visualizes point clouds to support images. The app is based on the open-source app Potree (which in turn uses three.js) and allows measuring distances. A demo of the measurement tools in Potree can be viewed here.

My goal is to overlay a photograph onto the point cloud so that measurements can be made on the image rather than on the point cloud (note: the actual measurement will still be made on the point cloud; the image overlay acts as a guide and therefore needs to be aligned). To align the photograph, I am using external camera parameters - chiefly the translation and rotation. The images below show the results I have obtained so far when aligning the image to the 3D scene. Note that the image has not been corrected for the camera's distortion (an internal parameter in the link above). The process to obtain this result can be summed up in the following major steps (a minimal three.js sketch follows the list):

  1. Translate the image object in the scene to the position where the photograph was taken. Let's call it (X, Y, Z).
  2. Rotate the image object using the heading/pitch/roll angles obtained from the camera orientation.
  3. Position the 3D scene's virtual camera at (X, Y, Z) and make it look at the image object. Use the field-of-view parameter to adjust how near or far the image appears.
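
As a concrete illustration of these steps, here is a minimal three.js sketch. All parameter names are hypothetical; heading/pitch/roll are assumed to be in radians, and the mapping onto three.js' intrinsic Euler angles (the 'ZXY' order below) depends on the survey conventions and on Potree's Z-up world, so it would likely need adjusting:

```javascript
import * as THREE from 'three';

// Sketch of steps 1-3, under the assumptions stated above.
function alignImageToScene(scene, camera, texture, { position, heading, pitch, roll, vFovDeg, distance }) {
  // Step 1: create a plane carrying the photograph at the exposure position.
  const aspect = 5328 / 4608; // photograph width / height in pixels
  const plane = new THREE.Mesh(
    new THREE.PlaneGeometry(aspect, 1),
    new THREE.MeshBasicMaterial({ map: texture, transparent: true, opacity: 0.5 })
  );
  plane.position.copy(position);

  // Step 2: orient the plane with the camera's heading/pitch/roll.
  plane.rotation.set(pitch, heading, roll, 'ZXY'); // axis mapping is an assumption

  // Push the plane `distance` units along the viewing direction and size it
  // so that it exactly fills the virtual camera's frustum at that distance.
  const viewDir = new THREE.Vector3(0, 0, -1).applyEuler(plane.rotation);
  plane.position.addScaledVector(viewDir, distance);
  plane.scale.setScalar(2 * distance * Math.tan(THREE.MathUtils.degToRad(vFovDeg) / 2));
  scene.add(plane);

  // Step 3: place the virtual camera at the exposure position, match the
  // photograph's field of view, and look at the image plane.
  camera.position.copy(position);
  camera.fov = vFovDeg;
  camera.updateProjectionMatrix();
  camera.lookAt(plane.position);
}
```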

[Figure: point cloud without image overlay]

[Figure: point cloud with image overlay]

The photograph in the obtained result looks roughly aligned but needs tweaking. The alignment was obtained using only the following parameters/variables, which I have or can modify:

  • Camera orientation for the photograph (angles: heading/pitch/roll; position: GPS/projected coordinates)
  • Camera parameters (focal length, 8.5 mm in this case, and other manufacturer-specified parameters)
  • Picture size (5328 x 4608 pixels)
  • Virtual camera pose within the 3D scene (includes field of view, position, orientation, near and far distances)

(Note: plugging the above data into the field-of-view formula, FOV = 2·atan(H/(2f)) = 2·atan(4608/(2 × 8.5)), gives a field of view of ~180 degrees, which causes the three.js PerspectiveCamera to render the scene incorrectly.)
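
For reference, a corrected calculation using the sensor height in millimetres (the 14.6 × 12.6 mm sensor size established in the comments below) would look like this:

```javascript
// Vertical FOV from focal length and *sensor* height, both in mm.
// The 12.6 mm sensor height comes from the comments below.
const f = 8.5;        // focal length in mm
const sensorH = 12.6; // sensor height in mm (not the pixel count)
const vFovDeg = (2 * Math.atan(sensorH / (2 * f)) * 180) / Math.PI; // ≈ 73.09 degrees
```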

What I do not have are the following:

  • Depth information in the image (image is not part of a stereo pair)
  • Control/registration points
  • Coordinates of identifiable features/objects in the 3D scene to align the image to. I might be able to do this for one or two images, but cannot scale it to hundreds of images.

I am not too well versed in photogrammetry/computer vision, and I am not even sure I have used the proper terminology above. Other articles/papers I skimmed through speak of image registration via control points and/or object matching, which I am unable or unwilling to do; one reason is that these methods are too computationally expensive to perform in a web browser.

All this boils down to my question: with the parameters listed above, is it possible to get an alignment that is good enough to take measurements on the photograph? If not, what is missing?

Source code is purposely not included here, since my aim is to develop an understanding of the theoretical concepts required for this particular scenario.

Imad
  • I don't understand the point of overlaying a photo to take measurements. Why don't you use [a raycaster](https://threejs.org/docs/#api/en/core/Raycaster) to perform the measurements on the point cloud directly? – M - Nov 04 '21 at 17:14
  • The image overlay clearly doesn't match the point cloud too well. In any case, you'd need to estimate the camera's position relative to the point cloud. You can do that from a few correspondences, i.e. the same point in the picture and in the cloud. You can ask the user to pick those, given a suitable UI. Once you have that, you can project the point cloud "onto" the picture. Each point from the cloud brings with it some depth information that you can then apply to nearby pixels in the picture. – Christoph Rackwitz Nov 04 '21 at 19:03
  • Your focal length of 8.5 mm has to be accompanied by the pixel pitch of the sensor. Example for a Pixel 4a phone's main camera: (focal length 4.38 mm) / (1.4 µm pixel pitch) = 3128, which would be the "focal length" as used by OpenCV in its 3x3 camera matrices. Assuming 4032 x 3024 pixels, you arrive at a diagonal FoV of 77.7 degrees. I think you get the math ;) – Christoph Rackwitz Nov 04 '21 at 19:05
  • @Marquizzo - actual measurements will be performed on the point cloud. The photo just acts as a guide, since features can be distinguished more clearly. – Imad Nov 04 '21 at 19:20
  • 2*atan(4608/(2*8.5)) does not make sense because the 4608 is the pixel count, not a height in mm. – Guang Nov 05 '21 at 19:02
  • Thanks @Guang for pointing out the mistake. The camera specs reveal the sensor size (mm) to be 14.6 x 12.6. This gives a vertical FoV of 73.09 degrees. – Imad Nov 09 '21 at 13:37

1 Answer


The problem you are working on is also known as LiDAR-camera fusion. Your input data are the point cloud, the camera intrinsics, and the extrinsics.

This problem can be approached in two ways:

  1. Feature-based (SIFT/SURF/ORB).
  2. Intensity-based - best suited for multi-modal data, such as a point cloud from a laser scanner and an RGB image from a camera.

Methodology:

  1. Create a synthetic image of the point cloud using the camera projection matrix.
  2. This gives you a pixel location for each 3D point in the point cloud. Take the grayscale value at that pixel location from the RGB image and the reflectivity value from the corresponding 3D point.
  3. Compute the mutual information (MI) between the two. Its negative serves as the error to be minimized.
  4. Using least-squares adjustment, minimize the error to compute the best estimate of the relative orientation (see the sketch after this list).
  5. Once you have this, you can directly take the RGB values for each projected point. If you have multiple images, you will need to blend them in overlapping areas.
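
As a rough illustration of steps 1-3, here is a minimal JavaScript sketch. It assumes a simple pinhole model, points that carry a reflectivity value normalized to [0, 1), and a grayscale copy of the photograph; every name and the data layout here are hypothetical.

```javascript
// Pinhole projection: world point -> pixel. R is a flat 3x3 row-major
// world-to-camera rotation, t a translation, fx/fy/cx/cy intrinsics in pixels.
function makeProjector({ R, t, fx, fy, cx, cy }) {
  return (p) => {
    const xc = R[0] * p.x + R[1] * p.y + R[2] * p.z + t[0];
    const yc = R[3] * p.x + R[4] * p.y + R[5] * p.z + t[1];
    const zc = R[6] * p.x + R[7] * p.y + R[8] * p.z + t[2];
    if (zc <= 0) return null; // behind the camera
    return { u: cx + (fx * xc) / zc, v: cy + (fy * yc) / zc };
  };
}

// Mutual information between image grayscale and point reflectivity.
// `points`: [{x, y, z, reflectivity}], reflectivity normalized to [0, 1);
// `gray`: grayscale image as a flat array of 0-255 values.
function mutualInformation(points, gray, width, height, project, bins = 32) {
  const joint = new Float64Array(bins * bins);
  let n = 0;
  for (const p of points) {
    const uv = project(p); // step 1: synthetic projection of the cloud
    if (!uv) continue;
    const u = Math.round(uv.u), v = Math.round(uv.v);
    if (u < 0 || u >= width || v < 0 || v >= height) continue;
    // step 2: pair the pixel's grayscale with the point's reflectivity
    const a = Math.min(bins - 1, Math.floor((gray[v * width + u] / 256) * bins));
    const b = Math.min(bins - 1, Math.floor(p.reflectivity * bins));
    joint[a * bins + b]++;
    n++;
  }
  // step 3: MI = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
  const pa = new Float64Array(bins), pb = new Float64Array(bins);
  for (let i = 0; i < bins * bins; i++) {
    pa[Math.floor(i / bins)] += joint[i] / n;
    pb[i % bins] += joint[i] / n;
  }
  let mi = 0;
  for (let a = 0; a < bins; a++) {
    for (let b = 0; b < bins; b++) {
      const pab = joint[a * bins + b] / n;
      if (pab > 0) mi += pab * Math.log(pab / (pa[a] * pb[b]));
    }
  }
  return mi;
}
```

An outer optimizer (step 4) would repeatedly perturb the six pose parameters fed to `makeProjector` and keep the pose that maximizes the returned MI.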

Some good references to follow:

  1. https://www.robots.ox.ac.uk/~mobile/Papers/2013ICRA_an.pdf
  2. http://www.diva-portal.org/smash/get/diva2:1415878/FULLTEXT01.pdf
  3. https://core.ac.uk/download/pdf/153778932.pdf
  4. https://deepblue.lib.umich.edu/bitstream/handle/2027.42/112212/rob21542.pdf?sequence=1
pk11