For your first query regarding the x and y dimensions, there are two explanations.
Reason 1.
For image re-projection the pinhole camera model is used, which works in perspective
(homogeneous) coordinates. Perspective projection uses the camera centre as the centre
of projection, and points are mapped onto the plane z = 1. A 3D point [x y z] is
represented in homogeneous coordinates as [xw yw zw w], and the point it maps to on
the plane is represented by [xw yw zw]. Normalising by w recovers the original point.
So (x, y) -> [x y 1]^T : Homogeneous Image Coordinates
and (x, y, z) -> [x y z 1]^T : Homogeneous Scene Coordinates
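
To make the normalisation step concrete, here is a tiny NumPy sketch; the point values are arbitrary and chosen only for illustration:

```python
import numpy as np

# 3D scene point in homogeneous coordinates: [X, Y, Z, 1]
X_h = np.array([2.0, 1.0, 4.0, 1.0])

# Mapping onto the plane z = 1 amounts to dividing the projected vector
# [xw, yw, zw] by its last component (here w = Z).
x_img_h = X_h[:3] / X_h[2]        # -> [0.5, 0.25, 1.0], a homogeneous image point [x, y, 1]
x, y = x_img_h[0], x_img_h[1]
print(x_img_h, (x, y))
```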
Reason 2.
With respect to the paper you have attached, consider equations (4) and (5): the projection there has the form y = P * R * x.

It is clear that P is of dimension 3x4 and that R is expanded to a 4x4 matrix. By the matrix multiplication rule, the number of columns of the first matrix must equal the number of rows of the second, so for P of size 3x4 and R of size 4x4, x has to be a 4x1 homogeneous column vector [x y z 1]^T.
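
As a quick dimension check, here is a minimal NumPy sketch; the entries of P and the rectifying rotation are placeholder values, not real calibration data:

```python
import numpy as np

# Placeholder 3x4 projection matrix (intrinsics plus rectified offset column).
P = np.array([[720.0,   0.0, 610.0, 45.0],
              [  0.0, 720.0, 173.0,  0.2],
              [  0.0,   0.0,   1.0,  0.003]])

# 3x3 rectifying rotation expanded to a 4x4 matrix.
R = np.eye(4)
R[:3, :3] = np.eye(3)

# Homogeneous 3D point as a 4x1 column vector [X, Y, Z, 1]^T.
x = np.array([[2.0], [1.0], [10.0], [1.0]])

y = P @ R @ x                      # (3x4) @ (4x4) @ (4x1) -> 3x1
u, v = y[0, 0] / y[2, 0], y[1, 0] / y[2, 0]   # divide by the third component to get pixels
print(y.shape, (u, v))
```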
Now coming to your second question about LiDAR-image fusion: it requires the intrinsic parameters (the camera matrix) and the extrinsic parameters (the relative rotation and translation between the two sensors). The rotation and translation together form a 3x4 matrix called the transformation matrix. So the point fusion equation becomes
[x y 1]^T = Camera Matrix * Transformation Matrix * [X Y Z 1]^T
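
Below is a minimal sketch of that fusion step in NumPy. The function name, the calibration values K, R, t, and the example points are illustrative assumptions, not taken from any particular devkit:

```python
import numpy as np

def project_lidar_to_image(points_xyz, K, R, t):
    """Project Nx3 LiDAR points into pixel coordinates using K [R|t]."""
    n = points_xyz.shape[0]
    points_h = np.hstack([points_xyz, np.ones((n, 1))])   # Nx4 homogeneous points
    Rt = np.hstack([R, t.reshape(3, 1)])                   # 3x4 transformation (extrinsic) matrix
    cam = K @ Rt @ points_h.T                              # 3xN projected points
    in_front = cam[2] > 0                                  # keep only points in front of the camera
    uv = cam[:2, in_front] / cam[2, in_front]              # perspective divide -> pixel coordinates
    return uv.T, in_front

# Example with dummy calibration values:
K = np.array([[700.0,   0.0, 620.0],
              [  0.0, 700.0, 190.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                       # placeholder rotation LiDAR -> camera
t = np.array([0.0, -0.08, -0.27])   # placeholder translation (metres)
pts = np.random.rand(5, 3) * 10.0
uv, mask = project_lidar_to_image(pts, K, R, t)
print(uv)
```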
You can also refer to: Lidar Image Fusion KITTI
Once your LiDAR-image fusion is done, you can feed the fused image to your CNN model. I am not aware of DNN modules specifically for LiDAR-fused images.
Hope this helps.