4

I am working on fusing LiDAR point clouds and camera images in order to perform an object classification algorithm using a CNN.

I want to use the KITTI dataset, which provides synchronized LiDAR and RGB image data. A LiDAR is a 3D scanner, so its output is a 3D point cloud.

I want to use the depth information from the point cloud as a channel for the CNN. But I have never worked with point clouds, so I am asking for some help. Will projecting the point cloud onto the camera image plane (using the projection matrix provided by KITTI) give me the depth map that I want? Is the Python library pcl useful, or should I move to C++ libraries?

If you have any suggestions, thank you in advance.

Doxcos44

2 Answers

2

I'm not sure what the projection matrix provided by KITTI includes, so the answer is: it depends. If this projection matrix only contains a transformation matrix, you cannot generate a depth map from it. The 2D image has distortion that comes from the camera, and the point cloud usually doesn't have distortion, so you cannot "precisely" map the point cloud onto the RGB image without the intrinsic and extrinsic parameters.

PCL is not required to do this.

A depth map is essentially depth values mapped onto the RGB image. You can treat each point in the point cloud (each laser return of the LiDAR) as a pixel of the RGB image. Therefore, I think all you need to do is find which point in the point cloud corresponds to the first pixel (top-left corner) of the RGB image, then read the depth values from the point cloud based on the RGB image resolution.
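For KITTI specifically, a minimal sketch of this projection in Python/NumPy might look like the following. It assumes the usual KITTI calibration matrices P2 (3x4 camera projection), R0_rect (3x3 rectification) and Tr_velo_to_cam (3x4 LiDAR-to-camera transform) have already been parsed from the calibration text file into NumPy arrays; the function and file names are just placeholders.

    import numpy as np

    def velo_to_depth_map(points, P2, R0_rect, Tr_velo_to_cam, img_h, img_w):
        """points: (N, 4) array of x, y, z, reflectance from a KITTI .bin file."""
        # Homogeneous LiDAR coordinates
        xyz_h = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])

        # LiDAR frame -> rectified camera frame
        cam = R0_rect @ (Tr_velo_to_cam @ xyz_h.T)            # (3, N)
        cam = cam[:, cam[2, :] > 0]                           # keep points in front of the camera

        # Rectified camera frame -> pixel coordinates
        proj = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])
        u = (proj[0, :] / proj[2, :]).astype(np.int32)
        v = (proj[1, :] / proj[2, :]).astype(np.int32)
        z = cam[2, :]                                         # depth in meters

        # Rasterize into a sparse depth map (0 = no LiDAR return for that pixel)
        depth = np.zeros((img_h, img_w), dtype=np.float32)
        valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
        depth[v[valid], u[valid]] = z[valid]
        return depth

    # Example usage (file name is a placeholder):
    # points = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)
    # depth = velo_to_depth_map(points, P2, R0_rect, Tr_velo_to_cam, 375, 1242)

The resulting map is sparse, since there are far fewer LiDAR returns than image pixels; this is the issue discussed in the comments below.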

Chengsi Zhang
  • KITTI provides the calibration matrices of the RGB camera with the Velodyne LiDAR, stored as text files. The equations are detailed in this article: http://ww.cvlibs.net/publications/Geiger2013IJRR.pdf. Thank you for your help; so if I project the point cloud onto the image plane using the matrix, can I then generate a depth map from it? I don't quite understand how to obtain depth from an image and how to compute it using Python. – Doxcos44 May 27 '19 at 08:07
  • That's correct. The depth value is the z value of a point in the point cloud and the pixel value in a depth map (you can access it by simply reading the value at a pixel's (row index, column index)). However, since the point cloud and the RGB image come from different devices (LiDAR and RGB camera), the number of laser returns (points) in the point cloud is not necessarily the same as the number of pixels in the RGB image. – Chengsi Zhang May 27 '19 at 11:25
  • I think I can handle this sparsity problem using upsampling via bilateral filtering. But what are the formulas for the pixel values of the depth map? Do you also know how to obtain height maps and reflectance maps from the LiDAR? It would be useful for my CNN to use them as channels, but I don't understand how to obtain such maps from the point cloud format. – Doxcos44 May 27 '19 at 13:34
  • To answer your first question: for a 640×480 RGB image, the data is equivalent to a 640×480×3 array (3 colour channels). For a 640×480 depth image, it is equivalent to a 640×480×1 array (1 depth channel). – Chengsi Zhang May 27 '19 at 16:34
  • To answer your second question: based on my experience, the term height map is related to depth map. Depth is the distance between an object in the point cloud and the LiDAR, while height is the distance between the terrain and the horizon when the LiDAR is in the sky facing the ground (height_value = Lidar_height - depth). I'm not familiar with reflectance maps, but I do know a LiDAR only generates one point when one laser pulse shoots out and comes back to the LiDAR after reflection. Maybe reflectance maps are related to the laser intensity? – Chengsi Zhang May 27 '19 at 16:34
  • So the depth image is in black/white format, OK. I found the following formulas for the pixel values of the depth, reflectance and height channels. Why do they use X instead of Z for the depth value? (https://www.noelshack.com/2019-22-2-1559032007-stack.png) – Doxcos44 May 28 '19 at 08:27
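Following this comment thread, here is a small sketch of how depth, height and reflectance channels could be stacked as CNN input. It assumes u, v and z come from a projection step like the one sketched above (with `points` filtered to the same subset of returns), takes "height" to be the z coordinate in the Velodyne frame (height relative to the sensor, one common choice rather than the only one), and uses the 4th column of the KITTI .bin file as reflectance; the function name is a placeholder.

    import numpy as np

    def stack_channels(u, v, z, points, img_h, img_w):
        """u, v: pixel coordinates; z: depth in meters; points: (M, 4) Velodyne
        array filtered to the same M points that produced u, v and z."""
        maps = np.zeros((img_h, img_w, 3), dtype=np.float32)
        valid = (u >= 0) & (u < img_w) & (v >= 0) & (v < img_h)
        maps[v[valid], u[valid], 0] = z[valid]          # depth channel (m)
        maps[v[valid], u[valid], 1] = points[valid, 2]  # height relative to the sensor (m)
        maps[v[valid], u[valid], 2] = points[valid, 3]  # reflectance / intensity channel
        return maps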
0

You have nothing to do with the camera; this is all about the point cloud data. Let's say you have 10 million points and each point has x, y, z in meters. If the data is not in meters, convert it first. Then you need the position of the LiDAR. When you subtract the position of the car from all the points one by one, you move the LiDAR to the (0,0,0) point, and then you can just print the points onto a white image.

The rest is simple math, and there may be many ways to do it. The first that comes to my mind: think of the RGB channels as digits of a base-256 number. Let's say 1 cm is scaled to a change of 1 in blue, a change of 256 cm equals a change of 1 in green, and a change of 256x256 = 65536 cm equals a change of 1 in red. We know the camera is at (0,0,0), so if the RGB of a point is (1,0,0), that means it is 1x65536 + 0x256 + 0x1 = 65536 cm away from the camera. This could be done in C++. You can also use interpolation and closest-point algorithms to fill blanks if there are any.
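A small sketch of that encoding (in Python rather than C++, purely to illustrate the arithmetic; function names are placeholders):

    def distance_cm_to_rgb(distance_cm):
        # Pack a distance in centimeters into 8-bit R, G, B digits (base 256):
        # blue = 1 cm steps, green = 256 cm steps, red = 65536 cm steps.
        d = int(round(distance_cm))
        return (d // 65536) % 256, (d // 256) % 256, d % 256

    def rgb_to_distance_cm(r, g, b):
        # Inverse mapping: recover the distance in centimeters from the pixel.
        return r * 65536 + g * 256 + b

    # A point 65536 cm from the sensor encodes as (1, 0, 0), as in the example above.
    assert distance_cm_to_rgb(65536) == (1, 0, 0)
    assert rgb_to_distance_cm(1, 0, 0) == 65536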