
I am trying to create a point cloud from the images in the KITTI stereo dataset so that I can later estimate the 3D position of some objects.

The original images look like this.

What I have so far:

  1. Generated a disparity map with cv2.StereoSGBM_create:
import cv2
import numpy as np

# semi-global block matching parameters
window_size = 9
minDisparity = 1
stereo = cv2.StereoSGBM_create(
    blockSize=10,
    numDisparities=64,
    preFilterCap=10,
    minDisparity=minDisparity,
    P1=4 * 3 * window_size ** 2,
    P2=32 * 3 * window_size ** 2
)
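# (Sketch, not from the original post: assuming the rectified left/right frames
# are already loaded as img_left and img_right. StereoSGBM returns fixed-point
# disparities scaled by 16, hence the division.)
disparity = stereo.compute(img_left, img_right).astype(np.float32) / 16.0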
  2. Calculated the Q matrix with cv2.stereoRectify, using data from the KITTI calibration files:
# K_xx: 3x3 calibration matrix of camera xx before rectification
K_L = np.matrix(
    [[9.597910e+02, 0.000000e+00, 6.960217e+02],
     [0.000000e+00, 9.569251e+02, 2.241806e+02],
     [0.000000e+00, 0.000000e+00, 1.000000e+00]])
K_R = np.matrix(
    [[9.037596e+02, 0.000000e+00, 6.957519e+02],
     [0.000000e+00, 9.019653e+02, 2.242509e+02],
     [0.000000e+00, 0.000000e+00, 1.000000e+00]])

# D_xx: 1x5 distortion vector of camera xx before rectification
D_L = np.matrix([-3.691481e-01, 1.968681e-01, 1.353473e-03, 5.677587e-04, -6.770705e-02])
D_R = np.matrix([-3.639558e-01, 1.788651e-01, 6.029694e-04, -3.922424e-04, -5.382460e-02])

# R_xx: 3x3 rotation matrix of camera xx (extrinsic)
R_L = np.transpose(np.matrix([[9.999758e-01, -5.267463e-03, -4.552439e-03],
                              [5.251945e-03, 9.999804e-01, -3.413835e-03],
                              [4.570332e-03, 3.389843e-03, 9.999838e-01]]))
R_R = np.matrix([[9.995599e-01, 1.699522e-02, -2.431313e-02],
                 [-1.704422e-02, 9.998531e-01, -1.809756e-03],
                 [2.427880e-02, 2.223358e-03, 9.997028e-01]])

# T_xx: 3x1 translation vector of camera xx (extrinsic)
T_L = np.transpose(np.matrix([5.956621e-02, 2.900141e-04, 2.577209e-03]))
T_R = np.transpose(np.matrix([-4.731050e-01, 5.551470e-03, -5.250882e-03]))

IMG_SIZE = (1392, 512)

rotation = R_L * R_R
translation = T_L - T_R

# output matrices from stereoRectify init
R1 = np.zeros(shape=(3, 3))
R2 = np.zeros(shape=(3, 3))
P1 = np.zeros(shape=(3, 4))
P2 = np.zeros(shape=(3, 4))
Q = np.zeros(shape=(4, 4))

R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(K_L, D_L, K_R, D_R, IMG_SIZE, rotation, translation,
                                                                  R1, R2, P1, P2, Q,
                                                                  newImageSize=(1242, 375))

The resulting Q matrix looks like this (at this point I have doubts that it is correct):

[[   1.            0.            0.         -614.37893072]
 [   0.            1.            0.         -162.12583194]
 [   0.            0.            0.          680.05186262]
 [   0.            0.           -1.87703644    0.        ]]
  3. Generated a point cloud with reprojectImageTo3D (a sketch of this step is shown below), which looks like this: [point cloud image]
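For reference, this step presumably looks something like the following sketch (disparity is assumed to be the float disparity map from step 1 and Q the matrix from step 2; these names are assumptions, not from the original post):

# Sketch: reproject the disparity map into an HxWx3 array of (X, Y, Z)
# coordinates using the Q matrix from stereoRectify.
points_3d = cv2.reprojectImageTo3D(disparity, Q)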

And now the questions begin :)

  1. Is it OK that all values returned by reprojectImageTo3D are negative?
  2. What are the units of those values, taking into account that it is the KITTI dataset and their camera calibration data is available?
  3. And finally, is it possible to convert those values to something like longitude/latitude if I have the GPS coordinate of the camera that took those photos?

I would appreciate any help!

EugeneB

1 Answer

  1. Is it OK for all values returned by reprojectImageTo3D to be negative?

Generally speaking, no, at least for the Z values. The values returned by reprojectImageTo3D are real-world coordinates relative to the camera origin, so a negative Z value means the point lies behind the camera, which is geometrically impossible for anything the camera actually saw. The X and Y values can be negative, since the origin lies on the optical axis: a negative X value means the point is to the left of it, and, because OpenCV's camera frame has Y pointing down, a negative Y value means the point is above it. But the Z values should not be negative.
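For example, a common way to act on this is to mask out points with non-positive or non-finite depth before using the point cloud. A minimal sketch, assuming disparity is your float disparity map and Q is the matrix shown above:

import numpy as np
import cv2

points_3d = cv2.reprojectImageTo3D(disparity, Q)
z = points_3d[..., 2]
# keep only points with a finite, positive depth and a valid disparity value
valid = np.isfinite(z) & (z > 0) & (disparity > disparity.min())
point_cloud = points_3d[valid]   # N x 3 array of (X, Y, Z) points in front of the camera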

Your Q matrix is probably coming out wrong because I think you are incorrectly setting up the rotation matrices in your call to stereoRectify. The R argument to stereoRectify should be the single rotation that takes the first camera's coordinate system into the second camera's, not a product of the two per-camera rotations. What you are doing is multiplying the two rotations together after transposing one of them; instead you should be passing only R_L (since, from your description, I assume it is the rotation from the left camera to the right camera).
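As a sketch of what the corrected call could look like, assuming a single relative rotation and translation between the two cameras (whether R_L alone or a composed relative rotation such as R_R * R_L^T is the right quantity depends on how the KITTI calibration defines these matrices, so treat the assignments below as placeholders):

import numpy as np
import cv2

# Sketch (assumptions): pass one relative rotation/translation between the
# cameras, not a product of the two per-camera rotations.
R_rel = np.asarray(R_L)        # rotation between the cameras (assumed, as suggested above)
T_rel = np.asarray(T_L - T_R)  # relative translation in meters (sign depends on convention)

R1, R2, P1, P2, Q, roi_left, roi_right = cv2.stereoRectify(
    np.asarray(K_L), np.asarray(D_L), np.asarray(K_R), np.asarray(D_R),
    IMG_SIZE, R_rel, T_rel,
    flags=cv2.CALIB_ZERO_DISPARITY, alpha=0,
    newImageSize=(1242, 375))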

  2. What are the units of those values, taking into account that it is the KITTI dataset and their camera calibration data is available?

I am not familiar with the KITTI dataset, but the values returned by reprojectImageTo3D are in real-world units, specifically the same units as the calibration's translation/baseline, which is typically meters.
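For intuition, the depth of a pixel is Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity, so the depth comes out in whatever units the baseline is in. A rough sketch with approximate KITTI-like values (both numbers are ballpark assumptions, not taken from your calibration):

focal_px = 7.2e2      # focal length in pixels (approximate, assumed)
baseline_m = 0.54     # stereo baseline in meters (approximate, assumed)
disparity_px = 30.0   # example disparity value

depth_m = focal_px * baseline_m / disparity_px   # roughly 13 m for this example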

  3. And finally, is it possible to convert those values to something like longitude/latitude if I have the GPS coordinate of the camera that took those photos?

The coordinates returned by reprojectImageTo3D are real-world coordinates relative to the camera origin. If you have the GPS coordinate (and heading) of the camera that took the photos, you can offset that position by the (X, Y, Z) coordinates returned from the reprojection to obtain a latitude/longitude for each point.
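For example, a minimal sketch of a flat-earth approximation, assuming you also know the camera heading (cam_lat, cam_lon and cam_heading_rad are hypothetical inputs here, not something OpenCV provides):

import numpy as np

EARTH_RADIUS = 6378137.0  # meters, WGS-84 equatorial radius

def camera_point_to_latlon(point_xyz, cam_lat, cam_lon, cam_heading_rad):
    # point_xyz is a camera-relative (X right, Y down, Z forward) offset in meters
    x, _, z = point_xyz
    # rotate the ground-plane offset into east/north components
    east = z * np.sin(cam_heading_rad) + x * np.cos(cam_heading_rad)
    north = z * np.cos(cam_heading_rad) - x * np.sin(cam_heading_rad)
    dlat = np.degrees(north / EARTH_RADIUS)
    dlon = np.degrees(east / (EARTH_RADIUS * np.cos(np.radians(cam_lat))))
    return cam_lat + dlat, cam_lon + dlon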

mprat