
I would like to compare my visual odometry results with the ground truth provided by the KITTI dataset. For each frame in the ground truth, I have a projection matrix. For example:

1.000000e+00 9.043683e-12 2.326809e-11 1.110223e-16 9.043683e-12 1.000000e+00 2.392370e-10 2.220446e-16 2.326810e-11 2.392370e-10 9.999999e-01 -2.220446e-16

Here are the instructions provided by the readme:

Row i represents the i'th pose of the left camera coordinate system (i.e., z pointing forwards) via a 3x4 transformation matrix. The matrices are stored in row aligned order (the first entries correspond to the first row), and take a point in the i'th coordinate system and project it into the first (=0th) coordinate system. Hence, the translational part (3x1 vector of column 4) corresponds to the pose of the left camera coordinate system in the i'th frame with respect to the first (=0th) frame

But I don't know how to produce the same kind of data from my results. What I have for each frame in my case:

  • The tf transform from init_camera (the fixed frame at (0,0,0)) to the left camera, which is moving. So I have the translation vector and the quaternion rotation.
  • The odometry data: the pose and the twist
  • Camera calibration parameters

With those data, how do I compare with the ground truth? I need to derive the projection matrix from the data above, but I don't know how to do it.

In the big picture, I would like either to obtain such a projection matrix, or to know how to decompose the projection matrices provided by the ground truth, so that I can compare the transformations with my data.

Can someone help me?

Thanks

lilouch

2 Answers


Are you sure you want the projection matrix? A camera projection matrix is typically a 3x4 matrix that projects (homogeneous) points in R^3 to (homogeneous) points in R^2 on the image plane, up to scale (see the Wikipedia entry). It sounds like you are interested in comparing your computed visual odometry to the ground truth odometry provided on the KITTI website; in that case, you would be comparing the rigid transformation matrices from your VO estimate to the KITTI ground truth transformations.

If you are using the "raw" datasets, the "ground truth" is provided as the OXTS data records - i.e. the combined IMU and GPS data. This data is in the global frame, and will take a bit more work to compare to your data. However, it sounds like you are using the odometry benchmark data; the ground truth transformations are already in the frame of the left camera, which should make the task a bit easier (and that is what I'll discuss).

Since you haven't specified a language I'll speak more generally here, but ROS does provide tools in C++ (tf and Eigen) and Python (transformations.py) to perform tasks such as converting quaternions to rotation matrices...

You have t and q, the translation and the rotation represented as a quaternion. You can convert the quaternion to a rotation matrix (usually, 'sxyz' form) and vice versa. The KITTI data is specified as a 3x4 matrix, which is the rotation matrix concatenated with the translation vector (i.e. the 4th column is t_gt); a small sketch of assembling this from q and t follows below:

r11 r12 r13 t1
r21 r22 r23 t2
r31 r32 r33 t3
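
If it helps, here is a minimal sketch of assembling one such 12-number, row-major pose line from your tf output, assuming Eigen and assuming your pose is already expressed relative to the first (0th) camera frame (the function and variable names are just illustrative):

// Turn a quaternion + translation into one 12-number KITTI-style pose line
// (row-major 3x4 [R | t]). Assumes the pose is relative to frame 0.
#include <Eigen/Dense>
#include <cstdio>

void printKittiPoseLine(const Eigen::Quaterniond& q, const Eigen::Vector3d& t) {
  Eigen::Matrix3d R = q.normalized().toRotationMatrix();
  Eigen::Matrix<double, 3, 4> P;
  P.block<3, 3>(0, 0) = R;   // rotation block
  P.col(3) = t;              // translation in the 4th column
  // Print row-major: r11 r12 r13 t1 r21 ... t3
  for (int r = 0; r < 3; ++r)
    for (int c = 0; c < 4; ++c)
      std::printf("%e%c", P(r, c), (r == 2 && c == 3) ? '\n' : ' ');
}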

You can simply compute the translation error as the L2 norm ||t - t_gt||. Computing the error in the rotation is a little more difficult; one way to do it is to use a function such as QuaternionBase::angularDistance() from Eigen, since both measurements should be in the same coordinate frame. To do this, you need to convert the ground truth rotation matrix into a quaternion, using either Eigen or the transformations.py library.
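A sketch of that per-frame comparison, again assuming Eigen and that both poses are expressed in the frame of the first camera (names are illustrative):

#include <Eigen/Dense>

struct PoseError { double translation; double rotation; };

// R_gt / t_gt come from the KITTI ground truth line, q_est / t_est from your VO.
PoseError comparePose(const Eigen::Matrix3d& R_gt, const Eigen::Vector3d& t_gt,
                      const Eigen::Quaterniond& q_est, const Eigen::Vector3d& t_est) {
  PoseError e;
  e.translation = (t_est - t_gt).norm();                   // ||t - t_gt||
  Eigen::Quaterniond q_gt(R_gt);                           // rotation matrix -> quaternion
  e.rotation = q_gt.angularDistance(q_est.normalized());   // angle in radians
  return e;
}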

Keep in mind this is the error in the odometry frame - i.e. the error of your i-th estimated pose relative to the i-th ground truth pose in the frame of the initial pose. Sometimes, it is also useful to compare the average error frame-to-frame as well, especially since odometry tends to drift significantly over time, even if the algorithm is relatively accurate between frames on average.
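For the frame-to-frame comparison, one option (a sketch, assuming 4x4 homogeneous poses expressed in frame 0) is to compare the relative motion between consecutive frames rather than the absolute poses:

#include <Eigen/Dense>

// Relative motion from frame i-1 to frame i; compute this for both the
// ground truth and your estimate, then compare the two as above.
Eigen::Matrix4d relativeMotion(const Eigen::Matrix4d& T_prev, const Eigen::Matrix4d& T_curr) {
  return T_prev.inverse() * T_curr;
}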

In summary:

  • convert rotation matrices to quaternions to compute angular error (pay attention to the coordinate frames),
  • and use the formula ||t - t_gt|| to compute the translation error.
  • again, pay attention to your coordinate frames.
grumpy_robot
  • Thank you @grumpy_robot for this good answer. It's a bit clearer in my mind now. So if I understood correctly, I don't need to convert my odometry result into a projection matrix? I just need to convert the ground truth into a rotation and translation, so that I can compute the error for both? Actually, the reason I wanted a projection matrix is that they provide a tool in C++ to perform the evaluation against the ground truth, and they say in the readme "Your submission results must be provided using the same data format." – lilouch Apr 27 '15 at 01:22
  • I don't quite understand this: "Keep in mind this is the error in the odometry frame - i.e. the error of your i-th estimated pose relative to the i-th ground truth pose in the frame of the initial pose. Sometimes, it is also useful to compare the average error frame-to-frame as well." Can you try to explain it to me again? You just mean that the error is relative to the first frame, right? Thank you again! – lilouch Apr 27 '15 at 01:24
  • @lilouch, yes, you just need to output your odometry information in the same format (i.e. a 3x4 row-major matrix where the "front" 3x3 block is the rotation and the last column is the translation). Glancing at the source code of the devkit, if you provide your output in that form (i.e. a file with N lines, one per frame, and 12 numbers per line), it will correctly compute the error. As for the second question, yes, I essentially mean that the error is in comparison with the first frame, as you say. I've just made many mistakes myself when I haven't paid attention to the correct coordinate frame. – grumpy_robot Apr 27 '15 at 13:23
  • Thank you for the clarification @grumpy_robot. Concerning what you said, yes it's a bit tricky, because the ground truth provides information for every frame, but when I run my algorithm I don't get a pose for every frame. I have the timestamps, but I'm not sure about the ground truth. I'll try and keep you posted! – lilouch Apr 28 '15 at 04:13

I'm not very experienced with computer vision, but have a look at the sensor_msgs/CameraInfo message and the image_geometry::PinholeCameraModel class reference.

It should be something like this:

#include <image_geometry/pinhole_camera_model.h>
#include <sensor_msgs/CameraInfo.h>

void imageCallback([...], const sensor_msgs::CameraInfoConstPtr& info_msg) {
  // initialize cam_model with sensor_msgs::CameraInfo
  image_geometry::PinholeCameraModel cam_model_;
  cam_model_.fromCameraInfo(info_msg);

  // get the projection matrix
  cv::Matx34d projection_matrix;
  projection_matrix = cam_model_.projectionMatrix();
}
alextoind
  • Yes, I've seen this function. I didn't test it because, in my opinion (correct me if I'm mistaken), the projection matrix will always be the same in this case, no? I mean the camera info is the same for each frame because it comes from the camera configuration, whereas I'm supposed to have a different projection matrix for each frame. I can test it. Thanks – lilouch Apr 20 '15 at 00:44
  • And also because my camera_info already contains the projection matrix, given the parameters of the camera. – lilouch Apr 20 '15 at 04:11
  • I'm sorry, but as I told you, I'm not very experienced (this was just a "long comment" written as an answer for clarity). Wait for other answers or try asking on http://answers.ros.org/. – alextoind Apr 20 '15 at 19:26
  • No problem, thanks anyway for your answer, Alex =). – lilouch Apr 21 '15 at 00:50