
I've read this answer but I still don't understand the rvec/tvec pair returned by calibrateCamera and the rvec/tvec pair returned by solvePnP.

I understand that solvePnP solves for [R|T] which is given here.

`s [u v 1]^T = K [R|T] [X Y Z 1]^T`

This is very clear - [R|T] is the rigid (rotation plus translation) transformation that takes world points into the camera's coordinate frame. And together, the camera matrix K and [R|T] form K[R|T], the projection matrix that maps those points onto the image plane.

However, I can't seem to find what the purpose of the rvec/tvec returned by calibrateCamera is.

Carpetfizz

1 Answer


The rvecs/tvecs returned by calibrateCamera describe how to project from the local coordinate system of each of the checkerboards to your camera's image plane.

This is just like how the rvec/tvec pair from solvePnP describes how to project from the coordinate system of the points you give it to your camera's image plane.
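For concreteness, here is a minimal sketch of the two calls (`obj_points`, `img_points` and `image_size` are hypothetical placeholders for your own checkerboard data, not names from the question):

```python
import cv2

# Hypothetical inputs: obj_points is a list with one (N, 3) float32 array of
# 3D checkerboard corner coordinates per view, img_points is a list with the
# matching (N, 1, 2) detected corners, and image_size is (width, height).

# calibrateCamera estimates K and the distortion coefficients, plus one
# rvec/tvec PER checkerboard view:
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
# len(rvecs) == len(tvecs) == number of checkerboard views

# solvePnP estimates a single rvec/tvec for ONE set of 3D-2D correspondences,
# using an already-known K and distortion:
ok, rvec, tvec = cv2.solvePnP(obj_points[0], img_points[0], K, dist)
```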

You can convert from an rvec into a full R matrix using Rodrigues().
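As a quick sketch (assuming the `rvec`, `tvec` and `K` from the snippet above), the full 3x4 extrinsic matrix [R|T] and the projection matrix K[R|T] can then be assembled like this:

```python
import cv2
import numpy as np

# Convert the 3x1 rotation vector into a full 3x3 rotation matrix.
R, _ = cv2.Rodrigues(rvec)

# Stack R and the translation into the 3x4 extrinsic matrix [R|T];
# multiplying by K gives the 3x4 projection matrix from the question.
Rt = np.hstack((R, tvec.reshape(3, 1)))
P = K @ Rt
```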

Atnas
  • Oh, so both of these describe the same transformation `object_points -> image_plane`, but `calibrateCamera` returns a list of these for *each* chessboard whereas `solvePnP` will do it for just one - I see. Why is `solvePnP` an optimization problem by the way? Why can't I just rearrange the equation above for `[R|T]` and call it a day? – Carpetfizz Sep 29 '17 at 06:45
  • You can't just rearrange the equation because you have it many times (one for each point correspondence). Because the points contain small errors, no single solution will satisfy all of them exactly; the solver instead returns the pose with the smallest distance between the points projected into the camera and the actual points in the image. – Atnas Sep 29 '17 at 06:51
  • Oh, that makes a lot of sense. And I’m assuming it’s trying to minimize “reprojection error”? What math occurs to compute this? I know there’s an alternative(?) way to solve this problem called DLT. Is DLT a technique used in solvePnP or is it something different altogether? Sorry for asking so many questions in the comments - this stuff is finally starting to make sense and I want to make sure I got all of it! – Carpetfizz Sep 29 '17 at 06:53
  • The math is basically: project the points; compute the error; change `rvec`/`tvec` a little, and see if the error gets smaller or bigger. This is handled by an optimization algorithm such as Levenberg-Marquardt (a short sketch of this project-and-compare step follows these comments). DLT is a very simple way to solve the problem, where you estimate the projection matrix using linear algebra, "solving the equation". But in order to recover R|t from that, you multiply by the inverse of the camera matrix, and the quality of the estimate is usually lower than for solvePnP. – Atnas Sep 29 '17 at 07:03
  • Thanks again, this is becoming more and more clear. The "projection matrix" is `K[R|T]` correct? When we measure the reprojection error, I'm assuming we are computing `K[R|T][X Y Z]^T` and comparing it against... "something"? What do we consider the "correct" image points to compare against? – Carpetfizz Sep 29 '17 at 07:10
  • Yes, that's the projection matrix. It's compared against the `imagePoints` which you pass to `solvePnP`, i.e. the locations in your image you believe correspond to the 3d points you have. – Atnas Sep 29 '17 at 07:14
  • Oh, makes a lot of sense. So after it computes `[R|T]`, it creates some `P = K[R|T]`, and then `[X Y Z]^T` will always be fixed(?), so does it just compare `P[X Y Z]^T = [u v 1]^T` to `[x_1 y_1 1]^T` frame by frame, where `[x_1 y_1 1]^T` is the image point I gave it? – Carpetfizz Sep 29 '17 at 07:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155571/discussion-between-atnas-and-carpetfizz). – Atnas Sep 29 '17 at 07:53
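To make the project-and-compare step from the comments concrete, here is a minimal sketch of measuring reprojection error, reusing the hypothetical `obj_points`, `img_points`, `rvec`, `tvec`, `K` and `dist` names from the snippets above:

```python
import cv2

# Project the 3D points with the estimated pose and intrinsics...
projected, _ = cv2.projectPoints(obj_points[0], rvec, tvec, K, dist)

# ...and compare them against the corners actually detected in the image.
# This per-point distance is the reprojection error that solvePnP and
# calibrateCamera minimise internally (e.g. with Levenberg-Marquardt).
error = cv2.norm(img_points[0], projected, cv2.NORM_L2) / len(projected)
print("mean reprojection error:", error)
```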