Need some help understanding the formula

Question

This is the pinhole camera model:

[Image: the pinhole camera model formula]

(I can't tell whether it is [R t] or (R, t).) This formula maps the 3D coordinates of a point to the 2D coordinates of that point in a picture taken by a pinhole camera.

Projection drawing:

[Image: projection drawing]

A tilde over a vector means that a "1" is appended to that vector as an extra element. M is the coordinate of the point in 3D space, m is the coordinate of the point in the picture, f is the focal length of the camera, and s is the pixel aspect ratio. (R, t) describes the 3D transformation between the world coordinate system, in which the rectangle is described, and the camera coordinate system.

It is unclear to me what [R t] (or (R, t)) after A means, and how, by inserting the 3D coordinates of the corners into the formula (with pixel aspect ratio = 1), we get this:

[Image: the equations obtained for the corners]

And what does the letter "t" mean?

I found this formula here (page 13).

Are you programming a model of this? Or are you trying to understand the mathematical formula? If it's the latter, you should try http://math.stackexchange.com/ — Sir Crispalot, Jan 04 '12 at 14:58
Yes, I'm trying to understand the formula, thanks for the link. — Userr, Jan 04 '12 at 15:05
According to Math.SE mods, this question isn't on-topic there either. Checking with Physics.SE. Either way, though, this is unfortunately entirely off-topic here as well. — Adam Lear, Jan 05 '12 at 20:36
could you point me to the math.se discussion? i'm curious because i can't imagine how it could possibly be seen as offtopic there. — Martin DeMello, Jan 08 '12 at 12:56

score 4 · Answer 1 · answered Jan 04 '12 at 15:43

Not quite. A[R t] is the complete transform that takes a point from world coordinates to its position in the camera image. [R t] is a single matrix, formed by placing t as an extra column to the right of R, and it is multiplied by the matrix A. R is a rotation matrix and t is a translation vector; both are needed to describe how the camera is oriented and positioned. A is the matrix that describes the photographic camera itself in terms of focal length, pixel aspect ratio and center point. The system is trying to solve for [R t].
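To make the roles of A, R and t concrete, here is a minimal sketch in plain Python. The layout of A (focal length f, aspect ratio s, center point (u0, v0)) is the usual pinhole form and is an assumption here, as are all function names and the example numbers; none of them come from the article.

```python
def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def intrinsics(f, s, u0, v0):
    # Assumed layout of A: focal length f, pixel aspect ratio s,
    # center point (u0, v0). The names u0 and v0 are hypothetical.
    return [[f,   0.0,   u0],
            [0.0, s * f, v0],
            [0.0, 0.0,   1.0]]

def extrinsics(R, t):
    # [R t]: the 3x3 rotation R with the 3-vector t appended as a
    # fourth column, giving one 3x4 matrix.
    return [R[i] + [t[i]] for i in range(3)]

def project(A, R, t, M):
    # M-tilde: the 3D point with a 1 appended, written as a column.
    M_tilde = [[M[0]], [M[1]], [M[2]], [1.0]]
    P = mat_mul(A, extrinsics(R, t))           # 3x4 product A [R t]
    x, y, lam = (row[0] for row in mat_mul(P, M_tilde))
    # Dividing by the scale factor lam gives the 2D point m.
    return (x / lam, y / lam), lam

# Made-up example: camera axes aligned with the world axes,
# with the world origin 5 units in front of the camera.
A = intrinsics(f=800.0, s=1.0, u0=320.0, v0=240.0)
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
m, lam = project(A, R, t, (1.0, 0.5, 0.0))
print(m, lam)
```

With these numbers the point (1, 0.5, 0) on the z=0 plane lands at pixel (480, 320) with scale factor 5. Solving for [R t] is the reverse problem: m and M are known and the matrix is unknown.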

The formula assumes that the four corners of the whiteboard, given by M, lie on a plane, so their coordinates can be chosen such that for all M, z=0, and M(1).y = M(2).y, M(3).y = M(4).y, M(1).x = M(3).x and M(2).x = M(4).x. The corresponding points in the picture are given by m. The matrix A accounts for the physical camera, which is defined by focal length, pixel aspect ratio and center point. The remaining transform, which moves M-tilde into the camera's coordinate system so that A applied to the result ends up as m-tilde (up to the scale factor), is given by the matrix [R t].

If you follow the article to the end, the formula for calculating the camera [R t] is shown (to a point). It does, however, also prove that you can't determine the absolute width and height of M based on m, only the aspect ratio. In the long run that is fine for this application, as it maps from one arbitrary resolution to another and the absolute size is not important.

Thanks, but I am not sure if I fully understand how the formula: (lambda)2*m2 = w*A*r2 + A*t is calculated. If we are substituting the corners coordinates why don't we get: (lambda)2m2 = A * r2 * t *w ? — Userr, Jan 05 '12 at 18:03
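On the question in the comment above, the sum comes from multiplying out [R t] as one matrix with four columns. The following is a sketch, assuming the corner in question is M = (w, 0, 0) and writing r_1, r_2, r_3 for the columns of R; which column appears depends on which axis the corner lies along, so this may not match the article's numbering of the corners.

```latex
\begin{aligned}
\lambda\,\tilde{m}
  &= A\,[R\ \ t]\,\tilde{M}
   = A \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix}
       \begin{bmatrix} w \\ 0 \\ 0 \\ 1 \end{bmatrix} \\
  &= A\,\bigl(w\,r_1 + 0\,r_2 + 0\,r_3 + 1\,t\bigr) \\
  &= w\,A\,r_1 + A\,t
\end{aligned}
```

A matrix times a column vector is a weighted sum of the matrix's columns, so w multiplies one column of R and the trailing 1 in M-tilde picks out t. A product such as A * r * t * w is not defined here, because r and t are both 3-element column vectors and cannot be multiplied with each other as matrices. By the same steps, a corner at (0, h, 0), with h a hypothetical name for the rectangle's height, would give h A r_2 + A t.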
