
In camera imaging, there are several terms for point coordinates.

World coordinates: [X, Y, Z] in physical units.

Image coordinates: [u, v] in pixels.

Do these coordinates become homogeneous coordinates by appending a 1? Sometimes in books and papers they are represented as [x, y, w]. When is w used? When is 1 used?

In the documentation of the function initUndistortRectifyMap (http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#void%20initUndistortRectifyMap(InputArray%20cameraMatrix,%20InputArray%20distCoeffs,%20InputArray%20R,%20InputArray%20newCameraMatrix,%20Size%20size,%20int%20m1type,%20OutputArray%20map1,%20OutputArray%20map2)), the following process is applied:

[Figure: the initUndistortRectifyMap transformation steps, from the OpenCV documentation]

Is there a name for the coordinates [x y 1]? I don't understand why R can be applied to [x y 1]. In my view, R is a 3D transformation. Is [x y 1] a 2D point or a 3D point?

The coordinates are processed according to the chain [u v] -> [x y] -> [x y 1] -> [X Y W] -> [x' y']. What is the principle behind it?
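The chain in the question can be sketched in plain Python. This is a minimal sketch, assuming the OpenCV 2.4 documentation's conventions: the pixel (u, v) is first normalized with the focal lengths and principal point of newCameraMatrix (called fx_new, fy_new, cx_new, cy_new here), and the inverse of the 3x3 rectification rotation R is applied. The function name and example numbers are hypothetical, and the lens-distortion step that OpenCV applies after x', y' is left out. What the sketch illustrates is the answer to "2D or 3D?": [x y 1] is both the homogeneous form of a 2D image point and a genuine 3D point in camera coordinates, lying on the plane Z = 1 in front of the camera. That is why a 3D rotation can act on it, and why dividing by W afterwards projects the rotated point back onto the Z = 1 plane.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

def rectified_pixel_to_original_normalized(u, v, fx_new, fy_new, cx_new, cy_new, R):
    # [u v] -> [x y]: undo the new camera's intrinsics (pixels -> normalized units)
    x = (u - cx_new) / fx_new
    y = (v - cy_new) / fy_new
    # [x y] -> [x y 1]: the point on the plane Z = 1 of the rectified camera,
    # i.e. a real 3D point (equivalently, a ray direction) in camera coordinates
    ray = [x, y, 1.0]
    # [x y 1] -> [X Y W]: rotate back into the original camera frame.
    # R is a rotation (orthonormal), so its inverse is its transpose.
    X, Y, W = mat_vec(transpose(R), ray)
    # [X Y W] -> [x' y']: intersect the rotated ray with the original camera's
    # plane Z = 1 by dividing by the third coordinate
    return X / W, Y / W

# Hypothetical example: a rectifying rotation of 10 degrees about the y axis
theta = math.radians(10)
R = [[math.cos(theta), 0.0, math.sin(theta)],
     [0.0, 1.0, 0.0],
     [-math.sin(theta), 0.0, math.cos(theta)]]
x_p, y_p = rectified_pixel_to_original_normalized(320, 240, 500, 500, 320, 240, R)
print(x_p, y_p)  # x_p is -tan(10 degrees), about -0.176; y_p is 0.0
```

With an identity R the function simply returns (x, y), so the rotation is the only step that moves the point between the two cameras' image planes.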

Jogging Song

1 Answer


In 2-D perspective geometry, there are two main sets of coordinates: Cartesian coordinates (x,y), and homogeneous coordinates, which are represented by a triple (x,y,z). This triple can be confusing: it's not a point in three dimensions like the Cartesian (x,y,z). Because of this, some authors use a different notation for homogeneous points, like [x,y,z] or (x:y:z), and this notation makes more sense for reasons we'll get into later.

The third coordinate exists for one purpose only, and that is to add some points to the domain, namely, points at infinity. With the pair (x,y), there is no way to represent infinity, at least not with numbers and in ways that we can manipulate easily. But this is a problem for computer graphics, since parallel lines are of course very prevalent, and in projective geometry parallel lines meet at a point at infinity (in ordinary Euclidean geometry they never meet at all). Parallel lines matter because the transformations used in computer graphics are line-preserving: when we distort points with a homography or an affine transformation, we move pixels in a way that maps lines to other lines. Euclidean and affine transformations keep parallel lines parallel, but a homography can map parallel lines to lines that meet at a finite point (a vanishing point), and vice versa, so the coordinate system we use needs to be able to represent those meeting points.

So we use homogeneous coordinates (x,y,z) for the sole purpose of including those points at infinity, which are represented by triples of the form (x,y,0). And since we can put a zero in this place for every Cartesian pair other than (0,0), it's like we have a point at infinity in every single direction (where the direction is given by the angle to that point).
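The claim that parallel lines meet at a point at infinity can be checked numerically. The sketch below relies on a standard projective-geometry identity that the answer itself does not state: in homogeneous coordinates, the line through two points is their cross product, and the intersection of two lines is the cross product of the two lines. The helper names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    # Homogeneous line (a, b, c), meaning a*x + b*y + c = 0, through two points
    return cross(p, q)

def intersect(l1, l2):
    # Homogeneous point where two homogeneous lines meet
    return cross(l1, l2)

# Two parallel lines with slope 1: y = x and y = x + 1
l1 = line_through((0, 0, 1), (1, 1, 1))
l2 = line_through((0, 1, 1), (1, 2, 1))
p = intersect(l1, l2)
print(p)  # (-1, -1, 0): third coordinate 0, direction (1, 1) up to scale
```

For two non-parallel lines the same call returns a triple with a non-zero third coordinate, which divides back to the ordinary Cartesian intersection point.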

But then, since the third value can also be any number other than zero, what are all these additional points? What is the difference between (x,y,2) and (x,y,3), and so on? If the points (x,y,2) and (x,y,3) aren't points at infinity, they had better correspond to some Cartesian points. And luckily, there's a really simple way to map all these homogeneous triples to Cartesian pairs in a way that's nice: simply divide by the third coordinate. Then (x,y,3) maps back to the Cartesian point (x/3, y/3), and mapping (x,y,0) to Cartesian is undefined, which is perfect since that point at infinity doesn't exist in Cartesian coordinates.
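A minimal Python sketch of the two conversions described here: appending a 1 to go into homogeneous coordinates, and dividing by the third coordinate to come back. The function names are hypothetical; division by zero is turned into an explicit error, since (x,y,0) has no Cartesian counterpart.

```python
def to_homogeneous(x, y):
    """Cartesian (x, y) -> homogeneous (x, y, 1)."""
    return (x, y, 1)

def to_cartesian(p):
    """Homogeneous (x, y, w) -> Cartesian (x/w, y/w); undefined when w == 0."""
    x, y, w = p
    if w == 0:
        raise ValueError(f"{p} is a point at infinity and has no Cartesian form")
    return (x / w, y / w)

print(to_cartesian((6, 9, 3)))  # (2.0, 3.0)
```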

Because of this scaling factor, homogeneous coordinates can represent the same point in infinitely many ways. You can map the Cartesian point (x,y) to (x,y,1) in homogeneous coordinates, but you can also map (x,y) to (2x, 2y, 2). Note that if we divide by the third coordinate to go back to Cartesian coordinates, we end up with the same starting point, and that is true in general when you multiply by any non-zero scalar. So the idea is that Cartesian coordinates represent a point uniquely with a single pair of values, whereas homogeneous coordinates can represent it in infinitely many ways. This is why some authors use [x,y,z] or (x:y:z). Square brackets are often used in mathematics to denote an equivalence class, and for homogeneous coordinates, [x,y,z] ~ [sx,sy,sz] for non-zero s. Similarly, the colon usually denotes a ratio, and the ratios among the three coordinates are unchanged when all of them are multiplied by any non-zero scalar s. So whenever you want to transform from homogeneous coordinates to Cartesian, simply divide by the last number, since it acts as a scaling factor, and then take the first two values as (x,y). See my answer here for an example.
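Because equality of homogeneous points is equality up to scale, comparing triples component by component gives the wrong answer. One common test, sketched here as an illustration rather than taken from the answer, uses the fact that two non-zero 3-vectors are proportional exactly when their cross product is zero; unlike dividing by the last coordinate, it also works for points at infinity. The function name and the absolute tolerance are illustrative choices.

```python
def same_point(p, q, tol=1e-9):
    """True if homogeneous triples p and q differ only by a non-zero scale.

    Assumes neither p nor q is (0, 0, 0), which is not a valid homogeneous
    point. Two non-zero 3-vectors are proportional exactly when their cross
    product is the zero vector.
    """
    c = (p[1] * q[2] - p[2] * q[1],
         p[2] * q[0] - p[0] * q[2],
         p[0] * q[1] - p[1] * q[0])
    return all(abs(v) <= tol for v in c)

print(same_point((2, 3, 1), (4, 6, 2)))    # True: same point (2, 3)
print(same_point((2, 3, 1), (2, 3, 2)))    # False: (2, 3) versus (1, 1.5)
print(same_point((1, 1, 0), (-5, -5, 0)))  # True: same point at infinity
```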

So the simple way to move into homogeneous coordinates is to append a 1, but really, you could append a 1 and then multiply by any non-zero scalar without changing anything. You could map (x,y) to (5x,5y,5), apply your transformation (sx',sy',s) = H * (5x,5y,5), where s is whatever scale comes out, and then obtain your Cartesian point as (sx',sy')/s = (x',y') all the same.
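The (5x,5y,5) argument can be checked directly. The homography H below is an arbitrary matrix chosen for illustration, not one from the question; applying it to (x,y,1) and to (5x,5y,5) gives different triples, but the same Cartesian point once each triple is divided by its last coordinate.

```python
def apply_homography(H, p):
    """Multiply the 3x3 matrix H (list of rows) by the homogeneous point p."""
    return tuple(sum(H[i][j] * p[j] for j in range(3)) for i in range(3))

def dehomogenize(p):
    """Divide by the third coordinate to get back a Cartesian pair."""
    x, y, w = p
    return (x / w, y / w)

# Hypothetical homography values, for illustration only
H = [[1.0, 0.2, 3.0],
     [0.1, 0.9, -2.0],
     [0.001, 0.002, 1.0]]

x, y = 10.0, 20.0
a = dehomogenize(apply_homography(H, (x, y, 1.0)))
b = dehomogenize(apply_homography(H, (5 * x, 5 * y, 5.0)))
print(a, b)  # equal up to floating-point rounding
```

Since matrix multiplication is linear, H * (5p) = 5 * (H * p), and the factor 5 cancels in the division; the same holds for any non-zero scale.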

alkasm
  • Thanks for the valuable answer. In the figure, [u v] are image coordinates in the image plane. Can a 1 be appended to them? Is [x y 1] a 2D point or a 3D point? Why can [x y 1] be multiplied by R? R is a 3D transformation matrix. – Jogging Song Jul 03 '17 at 01:05
  • [This link might explain a bit](https://www.mathworks.com/help/vision/ug/camera-calibration.html). I might edit my answer to discuss this more. Sorry, I guess I somewhat evaded your main question; I didn't realize the transformations from 2-D into 3-D and back were the main point of confusion. – alkasm Jul 03 '17 at 01:48
  • From the link, there are three kinds of coordinates: world coordinates, camera coordinates, and image coordinates. What should we call [x y 1]? I still don't understand whether [x y 1] is a 2D point or a 3D point. Why can [x y 1] be multiplied by R? R is a 3D transformation matrix. – Jogging Song Jul 05 '17 at 00:43
  • Lol, a professor's answer. My answer: it's to make the coordinate system conform to the matrix operations, just like the fancy things you do to the state transition matrix in an EKF. As is typically seen in MSCKF, they have to change the stuff into a more complex form so the data conform. – Dr Yuan Shenghai May 22 '19 at 03:31