In 2-D projective geometry, there are two main sets of coordinates: Cartesian coordinates (x,y) and homogeneous coordinates, which are represented by a triple (x,y,z). This triple can be confusing: it's not a point in three dimensions like the Cartesian (x,y,z). Because of this, some authors use a different notation for homogeneous points, like [x,y,z] or (x:y:z), and this notation makes more sense for reasons we'll get into later.
The third coordinate exists for one purpose only: to add some points to the domain, namely points at infinity. With the pair (x,y), there is no way to represent infinity, at least not with numbers and in ways that we can manipulate easily. This is a problem for computer graphics, since parallel lines are very common and, in projective geometry, parallel lines meet at a point at infinity (in plain Euclidean geometry they never meet at all). Parallel lines also matter because the transformations used in computer graphics preserve lines. When we warp points with a homography or affine transformation, we move pixels in a way that maps lines to other lines. If those lines stay parallel, as they do under a Euclidean or affine transformation, the coordinate system we use needs to be able to represent that.
So we use homogeneous coordinates (x,y,z) for the sole purpose of including those points at infinity, which are represented by the triples (x,y,0) with x and y not both zero. And since we can put a zero in this place for every non-zero Cartesian pair, it's like we have a point at infinity in every single direction, where the direction is the one from the origin toward (x,y).
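To make the "parallel lines meet at a point at infinity" idea concrete, here is a minimal sketch. It assumes a convention not spelled out above: a line ax + by + c = 0 is stored as the triple (a, b, c), and the intersection of two lines is the cross product of their triples. The function names are made up for illustration.

```python
def cross(p, q):
    """Cross product of two 3-tuples."""
    (a, b, c), (d, e, f) = p, q
    return (b * f - c * e, c * d - a * f, a * e - b * d)


def line_intersection(l1, l2):
    """Homogeneous intersection point of two lines given as (a, b, c).

    Assumes each triple encodes the line a*x + b*y + c = 0.
    """
    return cross(l1, l2)


# Two parallel lines: x - y = 0 and x - y + 2 = 0.
print(line_intersection((1, -1, 0), (1, -1, 2)))   # (-2, -2, 0)

# Two non-parallel lines: x - y = 0 and x + y - 2 = 0.
print(line_intersection((1, -1, 0), (1, 1, -2)))   # (2, 2, 2)
```

The parallel pair intersects at a triple whose third coordinate is 0, i.e. a point at infinity in the direction the lines run, while the non-parallel pair intersects at an ordinary point.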
But the third value can also be any number other than zero, so what are all these additional points? What is the difference between (x,y,2) and (x,y,3) and so on? If the points (x,y,2) and (x,y,3) aren't points at infinity, they had better correspond to some Cartesian points. Luckily, there's a really simple way to map all these homogeneous triples to Cartesian pairs: divide by the third coordinate. Then (x,y,3) gets mapped back to the Cartesian (x/3, y/3), and mapping (x,y,0) to Cartesian is undefined, which is perfect since that point at infinity doesn't exist in Cartesian coordinates.
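The divide-by-the-third-coordinate rule can be sketched in a few lines of plain Python. The function name `to_cartesian` is hypothetical, and returning `None` for points at infinity is just one possible choice; raising an error would work equally well.

```python
def to_cartesian(p):
    """Map a homogeneous triple (x, y, z) to the Cartesian pair (x/z, y/z).

    Returns None when z == 0, because a point at infinity has no
    Cartesian counterpart.
    """
    x, y, z = p
    if z == 0:
        return None
    return (x / z, y / z)


print(to_cartesian((6, 9, 3)))   # (2.0, 3.0)
print(to_cartesian((6, 9, 0)))   # None
```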
Because of this scaling factor, every point has infinitely many homogeneous representations. You can map the Cartesian point (x,y) to (x,y,1) in homogeneous coordinates, but you can also map (x,y) to (2x, 2y, 2). Note that if we divide by the third coordinate to go back to Cartesian coordinates, we end up with the same starting point. That is true in general when you multiply by any non-zero scalar. So a point is represented uniquely by a single pair of values in Cartesian coordinates, whereas it can be represented in infinitely many ways in homogeneous coordinates. This is why some authors use [x,y,z] or (x:y:z). Square brackets are often used in mathematics to denote an equivalence class, and for homogeneous coordinates, [x,y,z] ~ [sx,sy,sz] for non-zero s. Similarly, : is usually used for a ratio, and the ratio of the three coordinates is unchanged when any non-zero scalar s multiplies all of them. So whenever you want to transform from homogeneous coordinates to Cartesian, simply divide by the last number, as it acts like a scaling factor, and then just pull off the (x,y) values. See my answer here for example.
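As a rough illustration of the equivalence [x,y,z] ~ [sx,sy,sz], the sketch below lifts a Cartesian pair with an arbitrary non-zero scale and checks whether two triples describe the same point. Both function names are hypothetical. The proportionality test uses the fact that two triples are scalar multiples of each other exactly when their cross product is zero, and it assumes neither triple is (0,0,0), which is not a valid homogeneous point.

```python
def to_homogeneous(point, s=1):
    """Lift a Cartesian pair (x, y) to the triple (s*x, s*y, s), s non-zero."""
    if s == 0:
        raise ValueError("scale must be non-zero")
    x, y = point
    return (s * x, s * y, s)


def same_point(p, q):
    """True if the non-zero triples p and q are scalar multiples of each other."""
    (a, b, c), (d, e, f) = p, q
    cross = (b * f - c * e, c * d - a * f, a * e - b * d)
    return cross == (0, 0, 0)


print(same_point(to_homogeneous((3, 4)), to_homogeneous((3, 4), s=2)))  # True
print(same_point((3, 4, 1), (3, 4, 2)))                                 # False
```

The exact comparison with zero is fine for integers; with floating-point inputs a small tolerance would be the safer choice.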
So the simple way to move into homogeneous coordinates is to append a 1, but really, you could append a 1 and then multiply by any non-zero scalar; you wouldn't change anything. You could map (x,y) to (5x,5y,5), apply your transformation (sx',sy',s) = H * (5x,5y,5), and then obtain your Cartesian points as (sx',sy')/s = (x',y') all the same.
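Here is a small check of that claim in plain Python. The matrix H is an arbitrary invertible 3x3 matrix picked only for illustration, and the helper names are hypothetical.

```python
def apply_homography(H, p):
    """Multiply the 3x3 matrix H (a list of rows) by the homogeneous triple p."""
    return tuple(sum(h * v for h, v in zip(row, p)) for row in H)


def dehomogenize(p):
    """Divide by the third coordinate and keep the first two values."""
    x, y, z = p
    return (x / z, y / z)


# Arbitrary example homography (an assumption, not taken from the text above).
H = [
    [2, 0, 1],
    [0, 2, 2],
    [1, 0, 1],
]

x, y = 1, 2
via_1 = dehomogenize(apply_homography(H, (x, y, 1)))          # append a 1
via_5 = dehomogenize(apply_homography(H, (5 * x, 5 * y, 5)))  # scale by 5 first

print(via_1)  # (1.5, 3.0)
print(via_5)  # (1.5, 3.0)
```

The scale factor passes straight through the matrix multiplication and cancels in the final division, so both routes land on the same Cartesian point.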