In most 3D graphics a point is represented by a 4-component vector (x, y, z, w), where w = 1. Usual operations applied on a point include translation, scaling, rotation, reflection, skewing and combination of these.
These transformations can be represented by a mathematical object called "matrix". A matrix applies on a vector like this:
[ a b c tx ] [ x ] [ a*x + b*y + c*z + tx*w ]
| d e f ty | | y | = | d*x + e*y + f*z + ty*w |
| g h i tz | | z | | g*x + h*y + i*z + tz*w |
[ p q r s ] [ w ] [ p*x + q*y + r*z + s*w ]
For example, scaling is represented as
[ 2 . . . ] [ x ] [ 2x ]
| . 2 . . | | y | = | 2y |
| . . 2 . | | z | | 2z |
[ . . . 1 ] [ 1 ] [ 1 ]
and translation as
[ 1 . . dx ] [ x ] [ x + dx ]
| . 1 . dy | | y | = | y + dy |
| . . 1 dz | | z | | z + dz |
[ . . . 1 ] [ 1 ] [ 1 ]
One of the reason for the 4th component is to make a translation representable by a matrix.
The advantage of using a matrix is that multiple transformations can be combined into one via matrix multiplication.
Now, if the purpose is simply to bring translation on the table, then I'd say (x, y, z, 1) instead of (x, y, z, w) and make the last row of the matrix always [0 0 0 1]
, as done usually for 2D graphics. In fact, the 4-component vector will be mapped back to the normal 3-vector vector via this formula:
[ x(3D) ] [ x / w ]
| y(3D) ] = | y / w |
[ z(3D) ] [ z / w ]
This is called homogeneous coordinates. Allowing this makes the perspective projection expressible with a matrix too, which can again combine with all other transformations.
For example, since objects farther away should be smaller on screen, we transform the 3D coordinates into 2D using formula
x(2D) = x(3D) / (10 * z(3D))
y(2D) = y(3D) / (10 * z(3D))
Now if we apply the projection matrix
[ 1 . . . ] [ x ] [ x ]
| . 1 . . | | y | = | y |
| . . 1 . | | z | | z |
[ . . 10 . ] [ 1 ] [ 10*z ]
then the real 3D coordinates would become
x(3D) := x/w = x/10z
y(3D) := y/w = y/10z
z(3D) := z/w = 0.1
so we just need to chop the z-coordinate out to project to 2D.