18

I know perspective division is done by dividing x, y, and z by w to get normalized device coordinates, but I don't understand the purpose of doing that. Also, does it have anything to do with clipping?

pyrrhic
Megharaj
  • 1
    Perhaps you would be better off with that on Maths.se than here :) (IOW OpenGL is completely not related) – Bartek Banachewicz Jun 24 '13 at 07:42
  • http://scratchapixel.com/lessons/3d-advanced-lessons/things-to-know-about-the-cg-lighting-pipeline/ – user18490 Sep 03 '14 at 19:48
  • 1
    Possible duplicate of [Understanding the Projection Matrix](https://stackoverflow.com/questions/6111721/understanding-the-projection-matrix) – Michael IV Aug 28 '17 at 21:26
  • Excellent question. Answers to questions regarding "**why** things work" are so important and I find are often neglected in favor of the *how*. – KeyC0de Oct 12 '19 at 11:25

5 Answers

18

Some details that complement the general answers:

The idea is to project a point (x,y,z) onto the screen to obtain (xs,ys,d), where d is the distance from the eye to the screen. The next figure shows this for the y coordinate.

[Figure: side view of a point (y, z) projected onto a screen at distance d, giving the screen coordinate ys at angle alpha]

We know from school that

tan(alpha) = ys / d = y / z

This means that the projection is computed as

ys = d*y/z = y /w

w = z / d
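
As a minimal sketch of the relation above (the function name `project_y` and the sample values are illustrative and not part of the original answer):

```python
def project_y(y: float, z: float, d: float) -> float:
    """Project the y coordinate of a point at depth z onto a screen at distance d."""
    w = z / d      # w = z / d, as in the formula above
    return y / w   # ys = y / w = d * y / z


# The same height seen at twice the depth lands at half the screen height.
near = project_y(1.0, 2.0, 1.0)
far = project_y(1.0, 4.0, 1.0)
```

Here `near` is 0.5 and `far` is 0.25, which is the "twice as far, half as large" behaviour the division produces.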

This is enough to apply a projection. However, in OpenGL you want (xs,ys,zs) to be normalized device coordinates in [-1,1], and yes, this has something to do with clipping.

The extreme values of (xs,ys,zs) bound the cube [-1,1]³, and everything outside it is clipped. So a projection matrix usually takes the clipping limits (the frustum) into account, so that a single transformation, together with the perspective division, simultaneously applies the projection and maps the projected coordinates, along with z, to normalized device coordinates.
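
The following sketch shows one way this can look in practice. It assumes the classic `gluPerspective`-style matrix (camera looking down the negative z axis, row-major storage); the helper names `perspective`, `transform` and `to_ndc` are made up for this example.

```python
import math


def perspective(fovy_deg, aspect, near, far):
    """Row-major 4x4 matrix in the style of the classic gluPerspective."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # w_clip = -z_eye
    ]


def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_ndc(clip):
    """Perspective division: clip coordinates -> normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


proj = perspective(90.0, 1.0, 1.0, 10.0)
ndc_near = to_ndc(transform(proj, [0.0, 0.0, -1.0, 1.0]))   # point on the near plane
ndc_far = to_ndc(transform(proj, [0.0, 0.0, -10.0, 1.0]))   # point on the far plane
```

With these assumed parameters, a point on the near plane ends up at z = -1 and a point on the far plane at z = +1 (up to floating-point rounding), so the frustum limits coincide with the limits of the [-1,1] cube.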

a.lasram
7

I mean why do we need that?

In layman's terms: to make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the Z coordinate, i.e. the closer the point is to the origin, the more things get scaled up, i.e. the bigger they appear on screen.
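
A minimal sketch of that "mixing", assuming a camera at the origin looking along positive z (a simplification; OpenGL conventionally looks down negative z). The matrix and helper names are hypothetical:

```python
# Identity, except that the last row copies the input z into the output w.
SIMPLE_PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],  # w_out = z_in
]


def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def divide_by_w(v):
    """Perspective division of a homogeneous vector."""
    x, y, z, w = v
    return (x / w, y / w, z / w)


close = divide_by_w(transform(SIMPLE_PERSPECTIVE, [2.0, 2.0, 4.0, 1.0]))
distant = divide_by_w(transform(SIMPLE_PERSPECTIVE, [2.0, 2.0, 8.0, 1.0]))
```

`close` comes out as (0.5, 0.5, 1.0) and `distant` as (0.25, 0.25, 1.0): the same x and y shrink as z grows. Note that this toy matrix collapses depth to a constant, which is one reason real projection matrices also remap z.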

datenwolf
  • In fact, the simplest perspective matrix you could possibly have just copies z into w, so that when perspective divide occurs x and y get smaller the further away from the camera they are. This video might help https://youtu.be/o1n02xKP138?t=994 – user986730 Jan 13 '21 at 09:50
1

To really distill it to the basic concept, and to see why the operation is division (instead of, e.g., a square root or some such), consider that an object twice as far away should appear with dimensions exactly one half as large. You obtain 1/2 from 2 by... division.

There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.

Steven Lu
1

Dividing x, y, z by w is a "trick" you do with "homogeneous coordinates".

Basically, you raise the dimension of vectors to R⁴ and of matrices to R^(4x4). Handling the w coordinate and the fourth column of the 4x4 matrix properly has convenient properties. If you define the conversion of a homogeneous R⁴ vector back to R³ as a division by the last component (w), you can simply put the perspective divisor into the w component for perspective projection and leave it at 1 for orthogonal projection. This way you can handle both perspective AND orthogonal projection with one type of matrix, without treating the underlying math differently. Translation can also be put into the 4x4 matrix, which means cheap, parallelizable affine transformations for vectors/coordinates/vertices.

To convert a R⁴ vector back to R³ you divide by the 4th component (or w component as you said). This process is called dehomogenizing.

The topic of homogeneous coordinates is a little involved, but I will try to explain it. Homogeneous coordinates are a major achievement in 3D maths. I hope I do them justice.

I will use x1, x2, x3, x4 as the components of a vector instead of x, y, z, w:

Consider a 3x3 matrix M and column vectors x, a, b, c in R³, where x = (x1, x2, x3) and x1, x2, x3 are the scalar components of x. With a 3x3 matrix you can do every linear transformation on x that you could do with the linear combination:

x' = x1*a + x2*b + x3*c    (1).  

(x' is the transformed vector that holds the result of transforming x).

Khan Academy's Linear Algebra course has a section explaining that every linear transformation can be written as the product of a matrix with a vector.

You can try this out by putting the column vectors a, b, c into the columns of the matrix M = [ a b c ]. The matrix product then gives exactly the linear combination above:

x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3    (2).

However, this operation only accounts for rotation, scaling and shearing. The origin (0, 0, 0) always stays at (0, 0, 0).

To move the origin you need another kind of transformation, called "translation" (adding a fixed vector to the vector).

Consider the translation column vector t = (t1, t2, t3) and the linear combination

x' = x1*a + x2*b + x3*c + t    (3). 

With this combination you can translate, rotate, scale and shear a vector. As you can see, it actually moves the origin (0, 0, 0) to (0+t1, 0+t2, 0+t3).

However, you can't put this translation into a 3x3 matrix. So what graphics programmers and mathematicians came up with is adding another dimension to the matrix and the vectors: M is a 4x4 matrix and x~ is a vector in R⁴ with x~ = (x1, x2, x3, x4). a, b, c, t are now also column vectors in R⁴ (the last component of a, b, c is 0 and the last component of t is 1; I keep the names the same to show the similarity between the homogeneous linear combination and (3) later). x~ is the homogeneous coordinate of x.

Now watch what happens if we take a vector x in R³ and put it into x~ in R⁴. In homogeneous coordinates this vector is x~ = (x1, x2, x3, 1). The last component is 1 if it is a point and 0 if it is just a direction (which can't be translated anyway). So you have the linear combination:

x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t    (4).

(x~' is the result of transforming the homogeneous vector x~.) Since we took a vector from R³ and put it into R⁴, our x4 component is 1, so we have:

    x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t        (5).

which is exactly the combination (3) above, with the translation t. This is called an affine transformation (linear transformation + translation).

So with a 3x3 matrix and a vector in R³ you can't do translations. By adding another dimension, with a vector in R⁴ and a matrix in R^(4x4), you can.
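
To make equations (4) and (5) concrete, here is a small sketch. The helper names `affine` and `transform` and the sample vectors are assumptions chosen for illustration:

```python
def affine(a, b, c, t):
    """Build the row-major 4x4 matrix [a b c t] from 3-component columns.

    a, b, c get a last component of 0 and t gets a last component of 1.
    """
    cols = [list(a) + [0.0], list(b) + [0.0], list(c) + [0.0], list(t) + [1.0]]
    return [[cols[col][row] for col in range(4)] for row in range(4)]


def transform(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# Identity linear part plus a translation by t = (5, 0, 0).
M = affine((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (5.0, 0.0, 0.0))

moved_point = transform(M, [1.0, 2.0, 3.0, 1.0])      # x4 = 1: a point
same_direction = transform(M, [1.0, 2.0, 3.0, 0.0])   # x4 = 0: a direction
moved_origin = transform(M, [0.0, 0.0, 0.0, 1.0])
```

The point is translated to (6, 2, 3, 1) and the origin to (5, 0, 0, 1), while the direction with x4 = 0 is left unchanged, since the t column is multiplied by zero.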

However, when you want to return to R³, you have to divide the first three components by the last one, which is the x4 component or, in your naming, the w component. This is called "dehomogenizing". Let x be the original coordinate in R³, x~ in R⁴ the homogenized vector of x, and x' in R³ the vector obtained from x~:

x' = (x1/x4, x2/x4, x3/x4)    (6). 

Then x' is the dehomogenized vector of the vector x~.
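
Equation (6) can be sketched as a small function. The name `dehomogenize` and the explicit error for x4 = 0 are choices made for this example, not something the answer prescribes:

```python
def dehomogenize(v):
    """Convert a homogeneous R^4 vector to R^3 by dividing by the last component."""
    x1, x2, x3, x4 = v
    if x4 == 0:
        # A direction (x4 == 0) does not correspond to a finite point in R^3.
        raise ValueError("cannot dehomogenize a vector with x4 == 0")
    return (x1 / x4, x2 / x4, x3 / x4)


p = dehomogenize([2.0, 4.0, 6.0, 2.0])
q = dehomogenize([1.0, 2.0, 3.0, 1.0])
```

Both `p` and `q` are (1.0, 2.0, 3.0): scaling all four homogeneous components by the same nonzero factor describes the same point in R³.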

Coming back to perspective division: I will leave out the derivation, because many here have already explained the divide by z. It follows from similar right triangles: with a given focal length f, a point with coordinates y and z lands at y' = f*y/z. Also, since you stated that you already know why this is done (I hope I didn't misread), I simply leave a link to a YT video here; I find it very well explained in the course lecture CMU 15-462/662.

The division by the w component when dehomogenizing is a pretty handy property when returning to R³. When you apply a homogeneous 4x4 perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing step (as in (6)) perform the perspective divide. You can also set up the matrix so that the division by w divides by z and at the same time maps the depth values into the range 0 to 1 (basically you put the range from z-near to z-far into a range where floating-point numbers are precise).

This is for perspective projection. For orthogonal projection you simply leave the last component at 1. Then dehomogenizing gives x' = (x1/1, x2/1, x3/1), so the vector is left as it is and there is no perspective divide.

This is also described by Ravi Ramamoorthi in his course CSE167, where he explains how to set up the perspective projection matrix.

Notice that by putting the perspective divisor into the last homogeneous coordinate, the division is carried in the vector and not in the matrix. This means you can use the SAME matrix for different vectors and still get the proper perspective divide for each coordinate. This is part of what makes 3D accelerators fast, because the hardware can apply the same instructions to many vertices in parallel (SIMD). It is a very powerful property, and it handles orthogonal and perspective projection with the exact same pipeline.

I hope this helped you understand the rationale for putting z into the w component. Sorry for my horrible formatting and lengthy text. I hope it helped more than it confused.
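
As a rough illustration of the "same pipeline" idea, the sketch below pushes points through one function and swaps only the matrix. It assumes a focal length `F` of 1 and a camera looking along positive z; `ORTHOGRAPHIC`, `PERSPECTIVE` and `run_pipeline` are hypothetical names:

```python
F = 1.0  # assumed focal length

# Orthogonal projection: w stays 1, so the divide changes nothing.
ORTHOGRAPHIC = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Perspective projection: w = z / F, so the divide yields F * x / z and F * y / z.
PERSPECTIVE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0 / F, 0.0],
]


def run_pipeline(matrix, point):
    """Homogenize, multiply by the matrix, then divide x and y by w."""
    v = [point[0], point[1], point[2], 1.0]
    x, y, _, w = [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]
    return (x / w, y / w)


points = [(1.0, 1.0, 2.0), (1.0, 1.0, 4.0)]
ortho = [run_pipeline(ORTHOGRAPHIC, p) for p in points]
persp = [run_pipeline(PERSPECTIVE, p) for p in points]
```

Under the orthogonal matrix both points stay at (1.0, 1.0); under the perspective matrix they land at (0.5, 0.5) and (0.25, 0.25). The code path is identical, and each vertex carries its own divisor in w.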

Best of luck!

Kada B
0

Actually, under the standard notational convention for a 4x4 perspective matrix with the sightline along the z direction, w differs by 1 from the distance ratio. That ratio, though interpreted correctly, is normally written as -z/d, where z is negative (which yields the correct positive ratio), because in the common convention the camera looks in the negative z direction. The reason for the offset by 1 needs explaining. Many references put the origin at the image plane rather than at the center of projection. With that convention (again with the camera looking along the negative z direction), the distance labeled z in the similar-triangles diagram is replaced by (d-z). Substituting that for z, the expression for w becomes, instead of z/d, (d-z)/d = 1 - z/d. To some these conventions may seem unorthodox, but they are quite popular among analysts.
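
A quick numeric check of the two conventions, as a sketch with made-up function names and sample distances: both should give the same w for the same physical point.

```python
def w_origin_at_center(z, d):
    """Origin at the center of projection, camera looking down -z: w = -z/d."""
    return -z / d


def w_origin_at_image_plane(z, d):
    """Origin on the image plane, center of projection at z = +d: w = 1 - z/d."""
    return (d - z) / d


d = 2.0         # distance from the center of projection to the image plane
distance = 6.0  # distance from the center of projection to the point

# The same point expressed in each coordinate system.
z_center = -distance        # measured from the center of projection
z_plane = -(distance - d)   # measured from the image plane

w_a = w_origin_at_center(z_center, d)
w_b = w_origin_at_image_plane(z_plane, d)
```

Both `w_a` and `w_b` equal 3.0, i.e. the ratio distance/d, so the offset of 1 only reflects where the origin is placed.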