
I'm trying to understand the projection matrix created with glFrustum() in OpenGL: the transformations that bring points into the normalized device coordinates x=[-1,1], y=[-1,1], and z=[-1,1], and the series of 4x4 matrix multiplications that result in the projection matrix

( 2n/(r-l)     0       (r+l)/(r-l)      0      )
(    0      2n/(t-b)   (t+b)/(t-b)      0      )
(    0         0      -(f+n)/(f-n) -2fn/(f-n)  )
(    0         0          -1            0      )

I understand that the final results (in NDC) are obtained by dividing by the w component after the successive transformations are applied, but what are those successive transformations?

That is, what are the matrices T_i (in terms of the near, far, left, right, top, and bottom variables) in the expression

M = T_1 * T_2 * ... * T_n

such that each T_i represents only a scale, translate, rotation, or shear operation?

I read a somewhat related post about Is OpenGL LH or RH, and the Projection Matrix tutorial, but I'm still lost as to the elementary operations (i.e. the scale, translate, rotate, or shear operations) taking place.

WoodMath

1 Answer


The perspective transformation matrix cannot be decomposed into only scale, translate, rotate, and shear operations. Those operations are all affine, while the perspective transformation is projective (which is not affine; in particular, the perspective will not preserve parallelism of lines).

The "target volume" is an axis-aligned cube in normalized device space. In standard OpenGL, this is going from -1 to 1 in all dimensions (Direct3D, and also Vulkan, use a slightly different convention: [-1,1] for x and y, but [0,1] for z. This means the third row of the matrix will look a bit different, but the concepts are the same).

The projection matrix is constructed such that the pyramidal frustum is transformed into that normalized volume. OpenGL also uses the convention that in eye space, the camera is oriented towards -z. To create the perspective effect, you simply have to project each point onto a plane, by intersecting the ray connecting the projection center and the point in question with the actual viewing plane.

This perspective projection assumes that the image plane is parallel to the xy plane (if you want a different one, you could apply some affine rotation). In OpenGL eye space, the projection center is always at the origin. When you do the math, you will see that the perspective boils down to a simple instance of the intercept theorem. If you want to project all points to a plane which is 1 unit in front of the center ("camera"), you end up dividing x and y by -z.

This could be written in matrix form as

(  1   0   0   0 )
(  0   1   0   0 )
(  0   0   1   0 )
(  0   0  -1   0 )

It works by setting w_clip = -z_eye, so when the division by clip space w is carried out, we get:

 x_ndc = x_clip / w_clip = - x_eye / z_eye
 y_ndc = y_clip / w_clip = - y_eye / z_eye

Note that this also applies to z:

 z_ndc = z_clip / w_clip = z_eye / (-z_eye) = -1
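As a quick sanity check, here is a small numpy sketch of that divide (the sample point is my own arbitrary choice, not from the original answer):

```python
import numpy as np

# Pure projection matrix from above: copies x, y, z and sets w_clip = -z_eye.
P = np.array([
    [1.0, 0.0,  0.0, 0.0],
    [0.0, 1.0,  0.0, 0.0],
    [0.0, 0.0,  1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
])

p_eye = np.array([2.0, 1.0, -4.0, 1.0])  # arbitrary point in front of the camera
p_clip = P @ p_eye                       # (2, 1, -4, 4): w_clip = -z_eye
p_ndc = p_clip[:3] / p_clip[3]           # perspective divide by w_clip

print(p_ndc)  # [0.5, 0.25, -1.0]: x and y divided by -z; z collapses to the image plane
```

Every input point ends up with the same z after the divide, which is exactly the lost-depth problem discussed next.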

Such a matrix is typically not used for rendering, because the depth information is lost - all points are actually projected onto a single plane. Usually, we want to preserve depth (maybe in some non-linearly distorted way).

To do this, we can tweak the formula for z (third row). Since we do not want any dependency of z on x and y, the only elements left to tweak are the last two. By using a row of the form (0 0 A B), we get the following equation:

z_ndc = (A * z_eye + B) / (-z_eye) = -A - B / z_eye

which is just a hyperbolically transformed variant of the eye space z value - depth is still preserved - and the matrix becomes invertible. We just have to calculate A and B.

Let us call the function z_ndc(z_eye) = -A - B / z_eye just Z(z_eye). Since the viewing volume is bounded by z_ndc = -1 (near plane) and z_ndc = 1 (far plane), and the distances of the near and far plane in eye space are given as parameters, we have to map the near plane z_eye=-n to -1, and the far plane z_eye=-f to 1. To choose A and B, we have to solve a system of 2 equations (linear in A and B):

Z(-n) = -1
Z(-f) =  1

This will result in exactly the two coefficients you find in the third row of your matrix.
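The system is small enough to solve by hand, but here is a numpy sketch of mine that confirms the closed forms (the values n=1, f=10 are arbitrary picks):

```python
import numpy as np

n, f = 1.0, 10.0  # arbitrary positive near/far distances

# Z(z_eye) = -A - B / z_eye; the conditions Z(-n) = -1 and Z(-f) = 1
# are linear in the unknowns A and B:
#   -A + B/n = -1
#   -A + B/f =  1
coeffs = np.array([[-1.0, 1.0 / n],
                   [-1.0, 1.0 / f]])
A, B = np.linalg.solve(coeffs, np.array([-1.0, 1.0]))

# These are exactly the third-row entries of the frustum matrix:
print(np.isclose(A, -(f + n) / (f - n)))    # True
print(np.isclose(B, -2.0 * f * n / (f - n)))  # True
```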

For x and y, we want to control two things: the field of view angle, and the asymmetry of the frustum (which is similar to the "lens shift" known from projectors). The field of view is defined by the x and y range on the image plane which is mapped to [-1,1] in NDC. So you can imagine just an axis-aligned rectangle on an arbitrary plane parallel to the image plane. This rectangle describes the part of the scene which is mapped to the visible viewport, at that chosen distance from the camera. Changing the field of view just means scaling that rectangle in x and y.

And conceptually, the lens shift is just a translation, so you might think it should be put in the last column. However, since the division by w=-z will be carried out after the matrix multiplication, we have to multiply that translation by -z first. This means that the translational part is now in the third column, and we have a matrix of the form

( C   0   D   0 )
( 0   E   F   0 )
( 0   0   A   B )
( 0   0  -1   0 )

For x, this gives:

x_ndc = (C * x_eye + D * z_eye) / (-z_eye) = -C * x_eye / z_eye - D

Now we just have to find the correct coefficients C and D which will map x_eye=l to x_ndc=-1 and x_eye=r to x_ndc=1. Note that the classical GL frustum function interprets the values of l and r here as distances on the near plane, so we have to calculate all this for z_eye=-n. Solving that new system of 2 equations leads you to the coefficients you see in the frustum matrix.
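Putting all four coefficients together gives the full frustum matrix. This numpy sketch (my own verification, with arbitrary frustum bounds) checks that the near-plane corners land on the NDC cube corners and the far plane maps to z_ndc = 1:

```python
import numpy as np

def frustum(l, r, b, t, n, f):
    # Same layout as derived above, written as plain rows:
    # C = 2n/(r-l), D = (r+l)/(r-l), E = 2n/(t-b), F = (t+b)/(t-b),
    # A = -(f+n)/(f-n), B = -2fn/(f-n)
    return np.array([
        [2*n/(r-l), 0.0,        (r+l)/(r-l),   0.0],
        [0.0,       2*n/(t-b),  (t+b)/(t-b),   0.0],
        [0.0,       0.0,       -(f+n)/(f-n),  -2*f*n/(f-n)],
        [0.0,       0.0,       -1.0,           0.0],
    ])

def to_ndc(M, p):
    q = M @ np.array([*p, 1.0])
    return q[:3] / q[3]  # perspective divide by w_clip = -z_eye

M = frustum(l=-2.0, r=1.0, b=-1.0, t=1.0, n=1.0, f=10.0)  # asymmetric frustum
print(to_ndc(M, (-2.0, -1.0, -1.0)))   # near-plane corner (l, b, -n) -> [-1, -1, -1]
print(to_ndc(M, ( 1.0,  1.0, -1.0)))   # near-plane corner (r, t, -n) -> [ 1,  1, -1]
print(to_ndc(M, ( 0.0,  0.0, -10.0)))  # point on the far plane -> z_ndc = 1
```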

derhass
    "*in particular, the perspective will not preserve parallelity of lines*" If you want to be technical, the transformation is perfectly linear... in a 4-dimensional space. It's the division-by-W that is the non-linear part that makes lines no longer parallel. Notably, that's the part that is done after the matrix. – Nicol Bolas Mar 05 '16 at 00:47
  • @derhass : From your lines that read `z_eye=-n` and `z_eye=-f` it was a little hard to tell whether you mean for `n` and `f` to be less than 0 or for `-n` and `-f` to be less than 0? That is are `n` and `f` given as positive numbers or as negative numbers? My understanding is they are supposed to be negative numbers but I'm having trouble reconciling that with positive numbers given for `n` and `f` in the explanation answer at http://stackoverflow.com/questions/25584667/why-do-i-divide-z-by-w-in-a-perspective-projection-in-opengl – WoodMath Apr 11 '16 at 22:43
  • By convention, `n` and `f` are given as the _distance_ of the near and far plane in the view direction, so they are always positive. Since the view direction is usually `-z_eye`, those planes lie at negative `z_eye` values. – derhass Apr 11 '16 at 22:59
  • @WoodMath: I think it is misleading to look at things this way. Especially, z=+/-1 is nothing particularly special _before_ the division takes place. The near and far planes are mapped to -w and +w respectively. After all, you have an additional degree of freedom in 4d homogeneous space: points will be represented by lines, and planes (including the clipping planes) by hyperplanes - actual three-dimensional volumes. So yes, you can decompose such a matrix into simple affine parts, namely just scaling operations and rotations, you won't even need translations, but all with respect to a 4d space. – derhass Apr 24 '16 at 21:58
  • @derhass : So when `n` and `f` are given as positive numbers is `n` more positive than `f` (i.e. is `n` greater than `f`)? I would think that since we are looking down the negative _z_-axis `n` would indeed be _more positive_. However the diagram on [Songho.ca](http://www.songho.ca/opengl/gl_projectionmatrix.html) shows that `n` and `f` are negated from their positive quantities, and since `-f` is the _most negative_ this would imply the original positive `f` is greater than `n` (i.e. `f` is greater than `n`). This would seem to be counter-intuitive. – WoodMath Apr 24 '16 at 21:59
  • @WoodMath: In the usual case, you select f > n. It is easy to see that when using signed distance values (and we have here a signed distance with the positive direction pointing into the viewing direction of the camera), the greater the _absolute_ value of the distance, the farther away a point is (no matter on which side the point lies). 10 units in front of you (f=10) is farther away than 1 unit in front of you (n=1). Also note that you don't have to set `f > n` in that matrix. You only have to set `f,n > 0` and just `f != n`. If you set `0 < f < n` you just add a flip along the z axis. – derhass Apr 24 '16 at 22:06