Trying to understand the math behind the perspective matrix in WebGL

Question

All matrix libraries for WebGL have some sort of perspective function that you call to get the perspective matrix for the scene.
For example, the perspective method within the mat4.js file that's part of gl-matrix is coded as such:

mat4.perspective = function (out, fovy, aspect, near, far) {
    var f = 1.0 / Math.tan(fovy / 2),
        nf = 1 / (near - far);
    out[0] = f / aspect;
    out[1] = 0;
    out[2] = 0;
    out[3] = 0;
    out[4] = 0;
    out[5] = f;
    out[6] = 0;
    out[7] = 0;
    out[8] = 0;
    out[9] = 0;
    out[10] = (far + near) * nf;
    out[11] = -1;
    out[12] = 0;
    out[13] = 0;
    out[14] = (2 * far * near) * nf;
    out[15] = 0;
    return out;
};

I'm really trying to understand what all the math in this method is actually doing, but I'm tripping up on several points.

For starters, if we have a canvas as follows with an aspect ratio of 4:3, then the aspect parameter of the method would in fact be 4 / 3, correct?

[Image: canvas with a 4:3 aspect ratio]

I've also noticed that 45° seems like a common field of view. If that's the case, then the fovy parameter would be π / 4 radians, correct?

With all that said, what is the f variable in the method short for and what is the purpose of it?
I was trying to envision the actual scenario, and I imagined something like the following:

[Image: side view of the perspective in a 3D scene]

Thinking like this, I can understand why you divide fovy by 2 and also why you take the tangent of that ratio, but why is the inverse of that stored in f? Again, I'm having a lot of trouble understanding what f really represents.

Next, I get the concept of near and far being the clipping points along the z-axis, so that's fine, but if I use the numbers in the picture above (i.e., π / 4, 4 / 3, 10 and 100) and plug them into the perspective method, then I end up with a matrix like the following:

out = [
    f / (4 / 3), 0, 0,          0,
    0,           f, 0,          0,
    0,           0, 110 / -90,  -1,
    0,           0, 2000 / -90, 0
]

Where f is equal to:

f = 1 / tan(π / 8) ≈ 2.414
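To double-check those numbers, here is a small standalone sketch that repeats the formulas from mat4.perspective with π / 4, 4 / 3, 10 and 100. `perspectiveSketch` is a hypothetical copy written for illustration, not gl-matrix itself.

```javascript
// Hypothetical standalone copy of the mat4.perspective formulas above,
// so it runs without gl-matrix. Same 16-element index order as `out`.
function perspectiveSketch(fovy, aspect, near, far) {
  const f = 1.0 / Math.tan(fovy / 2);
  const nf = 1 / (near - far);
  const out = new Array(16).fill(0);
  out[0] = f / aspect;
  out[5] = f;
  out[10] = (far + near) * nf;
  out[11] = -1;
  out[14] = 2 * far * near * nf;
  return out;
}

// Values from the question: 45° fov, 4:3 canvas, near 10, far 100.
const questionMatrix = perspectiveSketch(Math.PI / 4, 4 / 3, 10, 100);
console.log(questionMatrix[5]);   // f, about 2.414
console.log(questionMatrix[10]);  // about 110 / -90 = -1.222
console.log(questionMatrix[14]);  // about 2000 / -90 = -22.22
```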

So I'm left with the following questions:

1. What is f?
2. What does the value assigned to out[10] (i.e., 110 / -90) represent?
3. What does the -1 assigned to out[11] do?
4. What does the value assigned to out[14] (i.e., 2000 / -90) represent?

Lastly, I should note that I have already read Gregg Tavares's explanation on the perspective matrix, but after all of that, I'm left with the same confusion.

Maybe [this link](http://www.songho.ca/opengl/gl_projectionmatrix.html) helps a bit. This is referencing outdated fixed-function GL a tiny bit, but the math is still valid. — derhass, Feb 02 '15 at 20:24
Sorry, derhass, but that link was even more confusing than all the other links I've looked at thus far. I guess what I'm asking for more than a math explanation is a conceptual explanation of what's happening, and how the matrix is formed, given the actual scenario. — HartleySan, Feb 02 '15 at 20:40

gman · Accepted Answer · 2015-02-12T03:52:11.790

Let's see if I can explain this, or maybe after reading this you can come up with a better way to explain it.

The first thing to realize is WebGL requires clipspace coordinates. They go -1 <-> +1 in x, y, and z. So, a perspective matrix is basically designed to take the space inside the frustum and convert it to clipspace.

If you look at this diagram

[Image: side view of the frustum]

We know that tangent = opposite (y) over adjacent (z), so if we know z we can compute the y that would sit at the edge of the frustum for a given fovY.

tan(fovY / 2) = y / -z

multiply both sides by -z

y = tan(fovY / 2) * -z

if we define

f = 1 / tan(fovY / 2)

we get

y = -z / f
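As a quick sketch of that relationship (assuming a 45° fovY; `edgeY` is a hypothetical helper name), the y at the frustum edge grows linearly with the distance -z:

```javascript
// Hypothetical helper: y at the top edge of the frustum for a camera-space z.
// z is negative because the camera looks down -z.
function edgeY(fovY, z) {
  const f = 1 / Math.tan(fovY / 2);
  return -z / f; // same as Math.tan(fovY / 2) * -z
}

const fov45 = Math.PI / 4;
console.log(edgeY(fov45, -1));   // about 0.414
console.log(edgeY(fov45, -10));  // about 4.14, ten times larger
```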

Note we haven't done a conversion from cameraspace to clipspace yet. All we've done is compute y at the edge of the field of view for a given z in cameraspace. The edge of the field of view is also the edge of clipspace. Since clipspace just goes from -1 to +1, we can divide a cameraspace y by -z / f to get clipspace.

Does that make sense? Look at the diagram again. Let's assume that the blue z was -5 and for some given field of view y came out to +2.34. We need to convert +2.34 to +1 clipspace. The generic version of that is

clipY = cameraY * f / -z
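Plugging in the example above (z = -5, edge at y = +2.34) gives a small check. The f here is simply chosen so that 2.34 is the edge at that depth (f = -z / y); `toClipY` is a hypothetical name.

```javascript
// clipY = cameraY * f / -z
function toClipY(cameraY, cameraZ, f) {
  return (cameraY * f) / -cameraZ;
}

const exampleZ = -5;
const exampleY = 2.34;              // y at the frustum edge for this z
const exampleF = -exampleZ / exampleY;

console.log(toClipY(exampleY, exampleZ, exampleF));      // about 1 (top edge)
console.log(toClipY(-exampleY, exampleZ, exampleF));     // about -1 (bottom edge)
console.log(toClipY(exampleY / 2, exampleZ, exampleF));  // about 0.5 (halfway up)
```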

Looking at `makePerspective`

function makePerspective(fieldOfViewInRadians, aspect, near, far) {
  var f = Math.tan(Math.PI * 0.5 - 0.5 * fieldOfViewInRadians);
  var rangeInv = 1.0 / (near - far);

  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (near + far) * rangeInv, -1,
    0, 0, near * far * rangeInv * 2, 0
  ];
};

we can see that f in this case is

tan(Math.PI * 0.5 - 0.5 * fovY)

which is actually the same as

1 / tan(fovY / 2)

Why is it written this way? I'm guessing it's because with the first style, if tan came out to 0, you'd divide by zero and your program would crash, whereas written this way there's no division, so there's no chance of a divide by zero.
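A quick numeric comparison of the two forms (my own check, not from the article). It also shows what happens in JavaScript specifically when fovY is 0: division by zero does not throw, it yields Infinity, while the tan form yields a very large but finite number because Math.PI / 2 is not exactly π/2.

```javascript
const fovYCheck = Math.PI / 4;
const fDivide = 1 / Math.tan(fovYCheck / 2);
const fNoDivide = Math.tan(Math.PI * 0.5 - 0.5 * fovYCheck);
console.log(fDivide, fNoDivide); // both about 2.414

// Degenerate fovY = 0:
console.log(1 / Math.tan(0));                    // Infinity
console.log(Math.tan(Math.PI * 0.5 - 0.5 * 0));  // about 1.6e16
```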

The -1 in the matrix[11] spot means that, when we're all done,

matrix[5]  = tan(Math.PI * 0.5 - 0.5 * fovY)
matrix[11] = -1

clipY = cameraY * matrix[5] / (cameraZ * matrix[11])

For clipX we basically do the exact same calculation, except scaled for the aspect ratio.

matrix[0]  = tan(Math.PI * 0.5 - 0.5 * fovY) / aspect
matrix[11] = -1

clipX = cameraX * matrix[0] / (cameraZ * matrix[11])
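To see where the division by cameraZ * matrix[11] actually happens, here is a sketch that multiplies a point by the makePerspective matrix and then divides by w, the way WebGL treats gl_Position. `transformPoint` is a hypothetical helper; the matrix is read column-major (column c, row r at index c * 4 + r), which is the layout WebGL expects.

```javascript
// The answer's makePerspective, with var changed to const.
function makePerspective(fieldOfViewInRadians, aspect, near, far) {
  const f = Math.tan(Math.PI * 0.5 - 0.5 * fieldOfViewInRadians);
  const rangeInv = 1.0 / (near - far);
  return [
    f / aspect, 0, 0, 0,
    0, f, 0, 0,
    0, 0, (near + far) * rangeInv, -1,
    0, 0, near * far * rangeInv * 2, 0,
  ];
}

// Hypothetical helper: column-major 4x4 matrix times the point (x, y, z, 1).
function transformPoint(m, [x, y, z]) {
  const v = [x, y, z, 1];
  const out = [0, 0, 0, 0];
  for (let row = 0; row < 4; row++) {
    for (let col = 0; col < 4; col++) {
      out[row] += m[col * 4 + row] * v[col];
    }
  }
  return out;
}

const proj = makePerspective(Math.PI / 4, 4 / 3, 10, 100);
const [cx, cy, cz, cw] = transformPoint(proj, [1, 2, -20]);
console.log(cw);       // 20: the -1 at index 11 copied -cameraZ into w
// WebGL then divides x, y and z by w (the perspective divide):
console.log(cx / cw);  // cameraX * matrix[0] / (cameraZ * matrix[11])
console.log(cy / cw);  // cameraY * matrix[5] / (cameraZ * matrix[11])
```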

Finally, we have to convert cameraZ in the -zNear <-> -zFar range to clipZ in the -1 <-> +1 range.

The standard perspective matrix does this with a reciprocal function, so that z values close to the camera get more resolution than z values far from the camera. That formula is

clipZ = something / cameraZ + constant

Let's use s for something and c for constant.

clipZ = s / cameraZ + c;

and solve for s and c. In our case we know

s / -zNear + c = -1
s / -zFar  + c =  1
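Before doing the algebra by hand, the two conditions can be solved numerically. This sketch assumes the question's zNear = 10 and zFar = 100; `solveSC` is a hypothetical name. It also shows the "more resolution near the camera" claim: the point halfway in depth already lands most of the way to +1.

```javascript
// Solve  s / -zNear + c = -1  and  s / -zFar + c = 1  for s and c.
// Subtracting the second equation from the first eliminates c.
function solveSC(zNear, zFar) {
  const s = -2 / (1 / -zNear - 1 / -zFar);
  const c = 1 - s / -zFar;
  return { s, c };
}

const { s, c } = solveSC(10, 100);
const clipZ = (cameraZ) => s / cameraZ + c;
console.log(clipZ(-10));   // about -1 (near plane)
console.log(clipZ(-100));  // about  1 (far plane)
console.log(clipZ(-55));   // about 0.82, although -55 is halfway in depth
console.log(-s / c);       // about -18.2: the cameraZ that maps to clipZ = 0
```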

So, move the `c` to the other side

s / -zNear = -1 - c
s / -zFar  =  1 - c

Multiply each equation by its own -zNear or -zFar

s = (-1 - c) * -zNear
s = ( 1 - c) * -zFar

Both right-hand sides equal s, so they equal each other

(-1 - c) * -zNear = (1 - c) * -zFar

expand the quantities

(-zNear * -1) - (c * -zNear) = (1 * -zFar) - (c * -zFar)

simplify

zNear + c * zNear = -zFar + c * zFar

move zNear to the right

c * zNear = -zFar + c * zFar - zNear

move c * zFar to the left

c * zNear - c * zFar = -zFar - zNear

simplify

c * (zNear - zFar) = -(zFar + zNear)

divide by (zNear - zFar)

c = -(zFar + zNear) / (zNear - zFar)

solve for s by plugging c into s = (1 - c) * -zFar

s = (1 - -((zFar + zNear) / (zNear - zFar))) * -zFar

simplify

s = (1 + ((zFar + zNear) / (zNear - zFar))) * -zFar

write the 1 as (zNear - zFar) / (zNear - zFar)

s = ((zNear - zFar + zFar + zNear) / (zNear - zFar)) * -zFar

simplify

s = ((2 * zNear) / (zNear - zFar)) * -zFar

simplify some more

s = -(2 * zNear * zFar) / (zNear - zFar)
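Plugging the closed forms back into the two boundary equations is a cheap way to catch sign slips. This sketch uses c as derived above and s in the form just before the last simplification, with the question's near = 10 and far = 100; `closedFormSC` is a hypothetical name.

```javascript
function closedFormSC(zNear, zFar) {
  const c = -(zFar + zNear) / (zNear - zFar);
  const s = ((2 * zNear) / (zNear - zFar)) * -zFar; // = -(2 * zNear * zFar) / (zNear - zFar)
  return { s, c };
}

const near = 10, far = 100;
const sc = closedFormSC(near, far);
console.log(sc.s);                 // about 22.2 (positive when zFar > zNear)
console.log(sc.s / -near + sc.c); // about -1
console.log(sc.s / -far + sc.c);  // about  1
```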

dang I wish stackexchange supported math like their math site does :(

So, back to the top. Our formula was

clipZ = s / cameraZ + c

And we know s and c now.

clipZ = (2 * zNear * zFar) / (zNear - zFar) / -cameraZ -
        (zFar + zNear) / (zNear - zFar)

Let's put both terms over the common denominator -cameraZ

clipZ = ((2 * zNear * zFar) / (zNear - zFar) +
         (zFar + zNear) / (zNear - zFar) * cameraZ) / -cameraZ

We can change / (zNear - zFar) to * 1 / (zNear - zFar), so

rangeInv = 1 / (zNear - zFar)
clipZ = ((2 * zNear * zFar) * rangeInv +
         (zFar + zNear) * rangeInv * cameraZ) / -cameraZ

Looking back at makePerspective, we see it ends up computing

clipZ = (matrix[10] * cameraZ + matrix[14]) / (cameraZ * matrix[11])

Comparing that with the formula above, it fits

rangeInv = 1 / (zNear - zFar)
matrix[10] = (zFar + zNear) * rangeInv
matrix[14] = 2 * zNear * zFar * rangeInv
matrix[11] = -1
clipZ = (matrix[10] * cameraZ + matrix[14]) / (cameraZ * matrix[11])
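With the question's numbers (near 10, far 100) this also shows what out[10] = 110 / -90 and out[14] = 2000 / -90 do together: once divided by w, they send -10 to -1 and -100 to +1. A minimal sketch; `clipZFromMatrix` is a hypothetical name.

```javascript
// clipZ computed from the three matrix entries, as in the formula above.
function clipZFromMatrix(zNear, zFar, cameraZ) {
  const rangeInv = 1 / (zNear - zFar);
  const m10 = (zFar + zNear) * rangeInv;   // 110 / -90 for near 10, far 100
  const m14 = 2 * zNear * zFar * rangeInv; // 2000 / -90 for near 10, far 100
  const m11 = -1;
  return (m10 * cameraZ + m14) / (cameraZ * m11);
}

console.log(clipZFromMatrix(10, 100, -10));   // about -1
console.log(clipZFromMatrix(10, 100, -100));  // about 1
```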

I hope that made sense. Note: Most of this is just my re-writing of this article.

I've been mulling over your answer almost nonstop since you posted it a couple of days ago, and finally, it's starting to come together, although I will admit that I am still a bit confused. I have also looked very heavily at [your post](http://games.greggman.com/game/webgl-3d-perspective/) as well as the explanations on the perspective matrix in... — HartleySan, Feb 05 '15 at 08:43
[This book](http://www.amazon.com/WebGL-Programming-Guide-Interactive-Graphics/dp/0321902920/ref=sr_1_1?ie=UTF8&qid=1423125734&sr=8-1&keywords=WebGL) and [this book](http://www.amazon.com/Math-Primer-Graphics-Game-Development/dp/1568817231/ref=sr_1_1?ie=UTF8&qid=1423125771&sr=8-1&keywords=3D+Math) in an effort to better understand what's going on. With all that said, this is my understanding thus far, and please correct me if I'm wrong: — HartleySan, Feb 05 '15 at 08:44
`Math.tan(fovy / 2)` is essentially describing the relationship between `y` and `z`. (I mean, after all, that's just the basic trigonometric definition of the tangent.) As such, the inverse of that is equal to `maxZ / maxY`. As such, when you multiply the y part of a vertex by that, the division by `maxY` essentially has the effect of normalizing the y from 0 to 1. From there, the larger the Z, the larger the y value ultimately becomes. Also, I get why you do the same thing for x but also factor in the aspect ratio. That's fine. Now, where I'm still very confused is the z part. For one... — HartleySan, Feb 05 '15 at 08:48
You wrote `zeroToOne = (someY - near) * rangeInv;` above, but I'm wondering if `someY` should in fact be `someZ`. Please let me know your opinion on that. Thank you. Also, I totally get how you calculated `clipspace` to go from `-1` to `1`. That's totally fine. However, I don't understand how that `clipspace` calculation maps to `out[10]` and `out[14]`. Specifically, what does `zNear + zFar` do in `out[10]` and what does `zNear * zFar` do in `out[14]`? Sorry for all the questions. Your post has been extremely helpful, but as you can see, I'm still a bit confused. Thank you. — HartleySan, Feb 05 '15 at 08:51
This is slowly starting to click for me. Lemme mull over it a bit more, and I'll get back with you with a more complete reply. Thanks again, gman. — HartleySan, Feb 10 '15 at 03:43
I re-wrote the entire thing based on an article I found and linked to. See if it helps. — gman, Feb 11 '15 at 04:14
That was excellent! Thank you so much. I also looked at the article you referenced at the end, and I think I'm set. I'll continue to look over this for the next couple of days to ensure that it sticks (as best it can), but I actually get it now, which is great. Thanks again! — HartleySan, Feb 12 '15 at 03:03
I'm really struggling to understand the line "if we define f = 1 / tan(fovY / 2)". Why do we do that? — JHRS, Nov 26 '22 at 14:55

Sneftel · Answer 2 · 2015-02-03T09:12:22.173


f is a factor which scales the y-axis, such that all points along the top plane of your viewing frustum, post-perspective-division, have a y-coordinate of 1, and those on the bottom plane have a y-coordinate of -1. Try plugging in points along one of those planes (examples: 0, 2.41, 1, 2, 7.24, 3) and you can see why this happens: because it ends up with the pre-division y equal to the homogeneous w.
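A small sketch of that claim, using hypothetical points of my own choosing (not the ones listed above) on the top plane of a 45° frustum. For each one, the pre-division y (f times camera y, from matrix[5]) equals w (-z, from matrix[11] = -1), so y / w is 1.

```javascript
const fovYTop = Math.PI / 4;
const fTop = 1 / Math.tan(fovYTop / 2);

// Pre-division y and w for a camera-space point (x does not matter here).
function yAndW(y, z) {
  return { y: fTop * y, w: -z };
}

for (const z of [-1, -3, -10]) {
  const yOnTopPlane = Math.tan(fovYTop / 2) * -z; // top plane: y = tan(fovY / 2) * -z
  const { y, w } = yAndW(yOnTopPlane, z);
  console.log(y, w, y / w); // y equals w up to rounding, so y / w is about 1
}
```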

edited Feb 03 '15 at 09:12

answered Feb 02 '15 at 20:30

Sneftel


Sneftel, thanks for your answer. What do you mean by "right plane" and "left plane"? Also, your answer sounds like `f` is merely normalizing everything in relation to the `y` value, but I don't get why and what that has to do with homogeneous w. Also, could you please provide some insight into questions #2-#4 above? Thank you so much. – HartleySan Feb 02 '15 at 20:42
Sorry, I should have said "top or bottom" planes. Consider all the points in space which, when rendered, would be drawn as pixels on the extreme top or bottom of the screen. Those form flat surfaces in the world. – Sneftel Feb 03 '15 at 09:12
As for 2 and 4, just as `f` and `f/aspect` scale x and y into the (-1,1) range, those scale z into the (-1, 1) range. And 3 is to set up the perspective division. I think you might need to play around a bit more on paper to understand what's going on here. In particular, see if you can figure out why points with a large z coordinate are drawn closer together than points with a small z coordinate. – Sneftel Feb 03 '15 at 09:15
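One way to work through that exercise numerically: take two points 2 units apart in x at two depths and compare their clip-space separation. This sketch assumes a 45° fovY and a 4:3 aspect; `clipX` is a hypothetical helper that applies clipX = cameraX * matrix[0] / -cameraZ.

```javascript
const aspectRatio = 4 / 3;
const fx = 1 / Math.tan(Math.PI / 8) / aspectRatio; // matrix[0] for a 45° fovY
const clipX = (x, z) => (x * fx) / -z;

// Two points 2 units apart horizontally, at depths 2 and 20.
const gapNear = clipX(1, -2) - clipX(-1, -2);
const gapFar = clipX(1, -20) - clipX(-1, -20);
console.log(gapNear / gapFar); // about 10: ten times farther away, ten times closer together
```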
