I'm trying to understand the fundamentals of perspective projection for 3D graphics and I'm getting stuck. I'm avoiding matrices for now to keep things easier to understand. This is what I've come up with so far:
First I imagine I have a point coming in with screen (pixel) coordinates of x: 200, y: 600, z: 400. The z amount in this context represents the distance, in pixels, from the projection plane or monitor (this is just how I'm thinking of it). I also have a camera that I'm saying is 800 pixels from the projection plane/monitor (on the back side of the projection plane/monitor), so that acts as the focal length of the camera.
From my understanding, first I find the total z distance of the point (200, 600) by adding its z to the camera's focal length (400 + 800), which gives me a total z distance of 1200. Then, if I wanted to find the projected point of these coordinates, I just need to multiply each coordinate (x & y) by (focal_length/z_distance), or 800/1200, which gives me the projected coordinates x: 133, y: 400.
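To make the arithmetic above concrete, here is a minimal Python sketch of that similar-triangles scaling. The function name `project_point` and its parameter names are made up for illustration. It assumes x and y are measured from the point where the camera's line of sight crosses the projection plane; if they are measured from a screen corner instead, the result shrinks toward that corner rather than toward the centre.

```python
def project_point(x, y, z, focal_length):
    """Scale x and y by focal_length / (z + focal_length).

    z is the point's distance behind the projection plane, and the
    camera sits focal_length in front of that plane (assumed setup).
    """
    z_distance = z + focal_length      # total distance from camera to point
    scale = focal_length / z_distance  # 800 / 1200 for the numbers above
    return x * scale, y * scale


px, py = project_point(200, 600, 400, 800)
print(round(px), round(py))  # 133 400
```

The unrounded x value is 133.33..., so the 133 quoted above is a rounded figure.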
Now, from what I understand, OpenGL expects me to send my point down in clip space (-1 to 1), so I shouldn't send my pixel values down as 200, 600. I would have to normalize my x and y coordinates to this -1 to 1 space first. So I normalize my x & y values like so:
xNorm = (x / (width/2)) - 1;
yNorm = (y / (height/2)) - 1;
This gives me normalized values of x: -.6875, y: -.0625. What I'm unsure of is what my z would need to be if OpenGL is eventually going to divide these normalized values by it. I know the aspect ratio probably needs to be factored into the equation, but I'm not sure how.
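For reference, the normalization step can be sketched in Python as below. The viewport size is an assumption: the post does not state it, but a width and height of 1280 reproduce the normalized values quoted above when applied to the unprojected point (200, 600). The function name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(x, y, width, height):
    """Map pixel coordinates in [0, width] x [0, height] to the -1..1 range."""
    x_norm = x / (width / 2) - 1
    y_norm = y / (height / 2) - 1
    return x_norm, y_norm


# Assumed 1280x1280 viewport (not given in the post).
print(normalize_pixel(200, 600, 1280, 1280))  # (-0.6875, -0.0625)
```

Note that these values come from the original x: 200, y: 600, not from the projected x: 133, y: 400, so the perspective scaling has not been applied to them yet.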