
Let's say I have an entity's global (world) coordinate v (QVector3D). Then I make a coordinate transformation:

pos = camera.projectionMatrix() * camera.viewMatrix() * v

where projectionMatrix() and viewMatrix() are QMatrix4x4 instances. What do I actually get, and how is it related to widget coordinates?

Matphy

1 Answer


The following values are for OpenGL; they may differ in other graphics APIs.

You get clip space coordinates. Imagine a cube extending from -w to w on all axes¹. You transform your world so that everything your camera sees ends up inside this cube, so that the graphics card can discard everything outside it (since you don't see it, it doesn't need rendering; this is done for performance reasons).
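
As a minimal sketch (plain Qt/C++; the helper name is my own), this is the transformation written out with a QVector4D, which keeps the w component visible. Note that, if I read the Qt docs correctly, the QMatrix4x4 * QVector3D overload used in the question already performs the perspective division internally, so going through QVector4D is the way to see the raw clip-space values:

    #include <QMatrix4x4>
    #include <QVector3D>
    #include <QVector4D>

    // Hypothetical helper: world position -> clip-space coordinates.
    QVector4D toClipSpace(const QMatrix4x4 &projection,
                          const QMatrix4x4 &view,
                          const QVector3D &worldPos)
    {
        // Promote the world position to homogeneous coordinates (w = 1).
        const QVector4D v(worldPos, 1.0f);
        // The result is in clip space; the point is inside the viewing
        // volume when -w <= x, y, z <= w.
        return projection * view * v;
    }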

Going further, you (or rather your graphics API) would do a perspective divide, dividing x, y and z by w. Then you are in normalized device space. This is essentially the step toward 2D: you now know where on your rendering canvas your pixels have to be colored with whatever lighting calculations you use (the z coordinate survives the divide and is kept for depth testing). In OpenGL this canvas spans [-1, 1] in x and y.
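
The divide itself is trivial; a sketch (helper name mine; QVector4D::toVector3DAffine() does the same division, if I remember the Qt API correctly):

    #include <QVector3D>
    #include <QVector4D>

    // Perspective divide: clip space -> normalized device coordinates.
    // For points inside the viewing volume the result lies in [-1, 1]
    // on every axis (OpenGL conventions).
    QVector3D toNdc(const QVector4D &clip)
    {
        return QVector3D(clip.x() / clip.w(),
                         clip.y() / clip.w(),
                         clip.z() / clip.w());
    }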

Afterwards you stretch these normalized device coordinates by whatever width and height your widget has, so that you know where in your widget the colored pixels go (this mapping is defined in OpenGL by your viewport).
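
A sketch of that viewport mapping (names mine; note the y flip, since OpenGL's window origin is the bottom-left corner while Qt widget coordinates start at the top-left):

    #include <QPointF>
    #include <QVector3D>

    // Map NDC ([-1, 1] in x and y) to pixel coordinates in a widget
    // of the given size.
    QPointF ndcToWidget(const QVector3D &ndc, int width, int height)
    {
        const float px = (ndc.x() + 1.0f) * 0.5f * width;
        const float py = (1.0f - ndc.y()) * 0.5f * height; // flip y for Qt
        return QPointF(px, py);
    }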

What you see as widget coordinates are probably the coordinates of where your widget sits on screen (usually the position of its upper-left corner is given). Therefore, if your widget's position is (10, 10) and the viewport transformation puts a rendered pixel at (10, 10), then on screen that pixel ends up at (10+10, 10+10) = (20, 20).
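
For that last step Qt already has an API: QWidget::mapToGlobal() applies exactly this widget-to-screen offset, so you don't have to add the corner position by hand:

    #include <QPoint>
    #include <QWidget>

    // Widget-local pixel -> screen coordinates. With the widget's
    // upper-left corner at (10, 10), QPoint(10, 10) maps to (20, 20).
    QPoint widgetToScreen(const QWidget *widget, const QPoint &widgetPos)
    {
        return widget->mapToGlobal(widgetPos);
    }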


¹After a discussion with derhass (see comments): a lot of books on graphics programming speak of [-1, -1, -1] x [1, 1, 1] as the clipping volume. The OpenGL 4.6 core spec, however, states that it is actually [-w, -w, -w] x [w, w, w] (and according to derhass it is the same for other APIs; I have not checked this).
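
The two formulations agree for points with positive w (|x/w| <= 1 is the same as |x| <= w), which may be why so many books state the unit-cube version. A sketch of both tests (helper names mine, assuming the [-w, w] depth convention):

    #include <QVector4D>
    #include <QtGlobal>

    // Book convention: test in NDC, after the perspective divide.
    bool insideNdcCube(const QVector4D &c)
    {
        return qAbs(c.x() / c.w()) <= 1.0f
            && qAbs(c.y() / c.w()) <= 1.0f
            && qAbs(c.z() / c.w()) <= 1.0f;
    }

    // Spec convention: test in clip space, before the divide.
    bool insideClipVolume(const QVector4D &c)
    {
        return qAbs(c.x()) <= c.w()
            && qAbs(c.y()) <= c.w()
            && qAbs(c.z()) <= c.w();
    }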

Tare
  • I don't fully agree with your description; in particular, you mixed up clip space and NDC. In clip space, the viewing volume is not the [-1,1] cube, it is [-w,w]. NDC is also not about going from 3D to 2D, it is going from 4D homogeneous to standard 3D, and NDC and all following spaces are still 3D, with meaningful z coordinates. The depth test relies on this. – derhass Mar 13 '18 at 23:34
  • It is a cube with [-1, 1]. You transfer your z coordinate into the w coordinate of a point, but the clipping cube is still [-1, 1] on the z-axis (see for example Real-Time Rendering 3rd, p. 18). As a logical argument: if it were [-w, w], the range of the cube would be different for every point, which doesn't make sense, since you want to clip efficiently (i.e. at -1 and 1). Regarding 4D->3D: your transformations are done with 4x4 matrices, yes, but your world is still only 3D. There is no 4th dimension in your world just because you use it for translation. – Tare Mar 14 '18 at 07:31
  • What you say is simply not true. The viewing volume actually **is** [-w,w] in clip space, and that it varies per point is exactly the idea. This is because the fourth dimension `w` is not used only for translation, but for non-affine perspective. Scaling the cube by the z distance is exactly what creates the pyramid frustum in the first place. – derhass Mar 14 '18 at 11:02
  • Please refer to Real-Time Rendering 3rd, p. 18f.: "After shading, rendering systems perform *projection*, which transforms the view volume into a unit cube with its extreme points at (-1, -1, -1) and (1, 1, 1). [...] The use of a projection matrix means that the transformed primitives are clipped against the unit cube. The advantage of performing the view transformation and projection before clipping is that it makes the clipping problem consistent; primitives are always clipped against the unit cube." – Tare Mar 14 '18 at 15:17
  • Your quote is only a very rough description, explaining the general idea. In reality, this is achieved by using a 4D homogeneous clip space,where the clip planes are a function of `w`. This is also explicitly stated in the OpenGL spec (same for other APIs). See [GL 4.6 core profile](https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf), section "13.6 Primitive Clipping" (p. 448f.): "In clip coordinates, the view volume is defined by `-w_c <= x_c <= w_c`, `-w_c <= y_c <= w_c`, `z_min <= z_c <= w_c`" (where `z_min` is either `0` or `-w_c`, depending on chosen convention) – derhass Mar 14 '18 at 19:51
  • So, according to the spec you are correct. I'll edit my answer. I have looked into two more books (Mathematics for 3D Game Programming and Computer Graphics 3rd, and Interactive Computer Graphics 6th), and both speak of a clipping volume of [-1, -1, -1] to [1, 1, 1], and both do so explicitly for OpenGL. I'm sorry, I just don't get why they all deviate, especially if they mention the API whose spec says otherwise. – Tare Mar 15 '18 at 07:34
  • Thank you for the comments. And yes, Qt 3D also uses the [-1, -1, -1] x [1, 1, 1] cube as the clipping volume. – Matphy Mar 15 '18 at 11:54
  • @Tare: Well, the `[-1,1]` viewing volume is not incorrect per se, it is just that this is defined in _normalized device space_, which is after the perspective divide by `w` has been applied. However, clipping after the divide introduces a lot of problems (see for example [my answer here](https://stackoverflow.com/a/41087445/2327517)), and the cleverer solution is just to clip before the divide (hence the name "_clip_ space"), where the view volume will be `w*[-1,1]`. – derhass Mar 15 '18 at 19:34
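
To make that last point concrete, a minimal sketch (names mine): for a point behind the eye, w is negative in clip space, so the ±w test rejects it without any division, whereas dividing first by the negative w could flip it into the visible range.

    #include <QVector4D>

    // Clip test in clip space: the view volume is w * [-1, 1] per axis.
    bool insideViewVolume(const QVector4D &c)
    {
        return -c.w() <= c.x() && c.x() <= c.w()
            && -c.w() <= c.y() && c.y() <= c.w()
            && -c.w() <= c.z() && c.z() <= c.w();
    }

    // Example: a point behind the eye, e.g. QVector4D(0.5f, 0.5f, 2.0f, -2.0f),
    // has w < 0, so -w > w and the test fails immediately. Dividing first
    // (0.5 / -2.0 = -0.25, inside [-1, 1]) would wrongly keep it.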