Calculating a LookAt matrix

Question

I'm in the midst of writing a 3D engine and I've come across the LookAt algorithm described in the DirectX documentation:

zaxis = normal(At - Eye)
xaxis = normal(cross(Up, zaxis))
yaxis = cross(zaxis, xaxis)

 xaxis.x           yaxis.x           zaxis.x          0
 xaxis.y           yaxis.y           zaxis.y          0
 xaxis.z           yaxis.z           zaxis.z          0
-dot(xaxis, eye)  -dot(yaxis, eye)  -dot(zaxis, eye)  1

Now I get how it works on the rotation side, but what I don't quite get is why the translation component of the matrix is set to those dot products. Looking at it more closely, it seems to adjust the camera position by a small amount based on a projection of the new basis vectors onto the position of the eye/camera.

The question is why does it need to do this? What does it accomplish?
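For reference, the quoted algorithm can be sketched in plain Python as below. This is only a sketch under stated assumptions: the helper names (`look_at_lh`, `transform_point`) and the sample eye/at/up values are made up for illustration, and points are treated as row vectors multiplied on the left of the matrix, matching the layout quoted above.

```python
import math

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [c / length for c in v]

def look_at_lh(eye, at, up):
    """Build the 4x4 matrix from the question as a list of rows."""
    zaxis = normalize(sub(at, eye))
    xaxis = normalize(cross(up, zaxis))
    yaxis = cross(zaxis, xaxis)
    return [
        [xaxis[0], yaxis[0], zaxis[0], 0.0],
        [xaxis[1], yaxis[1], zaxis[1], 0.0],
        [xaxis[2], yaxis[2], zaxis[2], 0.0],
        [-dot(xaxis, eye), -dot(yaxis, eye), -dot(zaxis, eye), 1.0],
    ]

def transform_point(p, m):
    """Multiply the row vector [x, y, z, 1] by m and return the xyz part."""
    row = [p[0], p[1], p[2], 1.0]
    return [sum(row[k] * m[k][j] for k in range(4)) for j in range(3)]

# Hypothetical sample camera: sits at `eye`, looks at the world origin.
eye = [1.0, 2.0, -5.0]
at = [0.0, 0.0, 0.0]
view = look_at_lh(eye, at, up=[0.0, 1.0, 0.0])

eye_in_view = transform_point(eye, view)  # the eye should land on the origin
at_in_view = transform_point(at, view)    # the target should land on the +z axis
```

With this matrix the eye maps to the origin and the target ends up straight ahead on +z, at a distance equal to the length of `at - eye`.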

Take a read of the stuff on http://msdn.microsoft.com/en-au/library/bb206269(VS.85).aspx — Dominik Grabiec, Dec 09 '08 at 06:21
Note this is a [row major, left-handed look at matrix](http://msdn.microsoft.com/en-us/library/bb205342(VS.85).aspx) — bobobobo, Jul 23 '11 at 18:00
Is the letter L ("l") in the bottom right supposed to be a one (1)? — vidstige, Dec 20 '17 at 19:26
@bobobobo This is a column major matrix, because the translation is at the bottom instead of the right-hand side. "column major" is standard in GLSL. — Crouching Kitten, Jun 21 '20 at 04:39
@CrouchingKitten in a column major transformation matrix the translation components are on the right-hand side - see "__Summary__" at https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/geometry/row-major-vs-column-major-vector — bobobobo, Jun 22 '20 at 14:51
@bobobobo That page assumes that column/row-major are mathematical concepts, but they are not. Instead they tell how a one dimensional array is interpreted as a matrix. OpenGL always interprets an array in column-major way. The code from the OP was copied from a Microsoft page, and it doesn't show a matrix. It shows an array laid out in multiple lines. Column major flips it around (so translation would get to the right in the interpretation of OpenGL), while row-major keeps it as it is written in code, at the bottom. — Crouching Kitten, Jun 22 '20 at 15:54

bobobobo · Answer 1 · 2012-10-13T16:05:13.163

Note the example given is a left-handed, row major matrix.

So the operation is: translate to the origin first (move by -eye), then rotate so that the vector from eye to At lines up with +z.

Basically you get the same result if you pre-multiply the rotation matrix by a translation -eye:

[      1       0       0   0 ]   [ xaxis.x  yaxis.x  zaxis.x 0 ]
[      0       1       0   0 ] * [ xaxis.y  yaxis.y  zaxis.y 0 ]
[      0       0       1   0 ]   [ xaxis.z  yaxis.z  zaxis.z 0 ]
[ -eye.x  -eye.y  -eye.z   1 ]   [       0        0        0 1 ]

  [         xaxis.x          yaxis.x          zaxis.x  0 ]
= [         xaxis.y          yaxis.y          zaxis.y  0 ]
  [         xaxis.z          yaxis.z          zaxis.z  0 ]
  [ dot(xaxis,-eye)  dot(yaxis,-eye)  dot(zaxis,-eye)  1 ]
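The product above can be checked numerically. The following is a minimal sketch, assuming matrices stored as lists of rows; the axis and eye values are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical orthonormal camera axes and eye position.
xaxis = [0.0, 0.0, -1.0]
yaxis = [0.0, 1.0, 0.0]
zaxis = [1.0, 0.0, 0.0]
eye = [2.0, 3.0, 4.0]

# Translation by -eye, with the translation in the last row.
translation = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [-eye[0], -eye[1], -eye[2], 1.0],
]

# Rotation with the new axes as columns.
rotation = [
    [xaxis[0], yaxis[0], zaxis[0], 0.0],
    [xaxis[1], yaxis[1], zaxis[1], 0.0],
    [xaxis[2], yaxis[2], zaxis[2], 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

look_at = matmul(translation, rotation)

neg_eye = [-c for c in eye]
expected_bottom_row = [dot(xaxis, neg_eye), dot(yaxis, neg_eye),
                       dot(zaxis, neg_eye), 1.0]
```

The first three rows of `look_at` are the rotation rows unchanged, and its last row matches `expected_bottom_row`, i.e. the dot products from the question.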

Additional notes:

Note that a viewing transformation is (intentionally) inverted: you multiply every vertex by this matrix to "move the world" so that the portion you want to see ends up in the canonical view volume.

Also note that the rotation component (call it R) of the LookAt matrix is an inverted change of basis matrix. In the un-inverted matrix the rows are the new basis vectors in terms of the old basis vectors (hence the variable names xaxis.x, ...: xaxis is the new x axis after the change of basis occurs). Because of the inversion, which for an orthonormal matrix is just the transpose, the rows and columns are swapped, so the new basis vectors appear as the columns of R.
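A quick way to see that the inverse of the rotation part is its transpose is to multiply the two and check for the identity. This is a small sketch with a hypothetical view direction; the helper names are illustrative only.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hypothetical view direction and up vector.
zaxis = normalize([1.0, 2.0, 2.0])
xaxis = normalize(cross([0.0, 1.0, 0.0], zaxis))
yaxis = cross(zaxis, xaxis)

# Rotation part of the LookAt matrix: the new axes as columns.
rotation = [[xaxis[i], yaxis[i], zaxis[i]] for i in range(3)]

# For an orthonormal matrix this product is the identity (up to rounding).
product = matmul3(rotation, transpose(rotation))
```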

This is the best answer, much more cogent than the currently-accepted answer. — prideout, Feb 28 '16 at 22:07
This would imply that the LookAt matrix is an orthonormal basis, otherwise the transpose would not be equal to its inverse, correct? — John Leidegren, Dec 12 '16 at 08:40
@JohnLeidegren yes the rotation part is orthonormal by construction just for this reason. — eric, Dec 31 '18 at 07:26

Judge Maygarden · Accepted Answer · 2008-12-09T17:31:48.693

20

I build a look-at matrix by creating a 3x3 rotation matrix as you have done here and then expanding it to a 4x4 with zeros and the single 1 in the bottom right corner. Then I build a 4x4 translation matrix using the negative eye point coordinates (no dot products), and multiply the two matrices together. My guess is that this multiplication yields the equivalent of the dot products in the bottom row of your example, but I would need to work it out on paper to make sure.

The 3D rotation transforms your axes. Therefore, you cannot use the eye point directly without also transforming it into this new coordinate system. That's what the matrix multiplications -- or in this case, the 3 dot-product values -- accomplish.
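To make this concrete, here is a small sketch comparing the two candidate translation rows. The axes, the eye position, and the helper names (`build_view`, `transform_point`) are hypothetical; points are treated as row vectors, as in the question.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_view(xaxis, yaxis, zaxis, bottom):
    """4x4 matrix with the axes as columns and `bottom` as the translation row."""
    return [
        [xaxis[0], yaxis[0], zaxis[0], 0.0],
        [xaxis[1], yaxis[1], zaxis[1], 0.0],
        [xaxis[2], yaxis[2], zaxis[2], 0.0],
        [bottom[0], bottom[1], bottom[2], 1.0],
    ]

def transform_point(p, m):
    """Multiply the row vector [x, y, z, 1] by m and return the xyz part."""
    row = [p[0], p[1], p[2], 1.0]
    return [sum(row[k] * m[k][j] for k in range(4)) for j in range(3)]

# Hypothetical orthonormal camera axes and eye position.
xaxis = [0.0, 0.0, -1.0]
yaxis = [0.0, 1.0, 0.0]
zaxis = [1.0, 0.0, 0.0]
eye = [2.0, 3.0, 4.0]

# Translation row expressed in the rotated frame (the dot products).
with_dots = build_view(
    xaxis, yaxis, zaxis,
    [-dot(xaxis, eye), -dot(yaxis, eye), -dot(zaxis, eye)])

# Translation row using the raw world-space eye coordinates.
with_raw = build_view(xaxis, yaxis, zaxis, [-eye[0], -eye[1], -eye[2]])

eye_with_dots = transform_point(eye, with_dots)  # lands on the origin
eye_with_raw = transform_point(eye, with_raw)    # does not
```

Only the dot-product version moves the eye to the origin of the new coordinate system; with the raw coordinates the rotated eye and the unrotated offset no longer cancel.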

edited Dec 09 '08 at 17:31

answered Dec 09 '08 at 14:44

Judge Maygarden


Shouldn't you be creating a view matrix by calculating the inverse world matrix of the camera-orientation? – xcrypt Jan 25 '12 at 01:14
@xcrypt Do you mean the inverse Transformation matrix of the camera ? – Jun 18 '16 at 17:00

Correct me if I'm wrong here, but your description appears to be that of the *viewing* transform (i.e. *view* matrix) whereas the OP seems to be showing the *look-at* matrix. At one point I thought the *view* and *look-at* matrices were the same thing, but got a (costly) burn, and I now think of them as 2 different matrices. Is this wrong? Is a *look-at* matrix **exactly the same** as a *view matrix* but just built differently? – code_dredd Feb 08 '17 at 08:16
@code_dredd The LookAt() function sometimes gives the "world matrix", which is the inverse of the "view matrix". But since inversion is very costly, it's better if the LookAt already returns the view matrix. – Crouching Kitten Jun 19 '20 at 14:49

Bob Cross · Answer 3 · 2008-12-09T21:32:42.917

That translation component helps you by creating an orthonormal basis with your "eye" at the origin and everything else expressed in terms of that origin (your "eye") and the three axes.

The concept isn't so much that the matrix is adjusting the camera position. Rather, it is trying to simplify the math: when you want to render a picture of everything that you can see from your "eye" position, it's easiest to pretend that your eye is the center of the universe.

So, the short answer is that this makes the math much easier.

Answering the question in the comment: the reason you don't just subtract the "eye" position from everything has to do with the order of the operations. Think of it this way: once you are in the new frame of reference (i.e., the head position represented by xaxis, yaxis and zaxis) you now want to express distances in terms of this new (rotated) frame of reference. That is why you use the dot product of the new axes with the eye position: that represents the same distance that things need to move but it uses the new coordinate system.
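Written out as an equation, the order of operations looks like this. It is a sketch under the row-vector convention of the question, where p is a world-space point and R is assumed to be the 3x3 matrix whose columns are xaxis, yaxis and zaxis:

```latex
% p: world-space point (row vector); R: 3x3 matrix with columns xaxis, yaxis, zaxis
p' = (p - \mathrm{eye})\,R = p\,R - \mathrm{eye}\,R,
\qquad
\mathrm{eye}\,R = \bigl(\mathrm{eye}\cdot\mathrm{xaxis},\ \mathrm{eye}\cdot\mathrm{yaxis},\ \mathrm{eye}\cdot\mathrm{zaxis}\bigr)
```

The term subtracted after the rotation is the eye position expressed in the rotated frame, which is exactly the row of negated dot products.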

So by my understanding the matrix is being set up with the correct translation, okay, but why the dot products in that computation? Couldn't it just have been -eye.x, -eye.y, -eye.z ? — Dominik Grabiec, Dec 09 '08 at 06:25

score 4 · Answer 4 · answered Dec 09 '08 at 17:44

Just some general information:

The lookat matrix is a matrix that positions / rotates something to point to (look at) a point in space, from another point in space.

The method takes a desired "center" of the camera's view, an "up" vector, which represents the direction "up" for the camera (up is almost always (0,1,0), but it doesn't have to be), and an "eye" vector which is the location of the camera.

This is used mainly for the camera but can also be used for other techniques like shadows, spotlights, etc.

Frankly I'm not entirely sure why the translation component is being set as it is in this method. In gluLookAt (from OpenGL), the translation component is set to 0,0,0 since the camera is viewed as being at 0,0,0 always.

score 2 · Answer 5 · answered Dec 08 '08 at 10:12


The dot product simply projects a point onto an axis to get the x-, y-, or z-component of the eye. You are moving the camera backwards, so looking at (0, 0, 0) from (10, 0, 0) and from (100000, 0, 0) would have a different effect.

answered Dec 08 '08 at 10:12

Eugene Yokota


score 2 · Answer 6 · answered Dec 09 '08 at 17:50

The lookat matrix does these two steps:

1. Translate your model to the origin.
2. Rotate it according to the orientation set up by the up-vector and the looking direction.

The dot product means simply that you make a translation first and then rotate. Instead of multiplying two matrices the dot product just multiplies a row with a column.

Adi · Answer 7 · 2009-02-15T14:58:28.410

A 4x4 transformation matrix contains two or three components: 1. a rotation matrix, 2. a translation to add, 3. a scale (many engines do not store this directly in the matrix).

Combined, they transform a point from space A to space B, hence this is a transformation matrix M_ab.

Now, the location of the camera is given in space A, so it is not a valid translation for space B; you need to multiply this location by the rotation transform.

The only remaining question is: why the dots? Well, if you write the 3 dot products out on paper, you'd discover that 3 dots with X, Y and Z are exactly like a multiplication with a rotation matrix.

An example for that fourth row/column would be taking the zero point (0,0,0) in world space. It is not the zero point in camera space, so you need to know its representation in camera space, since rotation and scale leave it at zero!
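The zero-point example can be written as a one-line equation. It assumes the row-vector convention of the question, with M standing for the LookAt matrix shown there:

```latex
% The world-space origin in homogeneous coordinates picks out the last row of M.
\begin{pmatrix} 0 & 0 & 0 & 1 \end{pmatrix} M
= \begin{pmatrix}
  -\mathrm{xaxis}\cdot\mathrm{eye} &
  -\mathrm{yaxis}\cdot\mathrm{eye} &
  -\mathrm{zaxis}\cdot\mathrm{eye} &
  1
\end{pmatrix}
```

So the last row is simply where the world origin ends up in camera space.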

cheers

score 0 · Answer 8 · answered Apr 24 '14 at 01:10

It is necessary to put the eye point in your axis space, not in world space. When you dot a vector with a unit coordinate basis vector (one of x, y, z), it gives you the coordinate of the eye along that axis in that space. You transform location by applying the three translations in the last place, in this case the last row. Then moving the eye backwards, with a negative, is equivalent to moving all the rest of the space forwards. Just like moving up in an elevator makes you feel like the rest of the world is dropping out from underneath you.

Using a left-handed matrix, with translation as the last row instead of the last column, is a religious difference which has absolutely nothing to do with the answer. However, it is a dogma that should be strictly avoided. It is best to chain global-to-local (forward kinematic) transforms left-to-right, in a natural reading order, when drawing tree sketches. Using left-handed matrices forces you to write these right-to-left.

What is "axis space"? I've never heard that term before. Did you mean object space? upright space? camera space? — Bjorn, Nov 14 '16 at 04:54
