Is it possible to get principal point (cx, cy) from a 4x4 projection matrix? This is the same matrix asked in this question: Getting focal length and focal point from a projection matrix

(SCNMatrix4)
s = (m11 = 1.83226573,     m12 = 0,             m13 = 0,           m14 = 0,
     m21 = 0,              m22 = 2.44078445,    m23 = 0,           m24 = 0,
     m31 = -0.00576340035, m32 = -0.0016724075, m33 = -1.00019991, m34 = -1,
     m41 = 0,              m42 = 0,             m43 = -0.20002,    m44 = 0)

The values I'm trying to calculate from this matrix are x0 and y0, the principal point entries of the corresponding 3x3 camera matrix.

RJK
  • So do you have a 3x3 or a 4x4 matrix? It also depends on what kind of projection matrix you have (there are several out there). The Q&A you linked is the `gluPerspective` matrix from GLU, and you can extract all the info from it directly using algebra. Also, is the matrix just a projection, or is it mixed with other transforms? There are also non-algebraic approaches to obtaining the parameters from an arbitrary matrix: just find two different lines whose endpoints are projected to the same position (using fitting or search) and then compute their intersection; that will give you the focal point ... – Spektre Mar 20 '19 at 09:20
  • If you have a projection with an offset, then your projection was most likely multiplied by a translation matrix at some point, from the left or the right, so `cx,cy,cz` would most likely be in the matrix `origin(x0,y0,z0)` part, see [Understanding 4x4 homogenous transform matrices](https://stackoverflow.com/a/28084380/2521214), or in its inverse form ... also look at [OpenGL ray OBB intersection](https://stackoverflow.com/a/52905600/2521214) and the functions `world2scr` and `scr2world`, as they use the projection matrix parameters to map between 3D world and 2D screen positions back and forth – Spektre Mar 20 '19 at 10:12
  • @Spektre I have a 4x4 projection matrix. It is just a projection and not mixed with other transforms. The matrices I am using are from a three.ar.js wrapper that extends the WebVR API (https://immersive-web.github.io/webvr/spec/1.1/#vrframedata-attributes). I believe the focal length is the value at [0][0] in my matrix. But I am trying to find out the principal point (cx, cy). Thanks for your direction. I'll look into the documentation. – RJK Mar 20 '19 at 15:48
  • And what exactly is the principal point in your definition? – Spektre Mar 20 '19 at 16:51
  • https://immersive-web.github.io/webvr/spec/1.1/#vreyeparameters-attributes The offset listed here is pretty much what I am trying to get as the principal point: `vector from the center point of the headset to the center point of the lens for the given eye`. The problem is that offset is either deprecated or no longer supported, since I get a vector of (0, 0, 0) when I try to read it. I thought it would help to go backwards and derive the principal point from the perspective projection matrix. – RJK Mar 20 '19 at 16:58
  • So you want the offset from the center for each eye/camera ... do you have both left and right matrices for the same headset situation? The differing elements will most likely be holding your offset. If its length is approximately ~6.5cm/2 then it is directly the offset; if not, then it has been multiplied into the original matrix and you need to decode it ... – Spektre Mar 20 '19 at 17:09
  • I have both left and right projection matrices and left and right view matrices. So the differing elements within the projection matrices are the offset? If my headset is not a headset but a phone, would the matrices be the same and the principal point (0, 0)? – RJK Mar 20 '19 at 17:16
  • It depends on the actual implementation that is used, but more or less yes, **but both projections must be parallel to each other**, otherwise the other elements would change too ... – Spektre Mar 20 '19 at 20:41

2 Answers

I recently confronted this problem and was quite astonished that I couldn't find a relevant solution on the Internet, because it seems to be a simple mathematics problem.

After a few days of struggling with matrices, I found a solution.

Let's define two Cartesian coordinate systems: the camera coordinate system with x', y', z' axes, and the world coordinate system with x, y, z axes. The camera (or the eye) is positioned at the origin of the camera coordinate system, and the image plane (the plane containing the screen) is z' = -n, where n is the focal length and the focal point is the position of the camera. I am using the convention of OpenGL, so n is the nearVal argument of glFrustum().

You can define a 4x4 transformation matrix M in a homogeneous coordinate system to deal with the projection. M transforms a coordinate (x, y, z) in the world coordinate system into a coordinate (x', y', z') in the camera coordinate system as shown below, where @ means matrix multiplication.

[
  [x_prime_h],
  [y_prime_h],
  [z_prime_h],
  [w_prime_h],
] = M @ [
  [x_h],
  [y_h],
  [z_h],
  [w_h],
]
[x, y, z] = [x_h, y_h, z_h] / w_h
[x_prime, y_prime, z_prime] = [x_prime_h, y_prime_h, z_prime_h] / w_prime_h
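For concreteness, here is a minimal Python/numpy sketch of this mapping (M is a placeholder identity matrix here, not a matrix from the question):

import numpy as np

# Project a world point through a 4x4 matrix M and dehomogenize by w'.
M = np.eye(4)                      # stand-in for a real P @ V
xyz_world = np.array([1.0, 2.0, 3.0])
h = M @ np.append(xyz_world, 1.0)  # [x'_h, y'_h, z'_h, w'_h]
xyz_camera = h[:3] / h[3]          # [x', y', z']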

Now assume you are given M = P V, where P is a perspective projection matrix and V is a view transformation matrix. The theoretical projection matrix is like the following.

P_theoretical = [
  [n, 0, 0, 0],
  [0, n, 0, 0],
  [0, 0, n, 0],
  [0, 0, -1, 0],
]

In OpenGL, an augmented matrix like the following is used to cover the normalization and the nonlinear scaling of z coordinates, where l, r, b, t, n, f are the left, right, bottom, top, nearVal, farVal arguments of glFrustum(). (The resulting z' coordinate is not actually the coordinate of a projected point, but a value used for Z-buffering.)

P = [
  [2*n/(r-l), 0, (r+l)/(r-l), 0],
  [0, 2*n/(t-b), (t+b)/(t-b), 0],
  [0, 0, -(f+n)/(f-n), -2*n*f/(f-n)],
  [0, 0, -1, 0],
]

The transformation V is like the following, where r_ij is the element at the i-th row and j-th column of the 3x3 rotation matrix R, and (c_0, c_1, c_2) is the coordinate of the camera.

V = [
  [r_00, r_01, r_02, -(r_00*c_0 + r_01*c_1 + r_02*c_2)],
  [r_10, r_11, r_12, -(r_10*c_0 + r_11*c_1 + r_12*c_2)],
  [r_20, r_21, r_22, -(r_20*c_0 + r_21*c_1 + r_22*c_2)],
  [0, 0, 0, 1],
]

The P and V can be represented with block matrices like the following.

C = [
 [c_0],
 [c_1],
 [c_2],
]
A = [
  [2*n/(r-l), 0, (r+l)/(r-l)],
  [0, 2*n/(t-b), (t+b)/(t-b)],
  [0, 0, -(f+n)/(f-n)],
]
B = [
  [0],
  [0],
  [-2*n*f/(f-n)],
]
P = [
  [A,B],
  [[0, 0, -1], [0]],
]
V = [
  [R, -R @ C],
  [[0, 0, 0], [1]],
]
M = P @ V = [
  [A @ R, -A @ R @ C + B],
  [[0, 0, -1] @ R, [0, 0, 1] @ R @ C],
]
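As a sanity check, here is a small numpy sketch (mine, not part of the derivation) that builds P and V exactly as written above and composes M = P @ V; the frustum and pose values are arbitrary assumptions for illustration:

import numpy as np

def perspective(l, r, b, t, n, f):
    # glFrustum-style projection matrix P, as defined above.
    return np.array([
        [2*n/(r-l), 0,         (r+l)/(r-l),  0],
        [0,         2*n/(t-b), (t+b)/(t-b),  0],
        [0,         0,         -(f+n)/(f-n), -2*n*f/(f-n)],
        [0,         0,         -1,           0],
    ])

def view(R, C):
    # View matrix V from a 3x3 rotation R and a camera position C.
    V = np.eye(4)
    V[:3, :3] = R
    V[:3, 3] = -R @ C
    return V

P = perspective(-0.1, 0.1, -0.075, 0.075, 0.1, 1000.0)
V = view(np.eye(3), np.array([1.0, 2.0, 3.0]))
M = P @ V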

Let m_ij be the element of M at the i-th row and j-th column. Taking the first block of the second block row in the above block notation of M, you can solve for the elementary z' vector of the camera coordinate system, which is the opposite of the direction from the camera point to the intersection point between the image plane and its normal line passing through the focal point. (That intersection point is the principal point.)

e_z_prime = [0, 0, 1] @ R = -[m_30, m_31, m_32]

Taking the second block column of the above block notation of M, you can solve for C like the following, where inv(X) is the inverse of a matrix X.

C = - inv([
  [m_00, m_01, m_02],
  [m_10, m_11, m_12],
  [m_30, m_31, m_32],
]) @ [
  [m_03],
  [m_13],
  [m_33],
]

Let p_ij be the element of P at i-th row and j-th column. Now you can solve for p_23 = -2nf/(f-n) like the following.

B = [
  [m_03],
  [m_13],
  [m_23],
] + [
  [m_00, m_01, m_02],
  [m_10, m_11, m_12],
  [m_20, m_21, m_22],
] @ C
p_23 = B[2] = m_23 + (m_20*c_0 + m_21*c_1 + m_22*c_2)

Now using the fact p_20 = p_21 = 0, you can get p_22 = -(f+n)/(f-n) like the following.

p_22 * e_z_prime = [m_20, m_21, m_22]
p_22 = -(m_20*m_30 + m_21*m_31 + m_22*m_32)

Now you can get n and f from p_22 and p_23 like the following.

n = p_23/(p_22-1)
  = -(m_23 + m_20*c_0+m_21*c_1+m_22*c_2) / (m_20*m_30+m_21*m_31+m_22*m_32 + 1)
f = p_23/(p_22+1)
  = -(m_23 + m_20*c_0+m_21*c_1+m_22*c_2) / (m_20*m_30+m_21*m_31+m_22*m_32 - 1)

From the camera position C, the focal length n, and the elementary z' vector e_z_prime, you can get the principal point: C - n * e_z_prime.
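Putting the steps together, a minimal numpy sketch of the whole recovery could look like this (decompose is my own name; it assumes M = P @ V with P and V shaped as above):

import numpy as np

def decompose(M):
    # e_z' = [0, 0, 1] @ R = -[m_30, m_31, m_32]
    e_z = -M[3, :3]
    # C = -inv(rows 0, 1, 3 of the left 3x3 block) @ (rows 0, 1, 3 of the last column)
    C = -np.linalg.solve(M[[0, 1, 3], :3], M[[0, 1, 3], 3])
    p23 = M[2, 3] + M[2, :3] @ C    # p_23 = -2nf/(f-n)
    p22 = -(M[2, :3] @ M[3, :3])    # p_22 = -(f+n)/(f-n)
    n = p23 / (p22 - 1)
    f = p23 / (p22 + 1)
    return C, n, f, C - n * e_z     # last value is the principal point

Feeding it the M composed in the previous sketch returns the camera position (1, 2, 3), n = 0.1, f = 1000, and the principal point (1, 2, 2.9).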

As a side note, you can prove that the input matrix of inv() in the formula for C is nonsingular. You can also find the elementary x' and y' vectors of the camera coordinate system, and find l, r, b, t using these vectors. (There will be two valid solutions for the (e_x_prime, e_y_prime, l, r, b, t) tuple, due to the symmetry.) Finally, this solution can be extended to the case where the transformation matrix is mixed with a world transformation that does anisotropic scaling, that is, when M = P V W and W can have unequal eigenvalues.

relent95
From your data, m34 = -1 and m43 = -0.20002 (neither 1 nor -1), so your projection matrix s is column-major, and I guess you are using the default OpenGL setup, which is a right-handed system looking along the negative z axis. In this case K should be:

[-fx,  s, u0;
   0, fy, v0;
   0,  0,  1]

Usually I ignore s (s = 0). Then K can be converted to a projection matrix:

l = -u0 * near / fx
r = (W - u0) * near / fx
b = -(H - v0) * near / fy
t = v0 * near / fy

proj = [2*near/(r-l), 0,            (r+l)/(r-l),           0;
        0,            2*near/(t-b), (t+b)/(t-b),           0;
        0,            0,            (far+near)/(near-far), 2*near*far/(near-far);
        0,            0,            -1,                    0]

Since your matrix is column-major, you have:

(far + near)/(near - far) = m33; 
2*near*far / (near - far) = m43;

Solving these two equations gives you near and far. Then from

2*near/(t-b) = m22, 
(t+b)/(t-b) = m32

you can get t and b; and you can get r and l from:

2*near/(r-l) = m11, 
(r+l)/(r-l) = m31

If you know the window size W and H, then you can eventually calculate fx, fy, u0 and v0.
Note that the first element of K should be -fx. Actually there may be 4 cases: right-handed with z negative, right-handed with z positive, left-handed with z negative, and left-handed with z positive. You can use m34 and m22 to check which one is your setting. I summarized how to convert K to a projection matrix here: https://github.com/bitlw/LearnProjMatrix . You can take a look if you need more details.
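For illustration, here is a plain Python sketch of these steps (intrinsics_from_proj and the W, H arguments are my own names; it assumes the column-major, right-handed, negative-z case described above, and fx below is the magnitude, i.e. the K entry is -fx):

def intrinsics_from_proj(m11, m22, m31, m32, m33, m43, W, H):
    # From (far+near)/(near-far) = m33 and 2*near*far/(near-far) = m43:
    near = m43 / (m33 - 1)
    far = m43 / (m33 + 1)
    # From 2*near/(r-l) = m11 and (r+l)/(r-l) = m31 (same pattern for t, b):
    r_minus_l = 2 * near / m11
    t_minus_b = 2 * near / m22
    r = (1 + m31) * r_minus_l / 2   # since r + l = m31 * (r - l)
    l = r - r_minus_l
    t = (1 + m32) * t_minus_b / 2   # since t + b = m32 * (t - b)
    b = t - t_minus_b
    # Invert the l, r, b, t formulas above to recover K's entries:
    fx = W * near / r_minus_l
    fy = H * near / t_minus_b
    u0 = -l * fx / near
    v0 = t * fy / near
    return fx, fy, u0, v0, near, far

With the matrix from the question and an assumed 640x480 window, this gives near ≈ 0.1 and far ≈ 1000:

fx, fy, u0, v0, near, far = intrinsics_from_proj(
    1.83226573, 2.44078445, -0.00576340035, -0.0016724075,
    -1.00019991, -0.20002, W=640, H=480)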

Wei Liu