I recently confronted this problem, and quite astonished I couldn't find a relevant solution on Internet, because it seems to be a simple mathematics problem.
After a few days of struggling with matrices, I found a solution.
Let's define two Cartesian coordinate system, the camera coordinate system with x', y', z' axes, and the world coordinate system with x, y, z axes. The camera(or the eye) is positioned at the origin of the camera coordinate system and the image plane(a plane containing the screen) is z' = -n, where n is the focal length and the focal point is the position of the camera. I am using the convention of OpenGL and n is the nearVal
argument of the glFrustum()
.
You can define a 4x4 transformation matrix M in a homogeneous coordinate system to deal with a projection. The M transforms a coordinate (x, y, z) in the world coordinate system into a coordinate (x', y', z') in the camera coordinate system like the following, where @ means a matrix multiplication.
[
[x_prime_h],
[y_prime_h],
[z_prime_h],
[w_prime_h],
] = M @ [
[x_h],
[y_h],
[z_h],
[w_h],
]
[x, y, z] = [x_h, y_h, z_h] / w_h
[x_prime, y_prime, z_prime] = [x_prime_h, y_prime_h, z_prime_h] / w_prime_h
Now assume you are given M = P V, where P is a perspective projection matrix and V is a view transformation matrix. The theoretical projection matrix is like the following.
P_theoretical = [
[n, 0, 0, 0],
[0, n, 0, 0],
[0, 0, n, 0],
[0, 0, -1, 0],
]
In OpenGL, an augmented matrix like the following is used to cover the normalization and nonlinear scaling on z coordinates, where l, r, b, t, n, f are the left
, right
, bottom
, top
, nearVal
, farVal
arguments of the glFrustum()
.(The resulting z' coordinate is not actually the coordinate of a projected point, but a value used for Z-buffering.)
P = [
[2*n/(r-l), 0, (r+l)/(r-l), 0],
[0, 2*n/(t-b), (t+b)/(t-b), 0],
[0, 0, -(f+n)/(f-n), -2*n*f/(f-n)],
[0, 0, -1, 0],
]
The transformation V is like the following, where r_ij is the element at i-th row and j-th column of the 3x3 rotational matrix R and (c_0, c_1, c_2) is the coordinate of the camera.
V = [
[r_00, r_01, r_02, -(r_00*c_0 + r_01*c_1 + r_02*c_2)],
[r_10, r_11, r_12, -(r_10*c_0 + r_11*c_1 + r_12*c_2)],
[r_20, r_21, r_22, -(r_20*c_0 + r_21*c_1 + r_22*c_2)],
[0, 0, 0, 1],
]
The P and V can be represented with block matrices like the following.
C = [
[c_0],
[c_1],
[c_2],
]
A = [
[2*n/(r-l), 0, (r+l)/(r-l)],
[0, 2*n/(t-b), (t+b)/(t-b)],
[0, 0, -(f+n)/(f-n)],
]
B = [
[0],
[0],
[-2*n*f/(f-n)],
]
P = [
[A,B],
[[0, 0, -1], [0]],
]
V = [
[R, -R @ C],
[[0, 0, 0], [1]],
]
M = P @ V = [
[A @ R, -A @ R @ C + B],
[[0, 0, -1] @ R, [0, 0, 1] @ R @ C],
]
Let m_ij be the element of M at i-th row and j-th column. Taking the first element of the second row of the above block notation of M, you can solve for the elementary z' vector of the camera coordinate system, the opposite direction from the camera point to the intersection point between the image plane and its normal line passing through the focal point.(The intersection point is the principal point.)
e_z_prime = [0, 0, 1] @ R = -[m_30, m_31, m_32]
Taking the second column of the above block notation of M, you can solve for C like the following, where inv(X) is an inverse of a matrix X.
C = - inv([
[m_00, m_01, m_02],
[m_10, m_11, m_12],
[m_30, m_31, m_32],
]) @ [
[m_03],
[m_13],
[m_33],
]
Let p_ij be the element of P at i-th row and j-th column.
Now you can solve for p_23 = -2nf/(f-n) like the following.
B = [
[m_03],
[m_13],
[m_23],
] + [
[m_00, m_01, m_02],
[m_10, m_11, m_12],
[m_20, m_21, m_22],
] @ C
p_23 = B[2] = m_23 + (m_20*c_0 + m_21*c_1 + m_22*c_2)
Now using the fact p_20 = p_21 = 0, you can get p_22 = -(f+n)/(f-n) like the following.
p_22 * e_z_prime = [m_20, m_21, m_22]
p_22 = -(m_20*m_30 + m_21*m_31 + m_22*m_32)
Now you can get n and f from p_22 and p_23 like the following.
n = p_23/(p_22-1)
= -(m_23 + m_20*c_0+m_21*c_1+m_22*c_2) / (m_20*m_30+m_21*m_31+m_22*m_32 + 1)
f = p_23/(p_22+1)
= -(m_23 + m_20*c_0+m_21*c_1+m_22*c_2) / (m_20*m_30+m_21*m_31+m_22*m_32 - 1)
From the camera position C, the focal length n and the elementary z' vector e_z_prime, you can get the principal point, C - n * e_z_prime.
As a side note, you can prove the input matrix of inv() in the formula for getting C is nonsingular. And you can also find elementary x' and y' vectors of the camera coordinate system, and find the l, r, b, t using these vectors.(There will be two valid solutions for the (e_x_prime, e_y_prime, l, r, b, t) tuple, due to the symmetry.) And finally this solution can be expanded when the transformation matrix is mixed with the world transformation which does an anisotropic scaling, that is when M = P V W and W can have unequal eigenvalues.