
I am working with the calibrateCamera function of OpenCV and that works just fine. However, I'm having trouble understanding why it uses its particular cost function

sqrt( 1/n * sum( d(xi', xi)**2, i, 1, n ) )

where xi' are the re-projected (or model) coordinates and xi the raw image coordinates (see for instance this Question). Intuitively, I would write down the cost function as

1/n * sum( d(xi', xi), i, 1, n )

In other words, as the mean of the Euclidean distances between corresponding points.
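
For concreteness, here is a small NumPy sketch of both quantities (xi_meas and xi_reproj are made-up placeholder arrays, not real calibration data):

    import numpy as np

    rng = np.random.default_rng(0)
    xi_meas = rng.uniform(0, 640, size=(100, 2))          # raw image coordinates xi
    xi_reproj = xi_meas + rng.normal(0, 0.5, (100, 2))    # re-projected coordinates xi'

    d = np.linalg.norm(xi_reproj - xi_meas, axis=1)       # per-point distances d(xi', xi)

    rms_error = np.sqrt(np.mean(d**2))    # OpenCV-style cost: sqrt( 1/n * sum d^2 )
    mean_error = np.mean(d)               # my alternative:    1/n * sum d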

I understand that these expressions differ quantitatively. What I'm interested in is the qualitative difference between the preferred solutions of the two cost functions, and why the former is used in camera calibration.

obachtos

2 Answers


Interpreting the set of n reprojections and the set of n data points as two vectors in a 2n-dimensional space, the square of your first formula is 1/n times the squared length of the difference between those vectors. Because the square of a vector's length grows monotonically with the length itself, optimizing the square is the same as optimizing the length. So the error you are minimizing is really the length of a vector (a.k.a. its L2-norm) in a high-dimensional space.

Because the length of a vector is invariant under an orthonormal change of coordinates (e.g. a rotation of the image axes), the optimum you find is invariant as well. This is not true for other cost functions, which may therefore yield biased results depending on the particular specification of the problem.
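
To see that concretely, here is a quick numerical check (just an illustrative sketch with random residuals): rotating all points by the same 2D rotation, i.e. an orthonormal change of the image coordinates, leaves the L2 length of the stacked residual vector unchanged, while a coordinate-wise cost such as the sum of absolute deviations does not stay the same.

    import numpy as np

    rng = np.random.default_rng(1)
    res = rng.normal(0, 1, (50, 2))               # per-point residuals xi' - xi

    theta = 0.7                                    # rotate the image axes by some angle
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    res_rot = res @ R.T                            # same residuals in the new coordinates

    l2, l2_rot = np.sqrt((res**2).sum()), np.sqrt((res_rot**2).sum())
    l1, l1_rot = np.abs(res).sum(), np.abs(res_rot).sum()

    print(np.isclose(l2, l2_rot))   # True: the L2 norm is rotation-invariant
    print(np.isclose(l1, l1_rot))   # almost surely False: the L1 cost is not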

See the Gauss-Markov Theorem for a more in-depth discussion of why we use Least Squares estimators for camera calibration (and many other problems).

Francesco Callari
    The thing I don't see is why the found optimum is invariant. In particular, I don't get what you mean by "change of coordinates"? – obachtos Dec 01 '20 at 16:23

The first quantity is the RMS of the re-projection error vector lengths. For the purpose of optimisation, the sum of squared errors (SSE) has the same extrema:

sum( d(xi', xi)**2 ) = sum( dot(xi' - xi, xi' - xi) )

Your alternative is the sum (or equivalently the mean) of error lengths:

sum( d(xi', xi) ) = sum( sqrt( dot(xi' - xi, xi' - xi) ) )

A third alternative would be the sum of absolute deviations:

sum( abs(xi'_x - xi_x) + abs(xi'_y - xi_y) )
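
Written out in NumPy (just a sketch; err stands for a hypothetical (n, 2) array of residuals xi' - xi), the three candidate costs are:

    import numpy as np

    err = np.random.default_rng(2).normal(0, 1, (100, 2))   # placeholder residuals xi' - xi

    sse = (err**2).sum()                            # sum of squared error lengths
    sum_len = np.linalg.norm(err, axis=1).sum()     # sum of Euclidean error lengths
    lad = np.abs(err).sum()                         # sum of absolute deviations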

So the question boils down to: why do we prefer the least-squares solution?

The main reason is that, if the errors are zero-mean, independent and normally distributed (sufficient but not necessary conditions), the least-squares solution is the maximum-likelihood estimate, i.e. the solution (out of all possible solutions) that makes the observed data most probable.
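
To spell that out: with independent Gaussian noise of variance sigma**2 on each measured coordinate, the negative log-likelihood of the observations is, up to an additive constant,

1/(2*sigma**2) * sum( d(xi', xi)**2, i, 1, n )

so maximising the likelihood over the calibration parameters is exactly the same as minimising the sum of squared re-projection errors.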

A second reason is that the least-squares formulation allows for nice and relatively simple mathematical derivations (see the Levenberg-Marquardt algorithm). With the sum of lengths, you would have to take derivatives of an expression involving square roots. With the least absolute deviations (LAD), you would need derivatives of the absolute-value function, and the local update steps may not be unique.
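
As a small illustration of that convenience (using SciPy here purely as an example, this is not what OpenCV runs internally): an LM-style solver such as scipy.optimize.least_squares only needs the residual vector and builds its update steps from the residual Jacobian, without ever differentiating through square roots or absolute values.

    import numpy as np
    from scipy.optimize import least_squares

    # toy least-squares problem: find the 2D translation t that maps points onto targets
    points = np.random.default_rng(3).normal(0, 1, (20, 2))
    targets = points + np.array([1.5, -0.5])

    def residuals(t):
        # flattened residual vector; the solver squares and sums it internally
        return (points + t - targets).ravel()

    fit = least_squares(residuals, x0=np.zeros(2), method='lm')
    print(fit.x)    # approximately [1.5, -0.5]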

As a side note: the assumption of normally distributed errors is fair, but if you analyse the residual distribution after camera calibration, you might find it violated if the calibration was not done with great care. For other error distributions least squares is no longer maximum likelihood; with Laplacian-distributed errors, for instance, the LAD is the maximum-likelihood estimator.
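
If you want to inspect those residuals yourself, here is a minimal self-contained sketch with OpenCV's Python bindings on synthetic data (the intrinsics, poses and noise level are made up for the example); it also shows that the value returned by cv2.calibrateCamera is the RMS cost from the question:

    import numpy as np
    import cv2

    # synthetic planar "chessboard" target: 9x6 corners, z = 0
    grid = np.zeros((9 * 6, 3), np.float32)
    grid[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

    K_true = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])
    image_size = (640, 480)

    objpoints, imgpoints = [], []
    for i in range(5):
        rvec = np.array([0.1 * i, -0.05 * i, 0.02 * i])               # made-up poses
        tvec = np.array([-4. + 0.3 * i, -2.5, 15.])
        proj, _ = cv2.projectPoints(grid, rvec, tvec, K_true, np.zeros(5))
        proj += np.random.default_rng(i).normal(0, 0.3, proj.shape)   # pixel noise
        objpoints.append(grid)
        imgpoints.append(proj.astype(np.float32))

    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, image_size, None, None)

    # collect the per-point residuals xi' - xi and look at their distribution
    res = []
    for obj, img, rvec, tvec in zip(objpoints, imgpoints, rvecs, tvecs):
        reproj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist)
        res.append((reproj - img).reshape(-1, 2))
    res = np.concatenate(res)

    print(rms, np.sqrt((res**2).sum(axis=1).mean()))   # the two RMS values agree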

There are many more details on the calibration process in this article: https://calib.io/blogs/knowledge-base/camera-calibration

Jakob