The global scale of your calibration (i.e. the units of measure of the 3D space coordinates) is determined by the geometry of the calibration object you use. For example, when you calibrate in OpenCV using images of a flat checkerboard, the inputs to the calibration procedure are corresponding pairs (P, p) of 3D points P and their images p. The (X, Y, Z) coordinates of the 3D points are expressed in whatever metrical units you choose (mm, cm, inches, miles, ...), as dictated by the size of the target you use (and the optics that image it), while the 2D coordinates of their images are in pixels. The output of the calibration routine is the set of parameters (the components of the projection matrix P and the non-linear distortion parameters k) that "convert" 3D coordinates expressed in those metrical units into pixels.
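A minimal sketch of what this looks like with OpenCV in Python, assuming a 9x6 inner-corner checkerboard, a 25 mm square size, and image filenames matching "calib_*.png" (all three are assumptions you would replace with your own setup):

```python
import glob
import cv2
import numpy as np

# Checkerboard geometry: inner-corner grid and physical square size.
# The 25 mm square size is an assumption; using the target's true dimensions
# is what makes the resulting extrinsics come out in real metric units.
pattern_size = (9, 6)
square_size_mm = 25.0

# 3D coordinates of the corners on the flat target, in millimeters, with Z = 0.
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size_mm

obj_points = []  # 3D points P (mm), one copy per calibration image
img_points = []  # corresponding 2D corner locations p (pixels)

for fname in glob.glob("calib_*.png"):  # hypothetical filenames
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Outputs: camera matrix K, distortion coefficients, and per-view rotations
# and translations. The translations are in millimeters because the object
# points were given in millimeters.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (pixels):", rms)
```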
If you don't know (or don't want to use) the actual dimensions of the calibration target, you can just fudge them but leave their ratios unchanged (so that, for example, a square remains a square even though the true length of its side may be unknown). In this case your calibration will be determined up to an unknown global scale. This is actually the common case: in most virtual reality applications you don't really care what the global scale is, as long as the results look correct in the image.
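Continuing the sketch above, the "fudged" case just means building the object points with an arbitrary square size of 1.0, so the unit of 3D space becomes "one checkerboard square":

```python
# Same calibration, but with the square size fudged to 1.0 arbitrary unit.
# The ratios are preserved: the board is still a grid of equal squares.
objp_unit = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp_unit[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
# No multiplication by square_size_mm here.

rms_u, K_u, dist_u, rvecs_u, tvecs_u = cv2.calibrateCamera(
    [objp_unit] * len(img_points), img_points, gray.shape[::-1], None, None)

# Up to estimation noise, the intrinsics K_u and distortion dist_u match the
# metric calibration; only the translations rescale by the unknown factor,
# i.e. tvecs_u[i] is approximately tvecs[i] / square_size_mm for every view.
print(K)
print(K_u)
```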
For example, if you want to add an even puffier pair of 3D lips to a video of Angelina Jolie, and composite them with the original video so that the brand-new fake lips stay attached and look "natural" on her face, you just need to rescale the 3D model of the fake lips so that it correctly overlaps the image of her lips. Whether the model is one yard or one mile away from the CG camera with which you render the composite is completely irrelevant.