Find the Transformation Matrix that maps 3D local coordinates to global coordinates

Question

I'm coding a calibration algorithm for my depth camera. The camera outputs a one-channel 2D image containing the distance to every object in the image.

From that image, and using the camera and distortion matrices, I was able to create a 3D point cloud from the camera's perspective. Now I want to convert those 3D coordinates to global/world coordinates. But since I can't use a calibration pattern such as a chessboard, I need an alternative.

So I was thinking: if I provide some ground points (in the camera perspective), they define a plane that I know should have a Z coordinate close to zero in the global perspective. How should I proceed to find the transformation matrix that makes this plane horizontal?

[Image: ground plane in local coordinates, with an object on top]

I tried using OpenCV's solvePnP, but it didn't give me the correct transformation. I also thought of using OpenCV's estimateAffine3D, but I don't know what the global coordinates should be mapped to, since the provided ground points do not need to lie on any specific pattern or shape.

Thanks in advance

Milo · Accepted Answer · 2019-09-18T20:40:30.913

What you need is what's commonly called extrinsic calibration: a rigid transformation relating the 3D camera reference frame to the 'world' reference frame. Usually, this is done by finding known 3D points in the world reference frame and their corresponding 2D projections in the image. This is what solvePnP does.

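With a depth camera you already have 3D points in the camera frame, so the problem becomes finding the best rotation/translation between two sets of corresponding 3D points. In the sense of minimizing the root mean square error, the solution is described here: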

Theory: https://igl.ethz.ch/projects/ARAP/svd_rot.pdf
Easier explanation: http://nghiaho.com/?page_id=671
Python code (from the easier explanation site): http://nghiaho.com/uploads/code/rigid_transform_3D.py_

So, if you want to transform 3D points from the camera reference frame, do the following:

1. As you proposed, define some 3D points with known positions in the world reference frame, for example (but not necessarily) with Z=0. Put the coordinates in an Nx3 matrix P.
2. Get the corresponding 3D points in the camera reference frame. Put them in an Nx3 matrix Q, in the same order as the rows of P.
3. Using the Python file linked above (the third link), call rigid_transform_3D(P, Q). This will return a 3x3 matrix R and a 3x1 vector t.

Then, for any 3D point p in the world reference frame, given as a 3x1 vector, you can obtain the corresponding camera point q with:

q = R.dot(p)+t
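For reference, here is a minimal NumPy sketch of the SVD-based least-squares fit that the links above describe. The name `rigid_transform_3d` and the Nx3 row layout are assumptions of this sketch, not the interface of the linked file, so check that file's conventions before swapping one for the other.

```python
import numpy as np

def rigid_transform_3d(P, Q):
    """Least-squares R (3x3) and t (3,) such that Q[k] ~= R @ P[k] + t.

    P, Q: (N, 3) arrays of corresponding points (hypothetical helper;
    needs N >= 3 points that are not all collinear).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)   # centroids of both sets
    H = (P - cP).T @ (Q - cQ)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Force det(R) = +1 so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t
```

Since the goal in the question is the opposite direction (camera to world), note that the same R and t also give `p = R.T @ (q - t)` for a single point, or `(Q - t) @ R` for an Nx3 array of camera points.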

EDIT: answer when 3D position of points in world are unspecified

Indeed, for this procedure to work, you need to know (or better, to specify) the 3D coordinates of the points in your world reference frame. As stated in your comment, you only know the points are in a plane but don't have their coordinates in that plane.

Here is a possible solution:

1. Take the selected 3D points in the camera reference frame; let's call them q'_i.
2. Fit a plane to these points, for example as described in https://www.ilikebigbits.com/2015_03_04_plane_from_points.html. The result of this will be a normal vector n. To fully specify the plane, you also need to choose a point on it, for example the centroid (average) of the q'_i.
3. As the points surely don't lie perfectly in the plane, project them onto the plane, for example as described in the question "How to project a point onto a plane in 3D?". Let's call these projected points q_i.
4. At this point you have a set of 3D points, q_i, that lie on a perfect plane, which should correspond closely to the ground plane (Z=0 in the world coordinate frame). The coordinates are still in the camera reference frame, though.
5. Now we need to specify an origin and the directions of the X and Y axes in this ground plane. You don't seem to have any criteria for this, so one option is to arbitrarily set the origin just "below" the camera center and align the X axis with the projection of the camera optical axis onto the plane. For this:
   - Project the point (0,0,0) onto the plane, as you did in step 3. Call this o. Project the point (0,0,1) onto the plane and call it a. Compute the vector a-o, normalize it and call it i.
   - o is the origin of the world reference frame, and i is the X axis of the world reference frame, in camera coordinates. Call j = n x i (cross product). j is the Y axis, and we are almost finished.
6. Now obtain the X-Y coordinates of the points q_i in the world reference frame by projecting them onto i and j. That is, take the dot product between each q_i and i to get the X values, and the dot product between each q_i and j to get the Y values. (With o chosen as above, o is parallel to n, so o.i = o.j = 0; if you pick a different origin, use q_i - o instead of q_i.) The Z values are all 0. Call these (X, Y, 0) coordinates p_i.
7. Use these values of p_i and q_i to estimate R and t, as in the first part of the answer!
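The steps above can be sketched in NumPy as follows. This is an untested-in-practice illustration under stated assumptions: the plane is fitted with an SVD of the centered points (a swapped-in technique, not the method from the linked article), the normal is flipped to point toward the camera so that world Z points up from the ground, and, because i, j and n are orthonormal with origin o, R and t are written down directly as R = [i j n], t = o instead of being estimated from p_i and q_i as in the last step. The function name `ground_frame_from_points` is hypothetical.

```python
import numpy as np

def ground_frame_from_points(q_raw):
    """World frame on the plane fitted to camera-frame ground points.

    q_raw: (N, 3) ground points in camera coordinates (N >= 3, not collinear).
    Returns R (3x3), t (3,) with q_cam = R @ p_world + t.
    """
    q_raw = np.asarray(q_raw, dtype=float)
    c = q_raw.mean(axis=0)                    # a point on the plane (centroid)
    # Normal = direction of least variance = last right singular vector.
    _, _, Vt = np.linalg.svd(q_raw - c)
    n = Vt[-1]
    # Assumption: orient n toward the camera center (0,0,0), so Z is "up".
    if np.dot(n, -c) < 0:
        n = -n

    def project(x):
        # Orthogonal projection of x onto the plane through c with normal n.
        return x - np.dot(x - c, n) * n

    o = project(np.zeros(3))                  # origin: "below" the camera
    a = project(np.array([0.0, 0.0, 1.0]))    # a point along the optical axis
    # Degenerate if the camera looks straight down (a == o); not handled here.
    i = (a - o) / np.linalg.norm(a - o)       # world X axis
    j = np.cross(n, i)                        # world Y axis (right-handed)
    R = np.column_stack([i, j, n])            # columns: world axes in camera frame
    return R, o
```

With `R, t = ground_frame_from_points(q_raw)`, camera points in an Nx3 array `Q` map to world coordinates as `(Q - t) @ R`; the selected ground points should then come out with Z close to zero, which is a quick sanity check on the result.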

Maybe there is a simpler solution. Also, I haven't tested this, but I think it should work. Hope this helps.

Thank you very much for the reply. I'm not able to test this today, but I'll try to test it tomorrow. I only see one obstacle in using your solution: in the first step you suggest defining some known 3D points. What if those points aren't known? For example, a user selects some ground points from the 2D depth image. Then they are converted to 3D camera coordinates. But how will I know the corresponding 3D world coordinates? Is there a way to say "I have some 3D camera coordinates that belong to the ground plane; find me the rotation matrix that maps those points onto a horizontal plane"? — Diogo, Sep 12 '19 at 16:52
I already managed to find the transformation matrices, but always assuming a known configuration of points (fixed-size rectangle, square, ...). If the user clicks the screen to choose the points, I won't be able to find their correct world coordinates, since the points are chosen arbitrarily (but always belong to the floor in world coordinates). — Diogo, Sep 16 '19 at 16:46
Thank you very much for this solution, again. I haven't tried it yet, but I will as soon as I can, and hopefully close this question. — Diogo, Sep 19 '19 at 07:49
Sorry for bothering you again. In your step 5, if I wish to align the X axis with a specific line (defined by two points), with one of those points being the origin of the reference frame, can I simply replace the points (0,0,0) and (0,0,1) with the ones that define the X line? — Diogo, Sep 27 '19 at 12:34
@Diogo: You just need an origin **o** and a unit vector **i** that lie in the plane. Two other points should work. Just remember to normalize **i**. — Milo, Sep 27 '19 at 13:24
Again, thank you very much for your help. I already have my algorithm built, but now, if I try to use the rotation/translation matrices to convert the camera points to the world coordinate system, I get this: https://ibb.co/19WZm5T (red -> camCoordinates; blue -> worldCoordinates). Any guess as to what could be wrong here? — Diogo, Oct 10 '19 at 10:00
