Use-case
- Generate a synthetic 3D scene using random points
- Generate two synthetic cameras
- Get the 2D projections from the two cameras
- Derive the Fundamental & Essential matrices
- Derive rotation & translation using the Essential matrix
- Triangulate the two 2D projections to recover the initial 3D scene
Implementation
- Random 3D points ( x, y, z ) are generated
- Camera intrinsic matrix is statically defined
- Rotation matrix is statically defined ( 25 deg rotation about the Z-axis )
- Translation matrix is the identity ( i.e. no translation )
- Two projections are synthetically generated ( K*R*T ); the first sketch after this list shows this setup up to the Essential matrix
- Fundamental matrix is resolved using cv::findFundamentalMat ( F )
- Essential matrix E is computed using 'K.t() * F * K'
- Camera extrinsics are extracted using SVD, resulting in 4 possible solutions ( in accordance with Hartley & Zisserman, Multiple View Geometry, chapter 9.2.6 ); see the second sketch after this list
- Triangulation is done using cv::triangulatePoints in the following manner: cv::triangulatePoints(K * matRotIdentity, K * R * T, v1, v2, points);
- 'points' is a 4-row, N-column matrix of homogeneous coordinates ( x, y, z, w )
- 'points' is converted to inhomogeneous ( Euclidean ) coordinates by dividing 'x, y, z' by 'w'
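For reference, here is a minimal, self-contained sketch of the setup up to the Essential matrix ( OpenCV / C++ ). The intrinsics, the point ranges and the translation value are placeholders, not necessarily the exact values from my real code:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    // Random 3D points ( x, y, z )
    const int N = 100;
    cv::RNG rng(42);
    std::vector<cv::Point3f> objectPoints;
    for (int i = 0; i < N; ++i)
        objectPoints.emplace_back(rng.uniform(-1.f, 1.f),
                                  rng.uniform(-1.f, 1.f),
                                  rng.uniform( 4.f, 8.f));  // keep points in front of both cameras

    // Static intrinsics ( placeholder values )
    cv::Matx33d K(800,   0, 320,
                    0, 800, 240,
                    0,   0,   1);

    // Camera 1: identity rotation, no translation
    cv::Matx33d R1 = cv::Matx33d::eye();
    cv::Vec3d   t1(0, 0, 0);

    // Camera 2: 25 deg rotation about the Z-axis, plus a translation
    // ( a pure rotation gives a zero baseline, see the EDIT below )
    const double a = 25.0 * CV_PI / 180.0;
    cv::Matx33d R2(std::cos(a), -std::sin(a), 0,
                   std::sin(a),  std::cos(a), 0,
                             0,            0, 1);
    cv::Vec3d t2(0.5, 0, 0);

    // Synthetic projections: x ~ K * ( R * X + t )
    auto project = [&](const cv::Matx33d& R, const cv::Vec3d& t,
                       std::vector<cv::Point2f>& out)
    {
        out.clear();
        for (const auto& P : objectPoints)
        {
            cv::Vec3d x = K * (R * cv::Vec3d(P.x, P.y, P.z) + t);
            out.emplace_back(float(x[0] / x[2]), float(x[1] / x[2]));
        }
    };
    std::vector<cv::Point2f> v1, v2;
    project(R1, t1, v1);
    project(R2, t2, v2);

    // Fundamental matrix from the correspondences
    cv::Mat F = cv::findFundamentalMat(v1, v2, cv::FM_8POINT);

    // Essential matrix: E = K^T * F * K
    cv::Mat Kd(K);
    cv::Mat E = Kd.t() * F * Kd;
    std::cout << "E =\n" << E << std::endl;
    return 0;
}
```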
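And a sketch of the extrinsic extraction plus the triangulation. The helper name and the choice of candidate are just for illustration; picking the physically valid ( R, t ) out of the four requires checking that the triangulated points end up in front of both cameras ( as far as I can tell, cv::recoverPose wraps the decomposition together with that test ):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

// E and K are 3x3 CV_64F matrices, v1/v2 the 2D projections from the sketch above
void decomposeAndTriangulate(const cv::Mat& E, const cv::Mat& K,
                             const std::vector<cv::Point2f>& v1,
                             const std::vector<cv::Point2f>& v2)
{
    // SVD of the essential matrix
    cv::SVD svd(E);
    cv::Mat W = (cv::Mat_<double>(3, 3) << 0, -1, 0,
                                           1,  0, 0,
                                           0,  0, 1);

    // The four candidate extrinsics: R in { U*W*Vt, U*Wt*Vt }, t = +/- u3
    cv::Mat Ra = svd.u * W     * svd.vt;
    cv::Mat Rb = svd.u * W.t() * svd.vt;
    if (cv::determinant(Ra) < 0) Ra = -Ra;   // enforce proper rotations
    if (cv::determinant(Rb) < 0) Rb = -Rb;
    cv::Mat t = svd.u.col(2);                // direction only: up to sign and scale

    // For brevity only one of the four combinations ( Ra, +t ) is used here;
    // the valid one is the combination that puts the points in front of both cameras
    cv::Mat R = Ra;

    // P1 = K [ I | 0 ],  P2 = K [ R | t ]
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt;
    cv::hconcat(R, t, Rt);
    cv::Mat P2 = K * Rt;

    // 'points' is 4 x N, homogeneous ( x, y, z, w )
    cv::Mat points;
    cv::triangulatePoints(P1, P2, v1, v2, points);
    points.convertTo(points, CV_64F);        // read everything as double below

    // De-homogenize by dividing x, y, z by w
    for (int i = 0; i < points.cols; ++i)
    {
        const double w = points.at<double>(3, i);
        std::cout << points.at<double>(0, i) / w << "  "
                  << points.at<double>(1, i) / w << "  "
                  << points.at<double>(2, i) / w << "\n";
    }
}
```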
The result
The resulting 3D points match the original points up to scale ( ~144 in my case ).
Questions
- Camera translation is derived only up to scale ( at the extrinsic-extraction step above ); with that in mind, would it be right to assume that the triangulation result is also up to scale?
- Is it possible to derive the scale without any prior knowledge of the camera positions or of the absolute size of the scene points?
Any help would be appreciated.
EDIT:
I was trying to use the exact same projection matrices used for the 3D -> 2D projection to convert back from 2D to 3D ( using cv::triangulatePoints ). Surprisingly, this resulted in a null vector ( all 3D points had x, y, z, w == 0 ). This turned out to be because the two cameras differed only by rotation and not by translation: their centers coincide, so the baseline has zero length and the epipolar geometry degenerates, leaving the triangulation without a unique solution; in my case it collapsed to x, y, z == 0, i.e. the null vector.
Adding a translation between the two cameras made the triangulation recover the original coordinates properly. This, however, was still with the exact same projection matrices being used for the 3D -> 2D projection and then for the 2D -> 3D triangulation ( a small sketch of this experiment is below ).
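A minimal sketch of that experiment with a single synthetic point, using the same placeholder K and rotation as above; the comments describe what I observed, not guaranteed output:

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <iostream>
#include <vector>

int main()
{
    // Same placeholder intrinsics and 25 deg Z-rotation as in the sketches above
    cv::Matx33d K(800, 0, 320,  0, 800, 240,  0, 0, 1);
    const double a = 25.0 * CV_PI / 180.0;
    cv::Matx33d R2(std::cos(a), -std::sin(a), 0,
                   std::sin(a),  std::cos(a), 0,
                             0,            0, 1);

    // One known 3D point, projected into both cameras
    const cv::Vec3d X(0.3, -0.2, 6.0);
    auto project = [&](const cv::Matx33d& R, const cv::Vec3d& t) {
        cv::Vec3d x = K * (R * X + t);
        return cv::Point2f(float(x[0] / x[2]), float(x[1] / x[2]));
    };

    // Triangulate with camera 1 fixed at K [ I | 0 ] and camera 2 at K [ R2 | t2 ]
    auto triangulate = [&](const cv::Vec3d& t2) {
        std::vector<cv::Point2f> v1 { project(cv::Matx33d::eye(), cv::Vec3d(0, 0, 0)) };
        std::vector<cv::Point2f> v2 { project(R2, t2) };

        cv::Mat P1 = cv::Mat(K) * cv::Mat::eye(3, 4, CV_64F);
        cv::Mat Rt;
        cv::hconcat(cv::Mat(R2), cv::Mat(t2), Rt);
        cv::Mat P2 = cv::Mat(K) * Rt;

        cv::Mat X4;
        cv::triangulatePoints(P1, P2, v1, v2, X4);
        X4.convertTo(X4, CV_64F);
        cv::Mat Xh = X4.t();   // 1 x 4 row: ( x, y, z, w )
        std::cout << "homogeneous result: " << Xh << std::endl;
    };

    triangulate(cv::Vec3d(0, 0, 0));    // rotation only, zero baseline: degenerate ( this is where I got the null vector )
    triangulate(cv::Vec3d(0.5, 0, 0));  // with a baseline: ( 0.3, -0.2, 6 ) comes back after dividing by w
    return 0;
}
```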
When doing camera pose estimation ( extracting the projection matrix from point correspondences ), the translation is derived only up to scale, and thus the triangulation result is also up to scale.
Question
Is it possible to derive the translation ( how much the camera has moved ) in absolute units ( metric, pixels, ... ) rather than up to scale? What prior knowledge is needed?