3

I want to do some Structure-from-Motion using OpenCV. So far I have the fundamentalmatix and the essentialmatrix. Having the essentialmatrix I am doing SVD for getting R and T.

My problem is that I have 2 possible solutions for R and 2 possible solutions for T which leads to 4 solutions for the overall pose, where only one of the 4 solutions is the right one. How can I find the correct solution?

Here is my code:

private void calculateRT(Mat E, Mat R, Mat T){

    Mat w = new Mat();
    Mat u = new Mat();
    Mat vt = new Mat();

    Mat diag = new Mat(3,3,CvType.CV_64FC1);
    double[] diagVal = {1,0,0,0,1,0,0,0,1};
    diag.put(0, 0, diagVal);

    Mat newE = new Mat(3,3,CvType.CV_64FC1);

    Core.SVDecomp(E, w, u, vt, Core.DECOMP_SVD); 

    Core.gemm(u, diag, 1, vt, 1, newE);

    Core.SVDecomp(newE, w, u, vt, Core.DECOMP_SVD);

    publishProgress("U: " + u.dump());
    publishProgress("W: " + w.dump());
    publishProgress("vt:" + vt.dump());

    double[] W_Values = {0,-1,0,1,0,0,0,0,1};
    Mat W = new Mat(new Size(3,3), CvType.CV_64FC1);
    W.put(0, 0, W_Values);

    double[] Wt_values = {0,1,0-1,0,0,0,0,1};
    Mat Wt = new Mat(new Size(3,3), CvType.CV_64FC1);
    Wt.put(0,0,Wt_values);


    Mat R1 = new Mat();
    Mat R2 = new Mat();

    // u * W * vt = R 
    Core.gemm(u, Wt, 1, vt, 1, R2);
    Core.gemm(u, W, 1, vt, 1, R1);

    publishProgress("R: " + R.dump());


    // +- T (2 possible solutions for T)
    Mat T1 = new Mat();
    Mat T2 = new Mat();
    // T = u.t
    u.col(2).copyTo(T1);

    publishProgress("T : " + T.dump());

    Core.multiply(T, new Scalar(-1.0, -1.0, -1.0), T2);

    // TODO Here I have to find the correct combination for R1 R2 and T1 T2

}
glethien
  • 2,440
  • 2
  • 26
  • 36

1 Answers1

5

There is a theoretical ambiguity when reconstructing the relative euclidian poses of two cameras from their fundamental matrix. This ambiguity is linked to the fact that, given a 2D point in an image, the classic pinhole camera model cannot tell whether the corresponding 3D point is in front of the camera or behind the camera. In order to remove this ambiguity, you need to know one point correspondence in the images: as these two 2D points are assumed to be the projections of a single 3D point lying in front of both cameras (since it is visible in both images), this will enable choosing the right R and T.

For that purpose, one method is explained in ยง 6.1.4 (p47) of the following PhD thesis: "Geometry, constraints and computation of the trifocal tensor", by C.Ressl (PDF). The following gives the outline of this method. I'll denote the two corresponding 2D points by x1 and x2, the two camera matrices by K1 and K2 and the essential matrix by E12.

i. Compute the SVD of the essential matrix E12 = U * S * V'. If det(U) < 0 set U = -U. If det(V) < 0 set V = -V.

ii. Define W = [0,-1,0; 1,0,0; 0,0,1], R2 = U * W * V' and T2 = third column of U

iii. Define M = [ R2'*T2 ]x, X1 = M * inv(K1) * x1 and X2 = M * R2' * inv(K2) * x2

iv. If X1(3) * X2(3) < 0, set R2 = U * W' * V' and recompute M and X1

v. If X1(3) < 0 set T2 = -T2

vi. Define P1_E = K1 * [ I | 0 ] and P2_E = K2 * [ R2 | T2 ]

The notation ' denotes the transpose and the notation [.]x used in step iii. corresponds to the skew-symetric operator. Applying the skew-symmetric operator on a 3x1 vector e = [e_1; e_2; e_3] results in the following (see the Wikipedia article on cross-product):

[e]x = [0,-e_3,e_2; e_3,0,-e_1; -e_2,e_1,0]

Finally, note that the norm of T2 will always be 1, since it is one of the column of an orthogonal matrix. This means that you won't be able to recover the true distance between the two cameras. For that purpose, you need to know the true distance between two points in the scene and take that into account to calculate the true distance between the cameras.

BConic
  • 8,750
  • 2
  • 29
  • 55
  • I do not really understand iii. I have two points x1 and x2 (in image coordiantes means x1 = (x,y) as pixel koordinates) and the camera intriniscs, but I do not understand how I can calculate M, X1, X2 โ€“ glethien Apr 02 '14 at 10:46
  • @glethien for M, you first compute `R2` transposed times `T2` and then you apply the skew-symmetric operator `[.]x` on the result (see the Wikipedia link for a definition). Hence `M` is a 3x3 matrix. For `X1` and `X2`, I used two different notations for matrix multiplication but they're the same (I edited my post to correct that). Also, `x1` and `x2` should be in homogeneous coordinates, i.e. a 3x1 vector such as `x1 = [x; y; 1]`. โ€“ BConic Apr 02 '14 at 11:21
  • @glethien Also, note that if you modified R2 in step iv, then you need to update M and X1 accordingly before doing step v. I made that clearer in my post. โ€“ BConic Apr 02 '14 at 11:47