
I am trying to do 3D reconstruction using SfM (Structure from Motion). I am pretty new to computer vision and doing this as a hobby, so if you use acronyms, please also let me know what they stand for so I can look them up.

Learning-wise, I have been following these resources:

  1. https://www.youtube.com/watch?v=SyB7Wg1e62A&list=PLgnQpQtFTOGRYjqjdZxTEQPZuFHQa7O7Y&ab_channel=CyrillStachniss
  2. https://imkaywu.github.io/tutorials/sfm/#triangulation
  3. Plus the links below in the "Quick Question" section.

My end goal is to use this on a person's face to create a 3D face reconstruction. If people have advice on this topic specifically, please let me know as well.

I do the following steps (a condensed sketch follows the list):

  1. I/O using OpenCV: the input is a video taken with a single camera.
  2. Find the intrinsic parameters and distortion coefficients of the camera using Zhang's method.
  3. Use SIFT (Scale-Invariant Feature Transform) to find features in frame 1 and frame 2.
  4. Feature matching is done using cv2.FlannBasedMatcher() (FLANN: Fast Library for Approximate Nearest Neighbors).
  5. Compute the essential matrix using cv2.findEssentialMat().
  6. The projection matrix of frame 1 is set to numpy.hstack((numpy.eye(3), numpy.zeros((3, 1)))).
  7. Rotation and translation are obtained using cv2.recoverPose().
  8. Using the rotation and translation we get the projection matrix of frame 2: curr_proj_matrix = cv2.hconcat([curr_rotation_matrix, curr_translation_matrix]).
  9. I use cv2.undistortPoints() on the feature points for frames 1 and 2, using the information from step 2.
  10. Lastly, I do triangulation: points_4d = triangulation.triangulate(prev_projection_matrix, curr_proj_matrix, prev_pts_u, curr_pts_u).
  11. Then I reassign the prev values to equal the curr values and continue through the video.
  12. I use matplotlib to display the scatter plot.
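
To make the list concrete, here is a condensed sketch of how steps 5-10 fit together for a single frame pair. This is not my exact code; the function and variable names are illustrative, and it assumes K and dist come from step 2 and that prev_pts/curr_pts are Nx2 float arrays of matched feature locations from step 4:

```
import cv2
import numpy as np

def two_view_points(prev_pts, curr_pts, K, dist):
    # Step 5: essential matrix from matched points (RANSAC rejects outliers).
    E, mask = cv2.findEssentialMat(prev_pts, curr_pts, K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Step 6: frame 1 projection matrix is [I | 0] (world frame = camera 1).
    prev_projection_matrix = np.hstack((np.eye(3), np.zeros((3, 1))))
    # Steps 7-8: relative pose of frame 2, then its projection matrix [R | t].
    _, R, t, mask = cv2.recoverPose(E, prev_pts, curr_pts, K, mask=mask)
    curr_proj_matrix = cv2.hconcat([R, t])
    # Step 9: undistort to *normalized* coordinates (no P argument passed),
    # which is why the projection matrices above do not include K.
    prev_pts_u = cv2.undistortPoints(prev_pts.reshape(-1, 1, 2), K, dist)
    curr_pts_u = cv2.undistortPoints(curr_pts.reshape(-1, 1, 2), K, dist)
    # Step 10: triangulate; the result is a 4xN array of homogeneous points.
    points_4d = cv2.triangulatePoints(prev_projection_matrix, curr_proj_matrix,
                                      prev_pts_u.reshape(-1, 2).T.astype(np.float64),
                                      curr_pts_u.reshape(-1, 2).T.astype(np.float64))
    return points_4d
```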

Quick Question:

  1. Why do some articles write E = (K^-1)^T * F * K and some E = K^T * F * K?

First way: What do I do with the fundamental matrix?

Second way: https://harish-vnkt.github.io/blog/sfm/
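
(As the comments below clarify, the prime in K' in the literature denotes the second camera's matrix, not an inverse; with a single camera, K' = K.) A minimal synthetic check of the relation E = K'^T * F * K, with all numbers made up for illustration:

```
import cv2
import numpy as np

# Made-up intrinsics and pose, only to verify the algebra numerically.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, _ = cv2.Rodrigues(np.array([0.1, 0.2, 0.05]))  # arbitrary rotation
t = np.array([1.0, 0.2, 0.1])                     # arbitrary translation
tx = np.array([[  0.0, -t[2],  t[1]],
               [ t[2],   0.0, -t[0]],
               [-t[1],  t[0],   0.0]])            # skew-symmetric [t]_x
E = tx @ R                                        # definition of E
F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)     # F from E and K
print(np.allclose(K.T @ F @ K, E))                # True: K^T F K recovers E
```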

Issue:

As you can see, the scatter plot looks a bit warped. I am unsure why, whether I am missing a step, or whether I am doing something wrong, hence I am looking for advice. Also, the Z axis values are all negative.

One of my guesses was that the video is 60 FPS (frames per second), and even though I am moving the camera relatively quickly, there might not be enough rotation + translation between consecutive frames for a reliable triangulation. However, removing frames in between did not make much difference.
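
For reference, the frame-removal test was just sampling every Nth frame, along these lines (minimal sketch; the file name and the step of 10 are illustrative):

```
import cv2

cap = cv2.VideoCapture("face_video.mp4")  # illustrative file name
frame_step = 10   # at 60 FPS, one frame every 1/6 s -> larger baseline
frames = []
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % frame_step == 0:   # keep every Nth frame only
        frames.append(frame)
    idx += 1
cap.release()
```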

[Image: scatter plot of 3D points] [Image: input image + feature points]

Please let me know if you would like me to provide some of the code.

  • Are you working with undistorted images or undistorted feature positions? – Micka Oct 03 '21 at 21:41
  • It's currently set up to use undistorted feature positions. Does one have an advantage over the other? – Oct 03 '21 at 21:48
  • Question 1: both versions look wrong (typos?). The prime symbol is part of the identifier; it is not an inverse, derivative, or transpose. K and K' are the (possibly not identical) camera matrices of the two views. Go with Wikipedia, the original papers, and proper books: https://en.wikipedia.org/wiki/Fundamental_matrix_(computer_vision) – Christoph Rackwitz Oct 03 '21 at 22:13
  • matplotlib will *NOT* scale the plot axes equally. It will look weird because autoscaling stretches everything so the whole box is used. Perhaps look into `plot.ly`. I would have tagged `matplotlib`, but only 5 tags are allowed and I can't choose. – Christoph Rackwitz Oct 03 '21 at 22:17
  • I think I figured out Question 1, thank you for the wiki page. I was under the impression that the symbol K' was the inverse matrix of K, but I think it means the K of the second camera. Hence, essential matrix = (transpose of K of the second camera/frame) * (fundamental matrix) * (K of the first camera). In my case they are the same. – Oct 03 '21 at 22:20
  • Thank you, removed one tag and replaced it with matplotlib. –  Oct 03 '21 at 22:32
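
Following up on the matplotlib comment above, here is a minimal sketch of forcing equal axis scaling on the 3D scatter plot (set_box_aspect requires Matplotlib >= 3.3; points_3d is an illustrative Nx3 array of triangulated points):

```
import matplotlib.pyplot as plt
import numpy as np

points_3d = np.random.rand(100, 3)  # placeholder for triangulated points
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points_3d[:, 0], points_3d[:, 1], points_3d[:, 2], s=2)
# Match the box proportions to the data ranges so one unit has the same
# length on every axis, instead of matplotlib's default autoscaling.
ax.set_box_aspect(points_3d.max(axis=0) - points_3d.min(axis=0))
plt.show()
```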

1 Answer


I believe I have an answer, but I am not sure why it works. Hence, if someone could expand on it, plus mention what the 4th coordinate (points_4d[3]) of the 4D points is, then I will accept that answer and delete this one.

Do this on the 4D points after triangulation: points_4d /= points_4d[3] (1)

The documentation does not mention this step: https://docs.opencv.org/4.5.3/d9/d0c/group__calib3d.html#gad3fc9a0c82b08df034234979960b778c

My best guess is that doing (1) is equivalent to calling cv2.convertPointsFromHomogeneous(), i.e. converting from homogeneous coordinates to Euclidean coordinates.
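
A minimal check of that equivalence with made-up homogeneous points (cv2.convertPointsFromHomogeneous() expects Nx4 input, while triangulation returns 4xN, hence the transposes):

```
import cv2
import numpy as np

points_4d = np.array([[2.0, 4.0, 6.0, 2.0],
                      [4.0, 8.0, 2.0, 2.0]])  # two made-up homogeneous points (Nx4)

# Step (1) operates on the 4xN layout that triangulation returns:
h = points_4d.T                     # 4xN, as triangulation outputs it
manual = (h / h[3])[:3].T           # divide by w, drop w, back to Nx3

via_cv = cv2.convertPointsFromHomogeneous(points_4d).reshape(-1, 3)
print(np.allclose(manual, via_cv))  # True: both divide by the 4th coordinate
```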


Edit 2021-10-03: please see the comment below for further explanation.

  • Good guess, yes that's it. They're just 3D points in a 4D projective space, analogous to 2D points in a 3D projective space. All points (x,y,z,1) * w, for arbitrary nonzero w, in the projective space represent the same 3D point (x,y,z), and (x,y,z,1) is the canonical representative. The additional dimension makes translations possible (among other things). I don't know why those points would be off the 4D plane; I don't understand the SfM algorithms that well. In the case of homographies (and 3D-to-2D projections), they're expected to be, and require that division step. – Christoph Rackwitz Oct 04 '21 at 00:17