
I have a sequence of images. I would like to estimate the camera pose for each frame in my sequence.

I have calculated features in each frame and tracked them through the sequence. I would now like to estimate the camera pose for each frame, which I do using the following OpenCV routines:

// Estimate E with RANSAC; mask flags the inlier correspondences
Mat essentialMatrix = findEssentialMat(pointsA, pointsB, focal, pp, RANSAC, 0.999, 1.0, mask);
// Reuse the inlier mask so outliers do not bias the recovered pose
recoverPose(essentialMatrix, pointsA, pointsB, R, T, focal, pp, mask);

where pointsA and pointsB contain the 2D coordinates of features visible in both frames; pointsA comes from the frame preceding pointsB.
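
For reference, a minimal self-contained version of this two-frame step is sketched below (the function name is just for illustration; focal and pp stand in for my calibrated intrinsics):

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Relative pose between two frames from tracked 2D correspondences.
// pointsA: features in the earlier frame; pointsB: the same features later.
void estimateRelativePose(const std::vector<cv::Point2f>& pointsA,
                          const std::vector<cv::Point2f>& pointsB,
                          double focal, cv::Point2d pp,
                          cv::Mat& R, cv::Mat& T)
{
    cv::Mat mask; // inlier mask filled in by RANSAC
    cv::Mat E = cv::findEssentialMat(pointsA, pointsB, focal, pp,
                                     cv::RANSAC, 0.999, 1.0, mask);
    // Pass the inlier mask on so outliers cannot bias the cheirality check.
    cv::recoverPose(E, pointsA, pointsB, R, T, focal, pp, mask);
    // Note: T is a unit vector; the true scale is not recoverable from 2D-2D.
}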

The problem I am encountering is that the R and T estimates are very noisy, to the point where I believe something is wrong with my pose estimation.

My question is: how do I reliably estimate the camera pose from two sets of matched features?

Note: I am familiar with this answered question. However, I believe OpenCV 3 now includes methods that address this problem more elegantly.

Below are the inter-frame differences in X translation. As you can see, they are very different (smooth = expected, jagged = estimated)...

[Plot: inter-frame X translation; smooth = ground truth, jagged = estimate]

– MM.

1 Answer


Have you checked whether you have false matches among your features? And how much noise is there in the positions of the matched features?
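
For example, Lowe's ratio test usually weeds out most false matches. A minimal sketch, assuming float descriptors such as SURF (descA/descB are hypothetical names for the two frames' descriptor matrices):

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Lowe's ratio test: keep a match only if its best distance is clearly
// smaller than the second-best, which rejects most ambiguous/false matches.
std::vector<cv::DMatch> ratioFilter(const cv::Mat& descA, const cv::Mat& descB)
{
    cv::BFMatcher matcher(cv::NORM_L2); // L2 norm suits float descriptors (SURF)
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(descA, descB, knn, 2);

    std::vector<cv::DMatch> good;
    for (const std::vector<cv::DMatch>& m : knn)
        if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
            good.push_back(m[0]);
    return good;
}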

OpenCV implements a fairly basic RANSAC that is not very robust in real-life cases. You need to pass a really good set of matches (ideally 100+ features, so that the inlier computation is more robust).
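
One quick sanity check is to count how many matches RANSAC keeps as inliers and distrust frames where the ratio is low. A sketch; the 50% cutoff is an arbitrary assumption:

#include <opencv2/calib3d.hpp>
#include <vector>

// Rough reliability check: how many of the matches survive RANSAC?
bool poseLooksReliable(const std::vector<cv::Point2f>& pointsA,
                       const std::vector<cv::Point2f>& pointsB,
                       double focal, cv::Point2d pp)
{
    cv::Mat mask;
    cv::findEssentialMat(pointsA, pointsB, focal, pp, cv::RANSAC, 0.999, 1.0, mask);
    // Rule of thumb (arbitrary): trust the pose only if >= 50% are inliers.
    return cv::countNonZero(mask) >= static_cast<int>(0.5 * pointsA.size());
}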

It might also be that RANSAC picks a degenerate configuration (e.g. points on a line) as the best model.

Another problem could be that the points you pass lie mostly on the same object/plane, in which case the algorithm doesn't have enough information to calculate the pose reliably.
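
A rough way to catch that planar case (my own heuristic, similar in spirit to the model selection in ORB-SLAM; the 0.9 threshold is an arbitrary assumption): fit a homography to the same matches, and if it explains about as many inliers as the essential matrix, skip the pair:

#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <vector>

// Heuristic: a (nearly) planar scene or tiny baseline is explained just as
// well by a homography, and then the essential-matrix pose is unreliable.
bool pairLooksDegenerate(const std::vector<cv::Point2f>& pointsA,
                         const std::vector<cv::Point2f>& pointsB,
                         double focal, cv::Point2d pp)
{
    cv::Mat maskE, maskH;
    cv::findEssentialMat(pointsA, pointsB, focal, pp, cv::RANSAC, 0.999, 1.0, maskE);
    cv::findHomography(pointsA, pointsB, cv::RANSAC, 3.0, maskH);
    double ratio = cv::countNonZero(maskH) /
                   std::max(1.0, static_cast<double>(cv::countNonZero(maskE)));
    return ratio > 0.9; // homography fits (almost) as well: degenerate pair
}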

– aledalgrande
  • I perform some rudimentary filtering on the features (track length, cv::KeyPointsFilter::retainBest), and pass in a few hundred features per frame. The test sequence I am using is generated using Blender, so the ground truth path is smooth. – MM. Jul 30 '15 at 23:09
  • Can you add some example output along with expected output? – aledalgrande Jul 30 '15 at 23:22
  • I have added some x-translation comparison data above – MM. Jul 30 '15 at 23:31
  • Can you also post the full code you are using, it's a bit difficult to help without more info. An example of point match on image pair would be great too. – aledalgrande Jul 30 '15 at 23:49
  • Unfortunately posting the code is a little tricky atm. I'm using a SURF feature detector and tracker. A visual inspection of the tracks suggests they are good. – MM. Jul 31 '15 at 00:02
  • Erm, wait a minute: looking at your image, it seems you use the 5-point algorithm not only for the first pair but also for the following ones? That will never work; the scale changes every time (see the sketch below this thread). – aledalgrande Jul 31 '15 at 02:00
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84755/discussion-between-mm-and-aledalgrande). – MM. Jul 31 '15 at 03:19
  • How could one double-check for false matches in a sequence of frames? Is there any sample code I could use for the RANSAC implementation? – Farid Alijani Jun 20 '19 at 12:45
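
To make the scale point from this thread concrete: recoverPose returns a unit-length translation for every pair, so per-pair poses cannot be chained directly; each step needs an external scale (e.g. from triangulated structure or ground truth). A sketch of the accumulation, with the scale left as an explicit input:

#include <opencv2/core.hpp>

// Chain per-pair poses into a trajectory. recoverPose only gives the
// translation *direction* (||T|| == 1), so `scale` must come from elsewhere
// (triangulated points, odometry, ground truth, ...). Feeding scale = 1 for
// every pair produces exactly the kind of jagged drift shown in the plot.
void accumulatePose(const cv::Mat& R, const cv::Mat& T, double scale,
                    cv::Mat& R_total, cv::Mat& T_total)
{
    T_total = T_total + scale * (R_total * T);
    R_total = R * R_total;
}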