
When implementing monocular SLAM or Structure from Motion with a single camera, translation can be estimated only up to an unknown scale. It is known that this scale cannot be determined without external information. My question, however, is how to make this scale consistent across all the pairwise translations. For example, suppose we have 3 frames (Frame0, Frame1 and Frame2) and track them as follows:

  • Frame0 -> Frame1: R01, T01 (R and T can be extracted from the F matrix and the K matrix via essential matrix decomposition)
  • Frame1 -> Frame2: R12, T12

The problem is that T01 and T12 are normalized, so each has magnitude 1. In reality, however, the magnitude of T01 may be twice that of T12.

How can I recover the relative magnitude between T01 and T12?

P.S. I do not need the exact values of T01 or T12. I just want to know that, for example, |T01| = 2 * |T12|.

I think this is possible because monocular SLAM and SfM algorithms already exist and work well, so there should be some way to do it.

Humam Helfawi

1 Answer


Calculate R, t between frames 2 and 0 as well, and connect a triangle between the three vertices formed by the three camera positions. Each pairwise translation gives only the direction of one edge, but the triangle can close in only one way (up to a single global scale), and that fixes the relative lengths of the edges.
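To make the triangle idea concrete, here is a minimal NumPy sketch on synthetic data. It is not taken from any particular SLAM library. It assumes the pose convention x_j = R_ij x_i + T_ij (the one OpenCV's essential matrix routines use, as far as I know), and the helper names `baseline_dir` and `relative_scales` are made up for illustration. The closure condition a*d01 + b*d12 = c*d02 is solved for the edge lengths (a, b, c) up to one global scale. It breaks down when the three camera centres are collinear, because then any split of the lengths closes the "triangle".

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def baseline_dir(R, T):
    """Unit direction from camera i to camera j, expressed in frame-i axes.
    Assumed convention: x_j = R @ x_i + T, so camera j sits at -R.T @ T."""
    return unit(-R.T @ T)

def relative_scales(d01, d12, d02):
    """Solve a*d01 + b*d12 - c*d02 = 0 for (a, b, c), normalised so a = 1.
    All three unit directions must be expressed in the same frame."""
    A = np.column_stack([d01, d12, -np.asarray(d02)])
    _, s, vt = np.linalg.svd(A)
    if s[1] < 1e-8:
        raise ValueError("camera centres are collinear: scales not observable")
    x = vt[-1]          # (approximate) null vector of A
    return x / x[0]     # fixes both the sign and the global scale

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic ground truth (frame 0 is the world): |C1 - C0| = 2, |C2 - C1| = 1.
C = [np.zeros(3), np.array([2.0, 0.0, 0.0]), np.array([2.0, 1.0, 0.0])]
R = [np.eye(3), rot_z(0.1), rot_z(-0.2)]      # world-to-camera rotations
t = [-R[i] @ C[i] for i in range(3)]          # world-to-camera translations

def relative_pose(i, j):
    """Relative pose i -> j with the translation normalised to unit length,
    mimicking what essential matrix decomposition returns."""
    R_ij = R[j] @ R[i].T
    T_ij = t[j] - R_ij @ t[i]
    return R_ij, unit(T_ij)

R01, T01 = relative_pose(0, 1)
R12, T12 = relative_pose(1, 2)
R02, T02 = relative_pose(0, 2)

d01 = baseline_dir(R01, T01)
d12 = R01.T @ baseline_dir(R12, T12)   # rotate from frame-1 axes to frame-0 axes
d02 = baseline_dir(R02, T02)

a, b, c = relative_scales(d01, d12, d02)
ratio = a / b                          # |T01| / |T12|
print(ratio)
```

For this synthetic setup the ratio comes out as 2, matching the ground truth. With real, noisy estimates the three directions will not be exactly coplanar, so the SVD gives a least-squares closure only; jointly refining all poses and points is the more robust route.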

YoniChechik
  • Thanks for your answer! So, in other words, if I ran bundle adjustment on the three frames, the triangle condition would be satisfied on its own. Did I understand correctly? – Humam Helfawi Jun 20 '20 at 06:54
  • @HumamHelfawi This is correct. Bundle adjustment in essence does some kind of "tightening" on the given edges of the connected graph. – YoniChechik Jun 21 '20 at 07:57