Structure from Motion with Optical Flow

Question

Let's say I have a video from a drive recorder. I want to reconstruct a point cloud of the recorded scene using structure from motion. First, I need to track some points.

Which algorithm yields better results: sparse optical flow (the Kanade-Lucas-Tomasi tracker) or dense optical flow (Farneback)? I have experimented a bit but cannot really decide. Each has its own strengths and weaknesses.

The ultimate goal is to get a point cloud of the cars recorded in the scene. With sparse optical flow, I can track interest points on the cars, but which points get picked is quite unpredictable. One solution is to lay a grid over the image and force the tracker to track one interest point in each grid cell, but I think this would be quite hard.

With dense flow, I get the movement of every pixel, but the problem is that it cannot really detect cars that move only slightly. I also doubt that the per-pixel flow produced by the algorithm is that accurate. In addition, I believe it only gives me pixel movement between two frames (unlike sparse optical flow, where I can get multiple coordinates of the same interest point over time t).

@Sunreef I've just tried `good features to track (gftt)`. How would SIFT be any more helpful than `gftt`? — Hafiz Hilman Mohammad Sofian, Jun 29 '16 at 13:58
@Micka Have thought about it. But wouldn't it be quite heavy computationally? — Hafiz Hilman Mohammad Sofian, Jun 29 '16 at 13:59

Yasin Yousif · Answer 1 · 2018-09-02T18:04:56.670

Your title mentions SfM, which includes pose estimation.

Tracking is only the first step (matching). If you want a point cloud from video (a very hard task), the first thing I would think of is bundle adjustment, which also works for MVE.

Nevertheless, for video we can do more: since consecutive frames are very close to each other, we can use a faster algorithm such as optical flow instead of SIFT matching, estimate the F matrix from the tracked points, and then:

E = K^T * F * K

where K is the camera intrinsic matrix (the same camera for both frames).
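As a minimal sketch of that conversion, assuming both frames come from the same camera, the essential matrix can be obtained as E = K^T F K. The helper names and the intrinsic values below are made up for illustration, and F is built synthetically here; in a real pipeline F would be estimated from the tracked point pairs.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def essential_from_fundamental(F, K):
    """E = K^T F K, assuming the same intrinsics K for both frames."""
    return matmul(matmul(transpose(K), F), K)


# Hypothetical pinhole intrinsics (focal length and principal point in pixels).
f, cx, cy = 700.0, 320.0, 240.0
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

# Synthetic check: start from a known E (pure sideways translation, no
# rotation), build F = K^-T E K^-1, then recover E from F.
E_true = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
F = matmul(matmul(transpose(K_inv), E_true), K_inv)
E = essential_from_fundamental(F, K)
```

The recovered E matches E_true up to floating-point error. With a real F estimated from noisy tracks, E is usually also projected onto the set of valid essential matrices before it is decomposed into rotation and translation.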

Back to your original question, which is better:

1) Dense optical flow, or

2) Sparse optical flow?

Apparently you are working offline, so speed is not important, but I would still recommend the sparse one.

Update

For 3D reconstruction, dense flow may seem more attractive, but as you said, it is rarely robust. You can use sparse flow instead and add as many points as you want to make it semi-dense.

I can name only a few methods that do this, such as MonoSLAM or ORB-SLAM.
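One simple way to spread sparse points evenly over the image is the grid idea from the question: detect many corners, then keep only the strongest ones in each grid cell before handing them to the tracker. The sketch below assumes the detector returns (x, y, score) tuples; the function name, the cell size, and the sample detections are made up for illustration.

```python
def bucket_features(points, width, height, cell_size, per_cell=1):
    """Keep the strongest `per_cell` points in each grid cell.

    points: iterable of (x, y, score), where score is the corner
    response reported by the detector (higher is better).
    """
    cells = {}
    for x, y, score in points:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore points outside the image
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y, score))

    kept = []
    for key in sorted(cells):
        # Sort each cell by score, best first, and keep the top entries.
        best = sorted(cells[key], key=lambda p: p[2], reverse=True)
        kept.extend(best[:per_cell])
    return kept


# Made-up detections: three crowd into the top-left cell, one sits alone.
detections = [(5, 5, 0.9), (8, 6, 0.4), (12, 3, 0.7), (70, 40, 0.2)]
selected = bucket_features(detections, width=128, height=96, cell_size=32)
```

Raising `per_cell` or shrinking `cell_size` moves the result toward the semi-dense end, at the cost of tracking weaker corners.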

Final Update

Use semi-dense tracking as I wrote earlier, but note that SfM always assumes static objects (no movement); otherwise it will not work.

In practice, feature-based (that is, non-direct) 3D reconstruction never uses all the pixels in the image, and SIFT has always been the praised way to detect and match features. Recently, all the pixels have been used in a different kind of calibration, for example in Direct Sparse Odometry and LSD-SLAM, which are known as direct methods.
