I'm trying to generate a real-time depth map from an uncalibrated stereo camera. I know roughly what the algorithm has to look like (a rough sketch of these steps follows the list):
- detect keypoints (SURF, SIFT)
- extract descriptors (SURF, SIFT)
- compare and match the descriptors (brute-force or FLANN-based matching)
- find the fundamental matrix (findFundamentalMat()) from these matched pairs
- rectify the images with stereoRectifyUncalibrated()
- compute the disparity map with StereoSGBM
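For reference, this is roughly how I picture steps 1-5 being wired together in OpenCV. This is only a rough, untested sketch: the file names, the choice of SIFT, the 300-match cutoff and the RANSAC parameters are my own placeholders, not values taken from the sources below.

import cv2
import numpy as np

# Placeholder input pair (any two overlapping frames of the same scene)
imgL = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
imgR = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# 1) + 2) detect keypoints and extract descriptors
sift = cv2.SIFT_create()
kpL, desL = sift.detectAndCompute(imgL, None)
kpR, desR = sift.detectAndCompute(imgR, None)

# 3) match descriptors (brute force, L2 norm for SIFT) and keep the strongest matches
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(bf.match(desL, desR), key=lambda m: m.distance)[:300]
ptsL = np.float32([kpL[m.queryIdx].pt for m in matches])
ptsR = np.float32([kpR[m.trainIdx].pt for m in matches])

# 4) fundamental matrix with RANSAC to throw away bad matches
F, mask = cv2.findFundamentalMat(ptsL, ptsR, cv2.FM_RANSAC, 3.0, 0.99)
ptsL, ptsR = ptsL[mask.ravel() == 1], ptsR[mask.ravel() == 1]

# 5) rectification homographies for the uncalibrated pair, then warp both frames
h, w = imgL.shape[:2]
_, H1, H2 = cv2.stereoRectifyUncalibrated(ptsL, ptsR, F, (w, h))
rectL = cv2.warpPerspective(imgL, H1, (w, h))
rectR = cv2.warpPerspective(imgR, H2, (w, h))
# rectL/rectR would then go into the StereoSGBM step.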
I found this sequence of steps here: 3d reconstruction from 2 images without info about the camera
I also found a similar implementation: GitHub 3d reconstruction project
And this tutorial: Stereo 3D reconstruction with OpenCV using an iPhone camera.
With the help of those three sources, I put together a test implementation:
import cv2
import numpy as np

# IMG_L, IMG_R, WINDOW_WIDTH and WINDOW_HEIGHT come from my capture/GUI code.
# I size the original frames down to a third to speed things up.
IMG_L = cv2.resize(IMG_L, (int(WINDOW_WIDTH / 3), int(WINDOW_HEIGHT / 3)))
IMG_R = cv2.resize(IMG_R, (int(WINDOW_WIDTH / 3), int(WINDOW_HEIGHT / 3)))

window_size = 15
left_matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=16,            # must be divisible by 16
    blockSize=11,
    P1=8 * 3 * window_size ** 2,  # smoothness penalties derived from window_size (not blockSize)
    P2=32 * 3 * window_size ** 2,
    disp12MaxDiff=1,
    uniquenessRatio=3,
    speckleWindowSize=1,
    speckleRange=1,
    preFilterCap=63,
    mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

# WLS post-filtering of the raw disparity maps
lmbda = 80000
sigma = 1.2
visual_multiplier = 1.0  # unused in this snippet
wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)

displ = np.int16(left_matcher.compute(IMG_L, IMG_R))
dispr = np.int16(right_matcher.compute(IMG_R, IMG_L))
filteredImg = wls_filter.filter(displ, IMG_L, None, dispr)  # the left image must be passed as the guide here!
filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
filteredImg = np.uint8(filteredImg)
With this piece of code, I generate this output: Video
Now you probably see my problems:
- My depth map flickers between frames and is not (what I would call) "colour consistent"
- The quality of the depth map is very poor (smudgy)
- It is too slow and therefore not usable in real time
For the first problem, I need a good way to get rid of this flickering. Is there maybe a way to take the previous depth map into account?
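One idea I could imagine (my own guess, not something from the linked sources) is to blend each new filtered disparity map with the previous one, e.g. a simple exponential moving average. This continues from `filteredImg` in the snippet above; the blend weight `alpha` and the `prev_disp` bookkeeping are placeholders that would need tuning:

# Outside the per-frame loop:
prev_disp = None

# Inside the loop, after filteredImg has been computed for the current frame:
alpha = 0.6  # weight of the new frame; purely a guess
current = filteredImg.astype(np.float32)
smoothed = current if prev_disp is None else alpha * current + (1 - alpha) * prev_disp
prev_disp = smoothed
filteredImg = np.uint8(np.clip(smoothed, 0, 255))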
For the second problem, I have an idea of what I should do: I need to rectify my stereo images (as the algorithm outline above suggests). To rectify them, I would need feature matches from SIFT or SURF. But I have read that SIFT and SURF are too slow to run in real time, so I probably need a different kind of feature detector?
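One possible alternative is ORB, which uses binary descriptors and is typically much faster than SIFT/SURF. A rough sketch (my own, not from the linked sources) of swapping it into steps 1-3 of the pipeline above; the feature count and the 300-match cutoff are arbitrary:

# Hypothetical drop-in replacement for the SIFT detection/matching above
orb = cv2.ORB_create(nfeatures=1000)
kpL, desL = orb.detectAndCompute(imgL, None)
kpR, desR = orb.detectAndCompute(imgR, None)

bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # Hamming norm for binary descriptors
matches = sorted(bf.match(desL, desR), key=lambda m: m.distance)[:300]
ptsL = np.float32([kpL[m.queryIdx].pt for m in matches])
ptsR = np.float32([kpR[m.trainIdx].pt for m in matches])
# ptsL/ptsR then feed findFundamentalMat() and stereoRectifyUncalibrated() as before.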
I will focus on the first and second problems before I try to optimize the program, so you can ignore my third problem for now.
Thanks for your help :)