I'm trying to generate a real-time depth map from an uncalibrated stereo camera. I roughly know what the algorithm has to look like (a code sketch follows the list):

  1. detect keypoints (SURF, SIFT)
  2. extract descriptors (SURF, SIFT)
  3. compare and match descriptors (brute-force or FLANN-based approaches)
  4. find the fundamental matrix (findFundamentalMat()) from these pairs
  5. stereoRectifyUncalibrated()
  6. StereoSGBM
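To double-check my understanding of steps 1–5, here is a rough sketch of how I picture them in OpenCV (SIFT and FLANN are just one possible choice; "left.png"/"right.png" stand in for my camera frames, and the 0.7 ratio is Lowe's usual value):

    import cv2
    import numpy as np

    img_l = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    img_r = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # 1 + 2: detect keypoints and extract descriptors
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(img_l, None)
    kp_r, des_r = sift.detectAndCompute(img_r, None)

    # 3: FLANN matching with Lowe's ratio test
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des_l, des_r, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]

    pts_l = np.float32([kp_l[m.queryIdx].pt for m in good])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in good])

    # 4: fundamental matrix with RANSAC, keeping only inliers
    F, mask = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC)
    pts_l = pts_l[mask.ravel() == 1]
    pts_r = pts_r[mask.ravel() == 1]

    # 5: rectification without calibration; H1/H2 warp both views
    h, w = img_l.shape
    _, H1, H2 = cv2.stereoRectifyUncalibrated(pts_l, pts_r, F, (w, h))
    rect_l = cv2.warpPerspective(img_l, H1, (w, h))
    rect_r = cv2.warpPerspective(img_r, H2, (w, h))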

I found this algorithm here: 3d reconstruction from 2 images without info about the camera

I also found a similar implementation: Github 3d reconstruction project

And this tutorial: Stereo 3D reconstruction with OpenCV using an iPhone camera.

With the help of those three sources, I put together a test implementation:

    import cv2
    import numpy as np

    # IMG_L / IMG_R come from the left/right camera capture,
    # WINDOW_WIDTH / WINDOW_HEIGHT are defined elsewhere;
    # I size down the original frames
    IMG_L = cv2.resize(IMG_L, (int(WINDOW_WIDTH / 3), int(WINDOW_HEIGHT / 3)))
    IMG_R = cv2.resize(IMG_R, (int(WINDOW_WIDTH / 3), int(WINDOW_HEIGHT / 3)))

    window_size = 15
    left_matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=16,
        blockSize=11,
        P1=8 * 3 * window_size ** 2,   # note: uses window_size (15), not blockSize (11)
        P2=32 * 3 * window_size ** 2,
        disp12MaxDiff=1,
        uniquenessRatio=3,
        speckleWindowSize=1,
        speckleRange=1,
        preFilterCap=63,
        mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY
    )

    right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)

    lmbda = 80000
    sigma = 1.2

    wls_filter = cv2.ximgproc.createDisparityWLSFilter(matcher_left=left_matcher)
    wls_filter.setLambda(lmbda)
    wls_filter.setSigmaColor(sigma)

    displ = left_matcher.compute(IMG_L, IMG_R)
    dispr = right_matcher.compute(IMG_R, IMG_L)
    displ = np.int16(displ)
    dispr = np.int16(dispr)
    filteredImg = wls_filter.filter(displ, IMG_L, None, dispr)  # the guide image must be the left view (IMG_L)

    filteredImg = cv2.normalize(src=filteredImg, dst=filteredImg, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
    filteredImg = np.uint8(filteredImg)

With this piece of code, I generate this output: Video

Now you probably see my problems:

  1. My depth map is flickering and not (what I would call) "colour consistent"
  2. The quality of the depth map is very bad (smudgy)
  3. It is too slow and therefore not usable in real time

For the first problem, I would need a good way to get rid of this flickering. Is there maybe a way to take the previous depth map into account?
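What I have in mind is something like blending each new disparity map with the previous frame's (a rough sketch, assuming `filteredImg` is the uint8 map from above; `prev_disp` would be initialised to None before the capture loop, and the 0.6 weight is arbitrary):

    # prev_disp = None   # initialised once, before the capture loop
    alpha = 0.6  # arbitrary smoothing weight; higher = less temporal smoothing
    if prev_disp is None:
        prev_disp = filteredImg.astype(np.float32)
    smoothed = cv2.addWeighted(filteredImg.astype(np.float32), alpha,
                               prev_disp, 1.0 - alpha, 0.0)
    prev_disp = smoothed
    filteredImg = np.uint8(smoothed)

This would trade some responsiveness for stability, but maybe there is a more standard way?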

For the second problem, I have an idea of what to do: I need to rectify my stereo images (as the description of the algorithm suggests). To rectify them, I would need keypoint matches from something like SIFT or SURF. But I read that SIFT and SURF are too slow to run in real time, so I probably need some other kind of solution?
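From what I've read, ORB would be the usual faster alternative. Untested, but I imagine swapping it in would only change the feature/matcher part (ORB's binary descriptors need Hamming distance instead of L2):

    orb = cv2.ORB_create(nfeatures=1000)
    kp_l, des_l = orb.detectAndCompute(IMG_L, None)
    kp_r, des_r = orb.detectAndCompute(IMG_R, None)

    # binary descriptors -> brute-force matching with Hamming distance
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(des_l, des_r), key=lambda m: m.distance)

    pts_l = np.float32([kp_l[m.queryIdx].pt for m in matches])
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in matches])
    # ...then findFundamentalMat() / stereoRectifyUncalibrated() as in the sketch above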

I will focus on the first and second problems before I try to optimize the program, so you can ignore my third problem for now.

Thanks for your help :)

Xen0n
  • Try ORB for a faster alternative to SIFT/SURF. There will be slightly more false positives though. The flickering is hard to avoid when the scene does not have much texture in some areas. – Richard K. Wade Jun 05 '19 at 11:54
  • Ok, thanks. Yes, the video comes from an endoscope, so a lot of textureless tissue and blood... I'll try and have a closer look at ORB. – Xen0n Jun 05 '19 at 12:09
  • Have you had any luck with your flickering issue? I'm currently working on a similar project of real-time video disparity image acquisition and ran into the same issue. The image I get is quite neat but it keeps flickering. For the real-time part of your question, there are several parts of the algorithm that can be modified to improve speed (at the cost of quality most of the time though). For example for me using StereoBM instead of StereoSGBM increased my framerate consistently. As for the image quality, well as a preprocessing step you might want to calibrate your camera if possible. That – Theonolev Apr 29 '20 at 09:30
