Checking duplicate images with ORB

Question

I am currently working on detecting duplicate images, and I am using ORB for that. The first part is almost complete: I have the descriptors of both images. For the second part, I want to know how to calculate a score from the Hamming distances, and what threshold to use for deciding that two images are duplicates.

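    import cv2

    # gray_image15 and gray_image25 are grayscale images loaded earlier,
    # e.g. with cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img1 = gray_image15
    img2 = gray_image25
    # Initiate ORB detector
    orb = cv2.ORB_create()
    # find the keypoints with ORB
    kp1 = orb.detect(img1, None)
    kp2 = orb.detect(img2, None)
    # compute the descriptors with ORB
    kp1, des1 = orb.compute(img1, kp1)
    kp2, des2 = orb.compute(img2, kp2)

    # ORB descriptors are binary, so match them with the Hamming norm;
    # crossCheck=True keeps only mutual best matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # Sort them in the order of their distance.
    matches = sorted(matches, key=lambda x: x.distance)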
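For reference, the `distance` that `BFMatcher` reports with `cv2.NORM_HAMMING` is the number of bits in which two binary descriptors differ. A minimal NumPy sketch of that computation, assuming each descriptor is a 1-D `uint8` array (ORB's default is 32 bytes, i.e. 256 bits); the function name `hamming_distance` is illustrative, not part of OpenCV:

```python
import numpy as np


def hamming_distance(d1, d2):
    """Number of differing bits between two binary descriptors.

    d1, d2: 1-D uint8 arrays of equal length (default ORB: 32 bytes = 256 bits).
    """
    # XOR leaves a 1 bit wherever the descriptors disagree;
    # unpackbits expands each byte into its 8 bits so they can be counted.
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())


if __name__ == "__main__":
    a = np.zeros(32, dtype=np.uint8)
    b = a.copy()
    b[0] = 0b00000111  # flip three bits in the first byte
    print(hamming_distance(a, b))  # 3
```

So for default ORB descriptors a match distance ranges from 0 (identical) to 256 (every bit different).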

I just want to know the next step in this process, so that ultimately I can print yes or no for duplicates. I am using OpenCV 3.0.0 with Python 2.7.
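One simple way to turn the sorted matches into a yes/no answer (a common heuristic, not the method from the answer below) is to count the matches whose Hamming distance is below a cutoff and divide by the smaller keypoint count. This is a sketch: the cutoff `max_distance=64` and the decision threshold `min_score=0.5` are hypothetical values that would need tuning on known duplicate and non-duplicate pairs, and the function names are illustrative.

```python
def duplicate_score(match_distances, n_kp1, n_kp2, max_distance=64):
    """Fraction of keypoints that have a close cross-checked match.

    match_distances: Hamming distances, e.g. [m.distance for m in matches]
    n_kp1, n_kp2:    number of keypoints found in each image
    max_distance:    hypothetical cutoff (out of 256 bits for default ORB)
    """
    n = min(n_kp1, n_kp2)
    if n == 0:
        return 0.0
    good = sum(1 for d in match_distances if d <= max_distance)
    return good / float(n)  # float() keeps this correct on Python 2.7


def is_duplicate(match_distances, n_kp1, n_kp2, max_distance=64, min_score=0.5):
    """Hypothetical decision rule: enough close matches means 'duplicate'."""
    score = duplicate_score(match_distances, n_kp1, n_kp2, max_distance)
    return score >= min_score


if __name__ == "__main__":
    # Stand-in numbers; with the question's code you would pass
    # [m.distance for m in matches], len(kp1), len(kp2).
    distances = [12.0, 20.0, 31.0, 90.0]
    print("yes" if is_duplicate(distances, 5, 6) else "no")  # yes
```

Normalizing by the smaller keypoint count keeps the score between 0 and 1 regardless of how many features each image yields; a fixed count of good matches would instead favor large, textured images.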

For a C++ implementation, see https://github.com/vonzhou/opencv/blob/master/match/ORB_match.cpp, and for a Python implementation, see http://stackoverflow.com/questions/11114349/how-to-visualize-descriptor-matching-using-opencv-module-in-python. Hope this helps. — Sagar Patel, Jan 29 '16 at 08:02
Another link: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_matcher/py_matcher.html — Sagar Patel, Jan 29 '16 at 08:07
Hi, could you tell us what your criteria for "duplicate images" are? Depending on your answer, the solutions could be quite different, ranging from simple approaches like histogram comparison to more complicated algorithms like bag of words or image hashing. If you only want to know how to use ORB to find a similar object (if it is a single object), that is quite easy, as Sagar said. — StereoMatching, May 15 '16 at 14:12
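The image-hashing route mentioned in the comment above also ends in a Hamming distance, this time between two short bit strings. A minimal sketch of an average hash in NumPy, assuming a 2-D grayscale array at least 8 x 8 pixels; block-mean downsampling stands in for a proper resize, and the names `average_hash` and `hash_distance` are hypothetical:

```python
import numpy as np


def average_hash(gray, hash_size=8):
    """Average hash: shrink to hash_size x hash_size block means, then threshold at their mean."""
    h, w = gray.shape
    bh, bw = h // hash_size, w // hash_size
    if bh == 0 or bw == 0:
        raise ValueError("image is smaller than the hash grid")
    # Crop so both dimensions divide evenly, then average each block.
    cropped = gray[:bh * hash_size, :bw * hash_size].astype(np.float64)
    blocks = cropped.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    # One bit per block: is this block brighter than the average block?
    return (blocks > blocks.mean()).flatten()


def hash_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return int(np.count_nonzero(h1 != h2))
```

Because each bit only records whether a block is brighter than the average block, a uniform brightness change leaves the hash unchanged. Near-duplicates should differ in only a few of the 64 bits; the exact cutoff is an assumption to tune on your own data.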

score 2 · Answer 1 · edited May 23 '17 at 12:23

Once you obtain the descriptors, you can use a bag-of-words model: cluster the descriptors of the reference image to build a vocabulary (visual words).
Then project the descriptors of the other image onto this vocabulary, that is, assign each descriptor to its nearest visual word.
This gives, for each of the two images, a histogram of how often each visual word occurs.
Compare the two histograms using a histogram comparison technique and apply a threshold to detect duplicates. For example, with the Bhattacharyya distance, a low value means a good match.
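The steps above can be sketched in Python with NumPy only. This is a toy version under stated assumptions: the vocabulary comes from a plain k-means over the reference image's descriptors (practical bag-of-words systems usually train the vocabulary on many images), ORB descriptors are unpacked into 0/1 bit vectors so that squared Euclidean distance between two descriptors equals their Hamming distance, and `k=8` as well as all function names are hypothetical.

```python
import numpy as np


def to_bits(des):
    """Unpack N x 32 uint8 ORB descriptors into N x 256 float arrays of 0/1."""
    return np.unpackbits(des, axis=1).astype(np.float64)


def assign(x, centers):
    """Index of the nearest center (squared Euclidean) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def build_vocabulary(des, k=8, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on the reference image's descriptors."""
    x = to_bits(des)
    rng = np.random.RandomState(seed)
    centers = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def bow_histogram(des, centers):
    """Normalized histogram of visual-word occurrences for one image."""
    labels = assign(to_bits(des), centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(np.float64)
    return hist / hist.sum()


def bhattacharyya(h1, h2):
    """Bhattacharyya distance for histograms summing to 1: 0 = identical, 1 = no overlap."""
    bc = np.sum(np.sqrt(h1 * h2))
    return float(np.sqrt(max(0.0, 1.0 - bc)))  # clamp tiny negative rounding error
```

With the question's variables, this would look like `centers = build_vocabulary(des1)` followed by `d = bhattacharyya(bow_histogram(des1, centers), bow_histogram(des2, centers))`, reporting a duplicate when `d` falls below a threshold tuned on known pairs. For histograms that sum to 1, `cv2.compareHist` with `cv2.HISTCMP_BHATTACHARYYA` should compute the same quantity.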

I don't have a Python implementation of this, but you can find something similar in C++ here.
