
I am very new to image processing and image matching and don't understand it very clearly. What I need to do is: a) take an image, b) extract features from it (SIFT and SURF are said to be good for matching), c) create a hash from them (like MD5 or SHA1), and d) store it in a database and search it to find similar images.
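For concreteness, step (c) with a *perceptual* hash rather than a cryptographic one can be sketched in a few lines of plain Python. This toy average hash (aHash) assumes the image has already been reduced to a tiny grayscale grid (real implementations resize to e.g. 8×8 first); the numbers are illustrative only:

```python
# Toy average-hash (aHash): a perceptual hash that tolerates small edits,
# unlike MD5/SHA1. Input is assumed to already be a tiny grayscale grid.

def average_hash(pixels):
    """pixels: list of rows of grayscale values (0-255); returns an int bitmask."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: is it brighter than the average?
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")

img = [[10, 200], [220, 30]]
tweaked = [[12, 198], [221, 29]]  # slightly edited copy
print(hamming(average_hash(img), average_hash(tweaked)))  # prints 0: still a match
```

Comparing hashes by Hamming distance rather than exact equality is what lets modified copies still match.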

Basically, something like TinEye.

I referred to the question "OpenCV / SURF: How to generate an image hash / fingerprint / signature out of the descriptors?". I also checked out pHash and tried running SIFT/SURF matching via OpenCV's matcher_simple.cpp sample.

I have read a little about Geometric Hashing and Locality-Sensitive Hashing, but I'm not sure whether I'm going in the right direction.
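Locality-sensitive hashing is indeed one standard way to bucket similar feature vectors. A minimal sign-of-projection sketch, where each hyperplane contributes one bit of the signature (the hyperplanes here are hand-picked toy values for determinism; real LSH draws them at random, e.g. from a Gaussian):

```python
# Toy locality-sensitive hashing: each hyperplane contributes one bit
# (the sign of the dot product), so similar vectors get similar bit
# signatures and tend to land in the same hash bucket.

PLANES = [
    (1, -1, 0, 0),
    (0, 1, -1, 0),
    (0, 0, 1, -1),
    (1, 0, 0, -1),
]

def lsh_signature(vec):
    sig = 0
    for plane in PLANES:
        dot = sum(p * v for p, v in zip(plane, vec))
        sig = (sig << 1) | (1 if dot >= 0 else 0)
    return sig

def hamming(a, b):
    return bin(a ^ b).count("1")

a = (1.0, 0.2, 0.0, 0.5)    # a descriptor
b = (1.1, 0.2, 0.1, 0.5)    # near-duplicate of a
c = (-1.0, 0.9, -2.0, 0.1)  # unrelated descriptor

print(hamming(lsh_signature(a), lsh_signature(b)))  # prints 0
print(hamming(lsh_signature(a), lsh_signature(c)))  # prints 2
```

The near-duplicate gets an identical signature while the unrelated vector lands several bits away, which is exactly the property a hash-based duplicate search needs.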

How could I create a hash from features extracted with SIFT/SURF (OpenCV)? I would be grateful if someone could outline simple steps to follow or point me to a reference to move forward.

bitvijays
    Is your end goal to match one image to a database of many? – kamjagin Jun 30 '13 at 16:45
  • @kamjagin Yes. I am trying to build a small application: suppose we found 500 images on one laptop and 100 images on another laptop, and I want to find out whether any images have been shared between them. The images can be modified, so I can't just MD5-hash them. – bitvijays Jun 30 '13 at 16:59
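The point about MD5 in that comment can be shown directly: a cryptographic hash changes completely on a one-bit edit, so it can only ever find byte-identical copies:

```python
# Why MD5 can't match modified images: flipping a single bit of the input
# produces an entirely different digest (the avalanche effect).
import hashlib

original = bytes(range(16)) * 4  # stand-in for raw image bytes
edited = bytearray(original)
edited[0] ^= 1                   # a one-bit "edit"

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(bytes(edited)).hexdigest()
print(h1 == h2)  # prints False: an edited copy no longer matches
```

This is why modified copies have to be found by feature similarity rather than by exact digest lookup.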

1 Answer


OK, there are a ton of nice ways of matching images, with various levels of complexity. I will provide a suggestion that I think is good enough for the problem you described and really simple to implement (since you say that you are super new to CV :) ).

  1. Compute sparse or dense SURF features on the images on computer 1
  2. Create a vocabulary (for this task, generating a random one is probably good enough)
  3. Assign the features to the vocabulary (nearest neighbour)
  4. Build a kd-tree (to use for nearest-neighbour search) or train a classifier (like an SVM)
  5. Apply the classifier to the images on computer 2 (after computing SURF features and assigning them to the vocabulary)

The same images will most likely produce the highest classification scores.

The reason I suggest this approach over the faster hashing approaches is that you are unlikely to have performance issues with as few images as ~500, and there is a nice example in OpenCV (bagofwords_classification.cpp) that you can follow step by step to achieve what you want.
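The five steps above can be sketched end to end with toy 2-D "descriptors" standing in for 64-D SURF vectors. The names and numbers below are illustrative only, not OpenCV API; the real pipeline is in bagofwords_classification.cpp:

```python
# Bag-of-words sketch: assign each descriptor to its nearest vocabulary
# word (step 3), build a normalised word histogram per image, and compare
# histograms; copies of the same image score highest (as in step 5).

def nearest_word(desc, vocab):
    return min(range(len(vocab)),
               key=lambda i: sum((d - v) ** 2 for d, v in zip(desc, vocab[i])))

def bow_histogram(descriptors, vocab):
    hist = [0] * len(vocab)
    for d in descriptors:
        hist[nearest_word(d, vocab)] += 1
    total = sum(hist)
    return [h / total for h in hist]  # normalise: image size doesn't matter

def similarity(h1, h2):
    # Histogram intersection: 1.0 means identical word distributions.
    return sum(min(a, b) for a, b in zip(h1, h2))

vocab = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]      # step 2: tiny (toy) vocabulary

img1 = [(0.1, 0.2), (1.1, 0.9), (5.2, 4.8)]       # descriptors from computer 1
img1_copy = [(0.0, 0.3), (0.9, 1.0), (4.9, 5.1)]  # edited copy on computer 2
img_other = [(5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]  # unrelated image

h = bow_histogram(img1, vocab)
print(round(similarity(h, bow_histogram(img1_copy, vocab)), 2))  # prints 1.0
print(round(similarity(h, bow_histogram(img_other, vocab)), 2))  # prints 0.33
```

The edited copy maps to the same word histogram as the original, so it scores highest even though its raw bytes (and an MD5 of them) differ.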

kamjagin
  • I have a few queries: 1) Do we compute sparse or dense SURF features like the code in https://code.ros.org/trac/opencv/browser/trunk/opencv/samples/cpp/matcher_simple.cpp?rev=3204? 2) I read about bag of words; it was used mainly for the PASCAL Visual Object Classes challenge, which is not exactly what I am doing. My application's objective is to find images of child abuse. 3) I should have a central database containing all the hashes of images found, so that no matter how many laptops are seized, we would just scan the images found on each laptop and check the database. 4) There can be more than 500 images, maybe 10000. – bitvijays Jun 30 '13 at 20:00
  • 1. Yep – this is a way of computing sparse SIFT descriptors for two different images, which is good enough for this problem. 2. BOW is just a way of describing the content of an image without spatial constraints (although pyramid-type spatial constraints improve things). It will do the trick for your problem. 3) Still very suitable: each image will be described by its words. 4) Then you should look at the details to make it more efficient. You could start by looking at the work of http://people.rennes.inria.fr/Herve.Jegou/. There is also some MATLAB code on his web page. – kamjagin Jun 30 '13 at 20:08