This is a very common problem in computer vision nowadays. The simple solution really is simple, but there are many variants for more sophisticated solutions.
Simple Solution
- Feature Detector and Descriptor based.
The idea is that you get a bunch of keypoints and their descriptors (search for SIFT/SURF/ORB). You can then find matches easily with the tools OpenCV provides: you match the keypoints in your query image against all keypoints in the training dataset. Because of the typical outliers, you will want to add a robust matching technique such as RANSAC. All of this is part of OpenCV; see the sketch after this list.
- Bag-of-Words model
If you just want the image that is most similar to your query image, you can use nearest-neighbour search. Be aware that OpenCV ships with the much faster approximate nearest-neighbour search (FLANN). Alternatively, you can use the BruteForceMatcher.
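A minimal sketch of the feature-based route, assuming OpenCV's Python bindings and two placeholder filenames (`query.jpg`, `train.jpg`); ORB stands in for SIFT/SURF here because it ships with the core library:

```python
import cv2
import numpy as np

orb = cv2.ORB_create()
img1 = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder filenames
img2 = cv2.imread("train.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and extract binary descriptors in one call
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force Hamming matching plus Lowe's ratio test to drop weak matches
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
good = [m for m, n in bf.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

# RANSAC: fit a homography to the matches and keep only the inliers
if len(good) >= 4:  # findHomography needs at least 4 correspondences
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("RANSAC inliers:", int(mask.sum()))
```

Repeat the matching against every training image and rank the images by inlier count.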
Advanced Solution
If you have many images (many == one million), you can look at locality-sensitive hashing (see Dean et al., "Fast, Accurate Detection of 100,000 Object Classes on a Single Machine").
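OpenCV exposes an LSH index through FLANN for binary descriptors such as ORB. A sketch with commonly used (not tuned) parameters, where `database_paths` is a hypothetical list of image filenames:

```python
import cv2

FLANN_INDEX_LSH = 6  # FLANN's identifier for the LSH index
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6, key_size=12, multi_probe_level=1)
matcher = cv2.FlannBasedMatcher(index_params, dict(checks=50))

orb = cv2.ORB_create()
for path in database_paths:  # hypothetical list of database image paths
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    matcher.add([desc])
matcher.train()  # builds the hash tables

# Query with matcher.knnMatch(query_descriptors, k=2);
# DMatch.imgIdx tells you which database image a match came from.
```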
If you do use Bag-of-Visual-Words, then you should probably build an Inverted Index.
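An inverted index is just a map from visual word to the images containing it, so at query time you only score images that share at least one word with the query. A minimal sketch, assuming a hypothetical `histograms` array of shape (n_images, vocabulary_size):

```python
from collections import defaultdict

import numpy as np

inverted_index = defaultdict(set)
for image_id, hist in enumerate(histograms):  # hypothetical BoW histograms
    for word in np.nonzero(hist)[0]:
        inverted_index[int(word)].add(image_id)

def candidates(query_hist):
    """Images sharing at least one visual word with the query."""
    ids = set()
    for word in np.nonzero(query_hist)[0]:
        ids |= inverted_index[int(word)]
    return ids
```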
Have a look at Fisher Vectors for improved accuracy compared to BoW.
Suggestion
Start by using Bag-of-Visual-Words. There are tutorials on how to train the dictionary for this model.
Training:
- Extract local features (just pick SIFT; OpenCV is very modular, so you can easily change this later) from a subset of your training images. First detect keypoints, then extract descriptors at those keypoints. There are many tutorials on the web about this.
- Train the dictionary. There is helpful documentation with a reference to a sample implementation in Python (opencv_source_code/samples/python2/find_obj.py)!
- Compute a histogram for each training image. (Also covered in the BOW documentation from the previous step.)
- Put the BoW histograms from the step above into a FLANN-based matcher. (A sketch of the whole training pipeline follows this list.)
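A minimal sketch of the training pipeline, assuming a hypothetical list `train_paths` of image filenames and a vocabulary size of 200 (a free parameter you should tune):

```python
import cv2

sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on OpenCV 3.x

# 1. Collect SIFT descriptors from (a subset of) the training images
bow_trainer = cv2.BOWKMeansTrainer(200)  # vocabulary size
for path in train_paths:                 # hypothetical list of image paths
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    kp, desc = sift.detectAndCompute(img, None)
    if desc is not None:
        bow_trainer.add(desc)

# 2. Train the dictionary: k-means over all collected descriptors
vocabulary = bow_trainer.cluster()

# 3. BoW extractor: assigns each descriptor to its nearest visual word
FLANN_INDEX_KDTREE = 1
flann_params = (dict(algorithm=FLANN_INDEX_KDTREE, trees=5), dict(checks=50))
bow_extract = cv2.BOWImgDescriptorExtractor(
    sift, cv2.FlannBasedMatcher(*flann_params))
bow_extract.setVocabulary(vocabulary)

# 4. One BoW histogram per training image, stored in a FLANN-based matcher
matcher = cv2.FlannBasedMatcher(*flann_params)
for path in train_paths:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = bow_extract.compute(img, sift.detect(img, None))
    matcher.add([hist])  # one descriptor set per image -> DMatch.imgIdx
matcher.train()
```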
Querying:
- Compute features on your query image.
- Use the dictionary from training to build a BOW histogram for your query image.
- Use that histogram to find the nearest neighbour(s) among the training histograms.
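Continuing the training sketch above (it reuses `sift`, `bow_extract`, and `matcher`):

```python
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder filename

# Detect features and turn them into a BoW histogram with the trained dictionary
qhist = bow_extract.compute(query, sift.detect(query, None))

# Nearest training histograms; imgIdx is the index of the matched training image
for m in matcher.knnMatch(qhist, k=3)[0]:
    print("image", m.imgIdx, "distance", m.distance)
```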