
I would like to match a picture against a database that currently contains more than 2,500 pictures, and I need an approach that still gives good results with at least 10k pictures.

I have already read a lot of posts on Stack Overflow, but I couldn't find a proper solution to my problem. I thought about using histograms, but if I understand correctly, they are useful for finding similar images, whereas I need a 'perfect' match.

I currently have some code that does the task, but it is too slow (about 6 seconds to find a match among 2,500 images).

I'm using the ORB detector (cv2.ORB()) to find keypoints and descriptors, a FlannBasedMatcher, and the findHomography function with RANSAC, as you can see below.

FLANN_INDEX_LSH = 6
# LSH index parameters, suitable for binary descriptors such as ORB's
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6, key_size = 12, multi_probe_level = 1)
...
self.matcher = cv2.FlannBasedMatcher(flann_params, {})
...
# status flags the RANSAC inliers among the matched points
(_, status) = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)
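
For reference, here is a minimal runnable sketch of the same pipeline for one image pair. It assumes OpenCV 3+ (where cv2.ORB() became cv2.ORB_create()); the file names are placeholders, and the Lowe ratio test is one common way to filter matches before findHomography:

import cv2
import numpy as np

FLANN_INDEX_LSH = 6
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6, key_size = 12, multi_probe_level = 1)

orb = cv2.ORB_create(nfeatures = 400)
matcher = cv2.FlannBasedMatcher(flann_params, {})

# Placeholder file names; grayscale is enough for ORB
imgA = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)
imgB = cv2.imread('candidate.png', cv2.IMREAD_GRAYSCALE)
kpsA, descA = orb.detectAndCompute(imgA, None)
kpsB, descB = orb.detectAndCompute(imgB, None)

# Lowe ratio test; with LSH some entries can have fewer than 2 neighbours
raw = matcher.knnMatch(descA, descB, k = 2)
good = [m[0] for m in raw if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

if len(good) >= 4:
    ptsA = np.float32([kpsA[m.queryIdx].pt for m in good])
    ptsB = np.float32([kpsB[m.trainIdx].pt for m in good])
    H, status = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)
    print('RANSAC inliers: %d / %d' % (int(status.sum()), len(good)))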

I want to know if there is a better, and more importantly, a faster way to match against my database, and perhaps a different way to store pictures in the database (I'm currently saving keypoints and descriptors).
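
For illustration, a minimal sketch of one way to persist these features, assuming plain NumPy files (the helper names are hypothetical; cv2.KeyPoint objects are not picklable, so only their coordinates are stored alongside the descriptor array):

import numpy as np

def save_features(path, keypoints, descriptors):
    # ORB descriptors are a plain uint8 array; keypoints are reduced
    # to their (x, y) coordinates so everything is NumPy-serialisable
    pts = np.float32([kp.pt for kp in keypoints])
    np.savez(path, points = pts, descriptors = descriptors)

def load_features(path):
    data = np.load(path)
    return data['points'], data['descriptors']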

I hope I was clear enough; if you need more details, ask in the comments.

Misery
  • Hm, you aren't very descriptive. Saving just the key points and comparing those seems to make sense. Maybe parallelise the search so that it's faster (`multiprocessing`). Also, there are some dedicated algorithms to find similarities between datasets, I just started looking at this: http://scikit-learn.org/stable/modules/metrics.html – Aleksander Lidtke Apr 10 '15 at 14:09
  • Thanks for your reply. I know that I'm not very descriptive, but I don't really know what other information is needed. I've been thinking about multiprocessing to make it faster, but first I would like to know if there is a better approach than the one I'm using. Your link sounds interesting; I'll look into it if I don't get better results soon. – Misery Apr 10 '15 at 14:26
  • You say you need a perfect match, is the search image identical to one in the database? – user3510227 Apr 10 '15 at 15:59
  • @Misery that's a good way to go about this - first make it fast in serial and then throw CPUs/cores at it. But maybe you can just get away with it this time. But, in essence, you need to extract some data from all the pictures, store it in a database, and then compare a new picture to the database. Perhaps though you could sort the pictures into groups? You could first check which group the new picture belongs to and then search a smaller number of group members to get a match. Or even have subgroups of groups, e.g. `mostly red/mostly green/mostly blue -> with trees/without trees -> ...`? – Aleksander Lidtke Apr 10 '15 at 17:11
  • @user3510227 The picture to compare is taken from a video capture in real time; it's a page from a book, and my code needs to find which page it is. – Misery Apr 14 '15 at 08:12
  • @AleksanderLidtke Thank you for your reply. Splitting the database into groups is a good idea; I never thought about it, and I'll implement it after making some optimisations. I found a way to get far better results in serial for now. – Misery Apr 14 '15 at 08:12
  • @Misery Would it be possible to just check the corners for a page number instead of examining the entire page? You could then directly look up the page in your database using it as an index. – user3510227 Apr 14 '15 at 08:25
  • possible duplicate of [Checking images for similarity with OpenCV](http://stackoverflow.com/questions/11541154/checking-images-for-similarity-with-opencv) – Sam Apr 14 '15 at 08:27
  • @user3510227 I think it would be possible, yes, but in my case I need something more accurate. I will have more than one book in the database, and I think it would be more complicated to implement than my current solution, and not much faster. – Misery Apr 14 '15 at 08:34

1 Answer


The point of what I am doing is to recognize a page from a book in a real-time video capture, which is why I needed my code to be fast and accurate.

I found a faster way to do the job: I build a FLANN index from the whole database at startup (which is not that slow with 3k pictures); I got help from this link. Also, and this is the most important part, I changed my flann_params to this:

flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 10, key_size = 20, multi_probe_level = 0)

In order not to lose accuracy with these parameters, I increased the number of feature points extracted by the ORB detector from 400 to 700.
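
For illustration, a minimal sketch of the startup-index idea using OpenCV's matcher-collection API (add/train); all_page_descriptors is a hypothetical list with one precomputed descriptor array per page:

import cv2

FLANN_INDEX_LSH = 6
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 10, key_size = 20, multi_probe_level = 0)

# Build one FLANN index over the whole database at startup
matcher = cv2.FlannBasedMatcher(flann_params, {})
for desc in all_page_descriptors:  # hypothetical: one uint8 descriptor array per page
    matcher.add([desc])
matcher.train()

def best_page(query_descriptors):
    # Each match carries imgIdx, the index of the database image it
    # came from; a simple vote picks the most likely page
    votes = {}
    for m in matcher.match(query_descriptors):
        votes[m.imgIdx] = votes.get(m.imgIdx, 0) + 1
    return max(votes, key = votes.get) if votes else None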

This fixed my problem: before, a match took between 2 and 3 seconds (6 seconds without the FLANN index); now it takes around 25-30 ms.

But even with this solution, I'm still open to new suggestions for improving accuracy without losing much speed.

Misery