
I would like to match a picture against a database that currently contains more than 2,500 pictures, and I need an approach that still gives good results with at least 10k pictures.

I have already read a lot of posts on Stack Overflow, but I couldn't find a proper solution to my problem. I thought about using histograms, but if I understand correctly, they are useful for finding similar images, whereas I need a 'perfect' match.

I currently have some code that does the task, but it is too slow (about 6 seconds to find a match among 2,500 images).

I'm using the ORB detector (cv2.ORB()) to find keypoints and descriptors, a FlannBasedMatcher, and the findHomography function with RANSAC, as you can see below.

FLANN_INDEX_LSH = 6
# LSH index parameters, suitable for binary descriptors such as ORB's
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6, key_size = 12, multi_probe_level = 1)
...
self.matcher = cv2.FlannBasedMatcher(flann_params, {})
...
# status flags the RANSAC inliers among the matched points
(_, status) = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)
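
For reference, here is a minimal runnable sketch of the same pipeline for one image pair. It assumes OpenCV 3+ (where cv2.ORB() became cv2.ORB_create()); the file names are placeholders, and the Lowe ratio test is one common way to filter matches before findHomography:

import cv2
import numpy as np

FLANN_INDEX_LSH = 6
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 6, key_size = 12, multi_probe_level = 1)

orb = cv2.ORB_create(nfeatures = 400)
matcher = cv2.FlannBasedMatcher(flann_params, {})

# Placeholder file names; grayscale is enough for ORB
imgA = cv2.imread('query.png', cv2.IMREAD_GRAYSCALE)
imgB = cv2.imread('candidate.png', cv2.IMREAD_GRAYSCALE)
kpsA, descA = orb.detectAndCompute(imgA, None)
kpsB, descB = orb.detectAndCompute(imgB, None)

# Lowe ratio test; with LSH some entries can have fewer than 2 neighbours
raw = matcher.knnMatch(descA, descB, k = 2)
good = [m[0] for m in raw if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

if len(good) >= 4:
    ptsA = np.float32([kpsA[m.queryIdx].pt for m in good])
    ptsB = np.float32([kpsB[m.trainIdx].pt for m in good])
    H, status = cv2.findHomography(ptsA, ptsB, cv2.RANSAC, 4.0)
    print('RANSAC inliers: %d / %d' % (int(status.sum()), len(good)))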

I want to know if there is a better, and more importantly, a faster way to match against my database, and perhaps a different way to store pictures in the database (I'm currently saving keypoints and descriptors).
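
For illustration, a minimal sketch of one way to persist these features, assuming plain NumPy files (the helper names are hypothetical; cv2.KeyPoint objects are not picklable, so only their coordinates are stored alongside the descriptor array):

import numpy as np

def save_features(path, keypoints, descriptors):
    # ORB descriptors are a plain uint8 array; keypoints are reduced
    # to their (x, y) coordinates so everything is NumPy-serialisable
    pts = np.float32([kp.pt for kp in keypoints])
    np.savez(path, points = pts, descriptors = descriptors)

def load_features(path):
    data = np.load(path)
    return data['points'], data['descriptors']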

I hope I was clear enough; if you need more details, ask in the comments.

Misery
  • Hm, you aren't very descriptive. Saving just the key points and comparing those seems to make sense. Maybe parallelise the search so that it's faster (`multiprocessing`). Also, there are some dedicated algorithms to find similarities between datasets, I just started looking at this: http://scikit-learn.org/stable/modules/metrics.html – Aleksander Lidtke Apr 10 '15 at 14:09
  • Thanks for your reply. I know that I'm not very descriptive, but I don't really know what other information is needed. I've been thinking about multiprocessing to make it faster, but first I would like to know if there is a better approach than the one I'm using. Your link sounds interesting; I'll look into it if I don't get better results soon. – Misery Apr 10 '15 at 14:26
  • You say you need a perfect match, is the search image identical to one in the database? – user3510227 Apr 10 '15 at 15:59
  • @Misery that's a good way to go about this - first make it fast in serial and then throw CPUs/cores at it. But maybe you can just get away with it this time. But, in essence, you need to extract some data from all the pictures, store it in a database, and then compare a new picture to the database. Perhaps though you could sort the pictures into groups? You could first check which group the new picture belongs to and then search a smaller number of group members to get a match. Or even have subgroups of groups, e.g. `mostly red/mostly green/mostly blue -> with trees/without trees -> ...`? – Aleksander Lidtke Apr 10 '15 at 17:11
  • @user3510227 The picture to compare is taken from a video capture in real time; it's a page from a book, and my code needs to find which page it is. – Misery Apr 14 '15 at 08:12
  • @AleksanderLidtke Thank you for your reply. Splitting the database into groups is a good idea; I never thought about it, and I'll implement it after making some optimisations. I found a way to get far better results in serial for now. – Misery Apr 14 '15 at 08:12
  • @Misery Would it be possible to just check the corners for a page number instead of examining the entire page? You could then directly look up the page in your database using it as an index. – user3510227 Apr 14 '15 at 08:25
  • possible duplicate of [Checking images for similarity with OpenCV](http://stackoverflow.com/questions/11541154/checking-images-for-similarity-with-opencv) – Sam Apr 14 '15 at 08:27
  • @user3510227 I think it would be possible, yes, but in my case I need something more accurate. I will have more than one book in the database, and I think it would be more complicated to implement than my current solution, and not much faster. – Misery Apr 14 '15 at 08:34

1 Answer


The point of what I am doing is to recognize a page from a book in a real-time video capture, which is why I needed my code to be fast and accurate.

I found a faster way to do the job: I build a FLANN index from the whole database at startup (which is not that slow with 3k pictures); I got help from this link. Also, and this is the most important part, I changed my flann_params to this:

flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 10, key_size = 20, multi_probe_level = 0)

In order not to lose accuracy with these parameters, I increased the number of feature points extracted by the ORB detector from 400 to 700.
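
For illustration, a minimal sketch of the startup-index idea using OpenCV's matcher-collection API (add/train); all_page_descriptors is a hypothetical list with one precomputed descriptor array per page:

import cv2

FLANN_INDEX_LSH = 6
flann_params = dict(algorithm = FLANN_INDEX_LSH, table_number = 10, key_size = 20, multi_probe_level = 0)

# Build one FLANN index over the whole database at startup
matcher = cv2.FlannBasedMatcher(flann_params, {})
for desc in all_page_descriptors:  # hypothetical: one uint8 descriptor array per page
    matcher.add([desc])
matcher.train()

def best_page(query_descriptors):
    # Each match carries imgIdx, the index of the database image it
    # came from; a simple vote picks the most likely page
    votes = {}
    for m in matcher.match(query_descriptors):
        votes[m.imgIdx] = votes.get(m.imgIdx, 0) + 1
    return max(votes, key = votes.get) if votes else None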

This fixed my problem: before, a match took between 2 and 3 seconds (6 seconds without the FLANN index); now it takes around 25-30 ms.

But even with this solution, I'm still open to new suggestions for improving accuracy without losing much speed.

Misery