
At the moment I've got a database with over 100,000 images. They aren't all the same size, but I want to build the following for my company:

I insert/upload an image and the system returns the image which is most likely the same. I don't know what algorithm to use but it needs to be fast. I can pre-process all the other images and put some info in the database which I use for the comparison.

What I want to know is the fastest way to compare the images (with a good chance of finding the same image), and what data I should save in the database (I could probably figure that out myself once I have the algorithm).

It shouldn't take more than 5 minutes to compare the uploaded image to all the images in the database.

Thanks in advance!

Julian

Look at www.tineye.com; they have the kind of algorithm that I'm looking for. I'm guessing they use a very complex one. I just need one that does the same thing, even with a lower success rate.

Julian
  • You may find some useful information at http://stackoverflow.com/questions/1261687/ – hangy Jan 10 '11 at 14:34
  • Thanks for your comment, hangy. I found some other posts on Stack Overflow, but most of them don't have any code or links to code, only links to mathematical articles. I wonder if there are libraries of some kind out there. – Julian Jan 10 '11 at 14:36
  • What do you intend to do? If you want to find very similar images being posted, use the information at the question provided by hangy. If you want to detect the posting of the exact same file, just hash the posted file and compare it to the hashes of the previously posted files. It would be lightning fast to find duplicates. – Falanwe Jan 10 '11 at 14:38
  • "Image registration" is the process of trying to match two images to each other. Searching for that term may help you, e.g.: http://stackoverflow.com/questions/3344138/image-comparison-rotation-alignment-and-scaling – Nate Kohl Jan 10 '11 at 15:00
  • As far as I have read in the article, image registration is meant more for forms that are filled in. But I'll look further into that term. Thanks for your time! – Julian Jan 10 '11 at 15:08
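
For the exact-same-file case mentioned in Falanwe's comment, a minimal sketch could look like the following. It assumes SHA-256 as the hash (the comment does not name one), and the `known` dictionary is a hypothetical stand-in for a database column of stored digests.

```python
import hashlib
from typing import Dict, Optional


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicate(data: bytes, index: Dict[str, str]) -> Optional[str]:
    """Look up an uploaded file by digest; return the stored name or None."""
    return index.get(file_digest(data))


# Hypothetical stand-in for the database: digest -> stored file name.
known = {
    file_digest(b"image-bytes-A"): "a.jpg",
    file_digest(b"image-bytes-B"): "b.jpg",
}
```

This only catches byte-identical files; a resized or recompressed copy of the same picture produces a different digest.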

2 Answers


The way I would do it is to generate a really small image (say, 1/50 of the original image size) from every image you're comparing against, and store the thumbnail image path along with the original size in the database. I'd keep the thumbnails as uncompressed BMPs for speed and loss-free-ness (I just made that word up!), since they're so small anyway.

To compare your new image against the other ones, shrink it down by the same amount and compare it against the others pixel by pixel, with a certain threshold (say, 10% difference from the original).

If it passes this test, you can do a full blown pixel by pixel compare against the original image.
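
A rough sketch of the shrink-and-compare step, in plain Python so it runs without an imaging library. It assumes grayscale images given as lists of rows of 0-255 values, assumes both images have the same dimensions, and reads "10% difference" as a mean absolute difference of at most 10% of the full 0-255 range; the function names and that reading of the threshold are my assumptions, not something the answer specifies.

```python
def shrink(pixels, factor):
    """Downscale a grayscale image by averaging each factor x factor block.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    height = len(pixels) // factor
    width = len(pixels[0]) // factor
    out = []
    for by in range(height):
        row = []
        for bx in range(width):
            total = 0
            for y in range(by * factor, (by + 1) * factor):
                for x in range(bx * factor, (bx + 1) * factor):
                    total += pixels[y][x]
            row.append(total / (factor * factor))
        out.append(row)
    return out


def thumbnails_match(a, b, threshold=0.10):
    """True if two same-sized thumbnails differ by at most `threshold`.

    The difference is the mean absolute pixel difference divided by 255.
    """
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return False
    diff = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = len(a) * len(a[0])
    return diff / (count * 255) <= threshold
```

In practice you would precompute `shrink(...)` for every stored image once, so that an upload only costs one shrink plus one cheap comparison per candidate.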

Edit: I just want to mention that I went down the probabilistic route before too. It worked OK, but building the metadata for the images took forever, and there were a lot of false positives. Instinctively, I think that calculating local averages for each grid rectangle of your image (which is what shrinking your image down does) would give similar, if not better, results.

Blindy
  • The idea of making the image smaller is good, thanks. The thing is that the pictures in my database come from the internet, so as you can guess there will be 100 pictures of the same thing that differ in color, size, cropping, etc. I would like to compare parts of the images. There is some information about "keypoint recognition" here: http://cvlab.epfl.ch/publications/publications/2006/LepetitF06.pdf but I can't find any code for it. It would be far too much work to write it myself. – Julian Jan 10 '11 at 14:44
  • If you need real pattern recognition, my method won't be enough. You need to build a real pattern recognition engine for this. Start with OpenCV and build up! – Blindy Jan 10 '11 at 14:57

The best way to compare is to convert the images to grayscale and compare the gray intensities. It's the fastest way, and it is used in real-time systems.
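
As a hedged illustration of the grayscale idea, the sketch below converts RGB pixels to gray with the common Rec. 601 luma weights (0.299, 0.587, 0.114) and reports the mean absolute intensity difference. The weights, the function names, and the choice of mean absolute difference as the score are my assumptions; images are lists of rows of `(r, g, b)` tuples with equal dimensions.

```python
def to_gray(rgb_pixels):
    """Convert rows of (r, g, b) tuples to gray levels using Rec. 601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_pixels]


def mean_intensity_difference(a, b):
    """Mean absolute gray-level difference (0 = identical, 255 = opposite)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count
```

A lower score means more similar images; what counts as "the same" still needs a threshold tuned on your own data.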

Also, if you want higher quality and want to use color images, use CIE 1994 or CIE 2000 as the color-difference formula.
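
For reference, here is a small sketch of the CIE 1994 difference between two colors that are already in CIELAB coordinates. It uses the graphic-arts constants (k_L = 1, K1 = 0.045, K2 = 0.015) as an assumption, and it does not include the RGB-to-Lab conversion, which you would need to do first.

```python
import math


def delta_e_cie94(lab_ref, lab_sample, k_l=1.0, k1=0.045, k2=0.015):
    """CIE94 color difference between two (L, a, b) colors.

    The formula is not symmetric: the weights depend on the chroma of the
    first (reference) color.
    """
    l1, a1, b1 = lab_ref
    l2, a2, b2 = lab_sample
    d_l = l1 - l2
    c1 = math.hypot(a1, b1)
    c2 = math.hypot(a2, b2)
    d_c = c1 - c2
    d_a = a1 - a2
    d_b = b1 - b2
    # Hue difference squared; clamp tiny negatives caused by rounding.
    d_h_sq = max(d_a * d_a + d_b * d_b - d_c * d_c, 0.0)
    s_c = 1.0 + k1 * c1
    s_h = 1.0 + k2 * c1
    return math.sqrt((d_l / k_l) ** 2 + (d_c / s_c) ** 2 + d_h_sq / (s_h ** 2))
```

Running this per pixel on full-size images is much slower than a gray comparison, so it probably makes most sense on small thumbnails or as a second pass on a few candidates.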

Siarhei Kuchuk