There's a lot of different algorithms for computing the similarity between two images, but I can't find anything on how you would store this information in a database such that you can find similar images quickly.
By "similar" I mean exact duplicates that have been rotated (90 degree increments), color-adjusted, and/or re-saved (lossy jpeg compression).
I'm trying to come up with a "fingerprint" of the images such that I can look them up quickly.
The best I've come up with so far is to generate a grayscale histogram. With 16 bins and 256 shades of gray, I can easily create a 16-byte fingerprint. This works reasonably well, but it's not quite as robust as I'd like.
Another solution I tried was to resize the images, rotate them so they're all oriented the same way, grayscale them, normalize the histograms, and then shrink them down to about 8x8, and reduce the colors to 16 shades of gray. Although the miniature images were very similar, they were usually off by a pixel or two, which means that exact matching can't work.
Without exact-matching, I don't believe there's any efficient way to group similar photos (without comparing every photo to every other photo, i.e., O(n^2)).
So, (1) How can I create I create a fingerprint/signature that is invariant to the requirements mentioned above? Or, (2) if that's not possible, what other metric can I use such that given a single image, I can find it's best matches in a database of thousands?