
I would like to know if what I'm trying to do is even possible, or if I've just been trying to do the impossible/solve an unsolvable problem up to now.

My goal is to compare images (they're going to have noise, though all of them with very similar noise) with a database of images, and have it tell me if it finds a match. For instance: img1 img2

I would like to point out that I already searched, but aside from theoretical discussions I never found an actual application, and I have failed to understand how to apply some of these ideas so far (histogram comparison flat out fails in this case, I couldn't implement data trees, and pHash also fails).

How would I even tell they're both similar? Are there algorithms I can implement to tell me that?

I suppose I should use some sort of noise reduction/edge detection first (I already tried some and had success with edge detection, actually). So, assuming I have a decent edge detection, how could I compare them?

I understand this is not an easy topic, but I would like to know if I'm fighting a lost battle and should just accept that and give up.

  • [There was just a question about this yesterday.](http://stackoverflow.com/questions/15232982/identify-images-with-same-content-in-java/15237764#15237764) The problem is hard enough that people have devoted their entire PhD thesis to doing it efficiently. Your question should just be about how far you want to go. – Andrew Mao Mar 07 '13 at 02:42
  • Pretty much every way to compare images uses histograms in some way or other - from simple color histogram comparison, to edge/gradient comparison (e.g. HOG: Histogram of Oriented Gradients). Obviously you need to store histogram or hash data for later use somehow, but an RDBMS isn't appropriate for this; you'd need a special structure you can store on-disk, but that's all I can tell you. – Dai Mar 07 '13 at 02:45
  • img1 and img2 look the same - is this intentional? – Boyko Perfanov Mar 07 '13 at 03:47
  • @AndrewMao Thanks, I'm reading these papers right now; hopefully I can find a way to achieve my goal with them. – ShizukaSM Mar 07 '13 at 03:57
  • @Dai A special structure? Why aren't RDBMSs appropriate, even for a small number of images (<5000)? – ShizukaSM Mar 07 '13 at 03:57
  • @perfanoff Nope, thanks for noticing :). Fixed them. – ShizukaSM Mar 07 '13 at 03:59

1 Answer


This is a long-standing research challenge in computer vision and pattern recognition, and as @AndrewMao said, there are many PhD theses and academic publications devoted to this topic. A fundamental question is what kind of output you want: (1) a single "matching" image from your database, or (2) a ranked list of database images with decreasing confidence of match. (1) is typically known as "Image Near-Duplicate Detection" and (2) is more broadly known as "Content-based Image Retrieval".

Today, the popular approach to both of these problems is some variant of (A) extracting low-level descriptors, most notably SIFT, at detected feature points, e.g. blob regions recognized by MSER, (B) applying some geometric verification, e.g. RANSAC, (C) measuring the distance between the remaining descriptors from pairs of images, e.g. via Euclidean distance, (D) thresholding to keep matched descriptors, and (E) counting the number of matches from the query image to each image in the database.
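
For concreteness, here is a minimal sketch of that pipeline in Python with OpenCV. It is not a tuned solution: it uses SIFT's own keypoint detector rather than MSER regions, the geometric-verification step runs after descriptor matching (which is how it is usually implemented), and the function name `count_matches`, the 0.75 ratio, and the 5.0-pixel RANSAC threshold are just illustrative choices.

```python
import cv2
import numpy as np

def count_matches(query_path, candidate_path, ratio=0.75):
    """Count geometrically verified SIFT matches between two images."""
    img1 = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(candidate_path, cv2.IMREAD_GRAYSCALE)

    # (A) detect feature points and extract SIFT descriptors
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return 0

    # (C)/(D) match descriptors by Euclidean distance and keep only
    # pairs that pass Lowe's ratio test
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in knn
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    if len(good) < 4:  # a homography needs at least 4 correspondences
        return 0

    # (B) geometric verification with RANSAC: keep only matches that are
    # consistent with a single homography between the two views
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if mask is None:
        return 0

    # (E) the score for this image pair is the number of RANSAC inliers
    return int(mask.sum())
```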

Visualized, for images like the following: http://www.vlfeat.org/demo/sift_match_1.jpg

Matching would yield (SIFT points in green and matches in blue): http://www.vlfeat.org/demo/sift_match_2.jpg
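
As a usage sketch for step (E), assuming the `count_matches` function above and a hypothetical `database/` folder of JPEGs next to a `query.jpg`:

```python
import glob

query = "query.jpg"                      # hypothetical file names
candidates = glob.glob("database/*.jpg")

# score every database image by its number of verified matches, then rank
scores = {path: count_matches(query, path) for path in candidates}
ranking = sorted(scores, key=scores.get, reverse=True)
print("Best match:", ranking[0], "with", scores[ranking[0]], "verified matches")
```

For a few thousand images this brute-force loop is workable; at larger scale people typically quantize the descriptors (bag-of-visual-words, vocabulary trees) and query an inverted index instead of matching every pair.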
