
This is my first image processing application, so please be kind with this filthy peasant.

THE APPLICATION:

I want to implement a fast application (performance is crucial, even over accuracy) that, given a photo (taken by a mobile phone) containing a movie poster, finds the most similar photo in a given dataset and returns a similarity score. The dataset is composed of similar pictures (taken by mobile phones, each containing a movie poster). The images can have different sizes and resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are assumed to always be upright).

Any suggestion on how to implement such an application is welcome.

FEATURE DESCRIPTIONS IN OPENCV:

I've never used OpenCV before, and I've read this tutorial about Feature Detection and Description from OpenCV.

From what I've understood, these algorithms find keypoints (usually corners) and possibly compute descriptors (which describe each keypoint and are used for matching two different images). I say "possibly" since some of them (e.g. FAST) provide only keypoints.

MOST SIMILAR IMAGE PROBLEM AND LSH:

The algorithms above don't, by themselves, solve the problem "given an image, how do we find the most similar one in a dataset in a fast way?". To do that, we can use both the keypoints and the descriptors obtained by any of the previous algorithms. The problem stated above looks like a nearest neighbor problem, and Locality Sensitive Hashing is a fast and popular way of finding an approximate solution to this problem in high-dimensional spaces.

THE QUESTION:

What I don't understand is how to use the results of any of the previous algorithms (i.e. keypoints and descriptors) with LSH.

Is there any implementation for this problem?

gsamaras
justHelloWorld

1 Answer


I will provide a general answer, going beyond the scope of the OpenCV library.


Quoting this answer:

descriptors: they are the way to compare the keypoints. They summarize, in vector format (of constant length) some characteristics about the keypoints.

With that said, we can imagine/treat (geometrically) a descriptor as a point in a D-dimensional space. So, in total, all the descriptors are points in a D-dimensional space. For example, for GIST, D = 960.

So descriptors actually describe the image using less information than the whole image (and when you have 1 billion images, size matters). They serve as the image's representatives, so we process them on behalf of the image (since they are easier/smaller to handle).


The problem you are mentioning is the Nearest Neighbor problem. Notice that an approximate version of this problem can lead to significant speed-ups when D is big, since the curse of dimensionality makes traditional approaches, such as a kd-tree, very slow (almost linear in N, the number of points).

Algorithms that solve the NN problem, which is an optimization problem, are usually generic. They may not care whether the data are images, molecules, etc.; I, for example, have used my kd-GeRaF for both. As a result, these algorithms expect N points in a D-dimensional space, i.e. N descriptors, you might say.
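As a toy illustration of the exact (linear-scan) baseline that these approximate methods try to beat, here is a sketch treating N random points as descriptors (the data and the index 42 are hypothetical, for illustration only):

```python
import numpy as np

# N descriptors as points in a D-dimensional space (D = 128, as in SIFT).
rng = np.random.default_rng(0)
N, D = 1000, 128
points = rng.standard_normal((N, D)).astype(np.float32)

# A query lying very close to point 42.
query = points[42] + 0.001 * rng.standard_normal(D).astype(np.float32)

# Exact nearest neighbor by linear scan: O(N * D) per query. This is the
# cost that approximate methods (LSH, kd-GeRaF, FLANN) aim to undercut.
dists = np.linalg.norm(points - query, axis=1)
nearest = int(np.argmin(dists))   # -> 42
```

Nothing in this scan cares that the points came from images; that is exactly why NN algorithms are generic.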

Check my answer for LSH here (which points to a nice implementation).


Edit:

LSH expects as input N vectors of dimension D and, given a query vector (also in D) and a range R, will find the vectors that lie within this range of the query vector.
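To show the mechanics, here is a minimal sketch of one classic LSH family (random-hyperplane hashing for cosine similarity), with a single hash table and no multi-probing; real implementations use several tables to boost recall, and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 5000, 64, 16              # K hyperplanes -> K-bit hash codes
data = rng.standard_normal((N, D)).astype(np.float32)
planes = rng.standard_normal((K, D)).astype(np.float32)

def code(x):
    # One bit per hyperplane: the sign of the projection.
    return np.packbits(x @ planes.T > 0).tobytes()

# Index: bucket each of the N vectors by its hash code.
table = {}
for i in range(N):
    table.setdefault(code(data[i]), []).append(i)

# Range query: inspect only the candidates in the query's bucket, then
# verify the range R exactly. (We reuse a dataset point as the query
# so the demo is guaranteed to have a hit.)
q, R = data[7], 1.0
candidates = table.get(code(q), [])
hits = [i for i in candidates if np.linalg.norm(data[i] - q) <= R]
```

The speed-up comes from checking only one bucket's candidates instead of all N vectors.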

As a result, we can say that every image is represented by just one vector, in SIFT format for example.
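How an N x D descriptor matrix becomes that single D-dimensional vector is the key step. The mean-pooling below is only a crude stand-in for the standard aggregation schemes (bag-of-visual-words, VLAD, Fisher vectors), just to show the shape transformation; the matrices are hypothetical:

```python
import numpy as np

# Hypothetical N x D descriptor matrices for two images (N varies per image).
rng = np.random.default_rng(1)
des_a = rng.standard_normal((350, 128)).astype(np.float32)
des_b = rng.standard_normal((410, 128)).astype(np.float32)

def global_vector(des):
    # Mean-pool the N rows into a single D-vector and L2-normalize it,
    # so every image maps to one point in the same 128-dimensional space.
    v = des.mean(axis=0)
    return v / np.linalg.norm(v)

va, vb = global_vector(des_a), global_vector(des_b)  # both have shape (128,)
```

After this step, each image is one vector, and LSH can index the collection of such vectors directly.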

You see, LSH doesn't actually solve the k-NN problem directly; instead, it searches within a range (and can give you the k-NNs, if they lie within that range). Read more about R in the Experiments section of High-dimensional approximate nearest neighbor. kd-GeRaF and FLANN solve the k-NN problem directly.

gsamaras
  • Thanks for your answer, but one thing is still not clear to me: each descriptor is associated with a keypoint. Each descriptor is represented in D dimensions. Since there are multiple keypoints, in order to describe an image we need a KxD matrix, where K is the number of keypoints, and K (from what I've seen) can change from image to image even using the same algorithm. So it seems that an image is described by a matrix and not by a vector (which is what we use in LSH). – justHelloWorld May 20 '16 at 02:27
  • anyway, I'm also considering kernelized locality-sensitive hashing; do you know of any recent implementation of it? – justHelloWorld May 20 '16 at 08:13
  • @justHelloWorld **yes, the image is described by an N x D matrix**, where N is the number of (key)points and D the dimensions! I am not sure what you mean by kernelized LSH. Anyway, the only implementation I have used and trust is the one pointed to in the linked answer in my answer. Good question BTW. – gsamaras May 20 '16 at 09:25
  • Sorry, but I'm afraid that something is still not clear to me: we use LSH to solve the NN problem for high-dimensional **vectors**, not matrices! So I don't understand how we can use LSH for "finding the most similar image", since from your last comment an image is represented through an NxD matrix (and not a D vector)! Where am I wrong? :D – justHelloWorld May 20 '16 at 09:37
  • PS: I read some stuff about kernelized functions in [this](http://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html) wonderful article; they can be used as a similarity metric instead of lp norms, cosine similarity, Hamming distance or whatever, and kernelized LSH uses this metric instead of the others. – justHelloWorld May 20 '16 at 09:42
  • @justHelloWorld sorry, see *EDIT*. LSH expects as input **N** vectors of **D** dimension and given a query vector (in **D**) and a range **R**, will find the vectors that lie within this range from the query vector. You see, LSH doesn't solve the k-NN problem directly, but it searches within a range (and can give you the k-NNs, if they are within the range). Read more about R, in the Experiments section, [here](http://arxiv.org/pdf/1603.09596.pdf). [kd-GeRaF](https://gsamaras.wordpress.com/projects/#geraf) and [FLANN](http://www.cs.ubc.ca/research/flann/) solve directly the k-NN problem. – gsamaras May 20 '16 at 11:50
  • Ok, the meaning of R is clear: I think it stands for "radius", e.g. the radius of the ball (whose center is our query point) where we look for our k-NN. It is also clear that if our application is about images with keypoints and descriptors, the query will be an NxD matrix (in SIFT, for example, D=128). So, I don't understand your assertion: "As a result, we can say that every image is represented by just one vector, in SIFT format for example." I don't understand the format of the link posted, but I don't understand how the Nx128 matrix is transformed into a single vector. – justHelloWorld May 20 '16 at 13:35
  • Another question: what is the epsilon (or similar letter) in the sublinear complexity of LSH? – justHelloWorld May 20 '16 at 14:28
  • @justHelloWorld yes, it's the radius. **No, the query is not a matrix**. The query is a vector (did you see my edit in the answer?) of dimension D. So, the NNS is to find the closest vector, of all the vectors we already have (our dataset), to the given query. As for ε, I know that, *but* the comment section is not for extended chat, thus I would kindly advise you to accept the answer and post a new question asking about ε, and give me the link to that question, if you like of course. :) – gsamaras May 20 '16 at 18:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/112557/discussion-between-justhelloworld-and-gsamaras). – justHelloWorld May 21 '16 at 03:08
  • You're totally right! [Here is the question](http://stackoverflow.com/questions/37358467/what-is-the-%CE%B5-epsilon-parameter-in-locality-sensitive-hashing-lsh) – justHelloWorld May 21 '16 at 03:09