
Given: Two sets {S1, S2} of vectors of dimension D. S1 is represented by an N*D matrix and S2, accordingly, by an M*D matrix.

I am looking for an elegant way to get, for every vector s1 in S1, the nearest neighbour s2 in S2 and the corresponding distance.

A simple approach would of course be to use two for loops and compute

dist = norm(s1 - s2);

However, there must be a more elegant and efficient way to do this.

rayryeng
Simon

2 Answers


Yup. With the mighty power of bsxfun and permute, with a side of sum, and a dash of reshape. This would be the first part, where you calculate the pair-wise distances between a point in S1 and another point in S2:

out = reshape(sqrt(sum(bsxfun(@minus, S1, permute(S2, [3 2 1])).^2, 2)), size(S1,1), size(S2,1));

The last thing you need now is to determine the closest vector in S2 to each of S1. That can be done using min:

[dist,ind] = min(out, [], 2);

dist would contain the smallest distance between a point in S1 with a point in S2, and ind would tell you which point that was.
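If you happen to be working in NumPy rather than MATLAB, the same pair-wise computation can be sketched with broadcasting (a rough analogue I am adding for illustration, not part of the original answer; note the 0-based indices):

```python
import numpy as np

S1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)  # N x D
S2 = np.array([[-1, -1, -1], [0, 9, 8]], dtype=float)                        # M x D

# S1[:, None, :] is (N, 1, D) and S2[None, :, :] is (1, M, D); broadcasting
# the subtraction yields an (N, M, D) array of component-wise differences.
out = np.sqrt(((S1[:, None, :] - S2[None, :, :]) ** 2).sum(axis=2))  # N x M

dist = out.min(axis=1)     # smallest distance for each row of S1
ind = out.argmin(axis=1)   # 0-based index of the nearest row in S2
```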


This code looks very intimidating, but let's break it up into pieces.

  1. permute(S2, [3 2 1]): This takes the matrix S2, which is an M x D matrix, and shuffles the dimensions so that it becomes a 1 x D x M matrix. Now why would we want to do that? Let's move on to the next part and it'll make more sense.

  2. bsxfun(@minus, S1, ...): bsxfun stands for Binary Singleton eXpansion FUNction. Given two inputs, wherever one of them has a singleton dimension (a dimension of size 1), that input is implicitly replicated along that dimension to match the size of the other input, and then the element-wise operation is applied to produce the output. In this case, I want to subtract the two newly formed inputs.

    As such, given that S1 is N x D... or technically N x D x 1, and given that S2 is M x D, which I permuted so that it becomes 1 x D x M, bsxfun will create a new matrix that is N x D x M. The first input behaves as a 3D matrix where every slice equals S1, which is N x D. The permuted S2 is now a 3D matrix where each row of the original matrix becomes its own slice consisting of just that single row, and each such row gets duplicated over N rows.

    We now apply the @minus operation, and the effect is that each output slice i in this new matrix gives you the component-wise differences between point i in S2 and all of the points in S1. For example, in slice #1, row #1 gives you the component-wise differences between point #1 in S2 and point #1 in S1, row #2 gives you the component-wise differences between point #1 in S2 and point #2 in S1, and so on.

  3. sum((...).^2, 2): For the Euclidean distance we sum the squared differences along dimension 2 (i.e. across the D components of each row); taking sqrt afterwards turns these into actual distances. This results in an N x 1 x M matrix where each slice contains the N distances between one of the M points in S2 and all of the points in S1. For example, the first slice gives you the distances from point #1 in S2 to every point in S1.

  4. out = reshape(..., size(S1,1), size(S2,1));: We now reshape this so that it becomes an N x M matrix, where each (i,j) entry gives you the distance between point i in S1 and point j in S2, thus completing the calculations.

  5. Doing [dist,ind] = min(out, [], 2); finds, for each point in S1, the smallest distance to any of the points in S2. dist gives you those smallest distances, while ind tells you which vector in S2 achieved them. Therefore, element i of dist is the smallest distance between point i in S1 and one of the points in S2, and element i of ind is the index of that point in S2.
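For readers outside MATLAB, the five steps above map almost one-to-one onto NumPy broadcasting; here is an illustrative translation (my own sketch, not part of the original answer), with the shape at each stage noted:

```python
import numpy as np

S1 = np.arange(12, dtype=float).reshape(4, 3)  # N x D, with N = 4, D = 3
S2 = np.arange(6, dtype=float).reshape(2, 3)   # M x D, with M = 2

# Step 1: reorient S2 so its rows line up along a new third axis -> (1, D, M),
# the counterpart of permute(S2, [3 2 1]).
S2p = S2.T[None, :, :]                         # shape (1, 3, 2)

# Step 2: broadcasting replicates the singleton dimensions, the counterpart
# of bsxfun(@minus, ...): (N, D, 1) - (1, D, M) -> (N, D, M).
diff = S1[:, :, None] - S2p                    # shape (4, 3, 2)

# Step 3: square and sum across the D axis -> (N, 1, M) squared distances.
sq = (diff ** 2).sum(axis=1, keepdims=True)    # shape (4, 1, 2)

# Step 4: drop the singleton axis and take the square root -> N x M distances.
out = np.sqrt(sq.reshape(4, 2))

# Step 5: minimum along each row, the counterpart of min(out, [], 2).
dist, ind = out.min(axis=1), out.argmin(axis=1)
```

Note that NumPy performs the singleton expansion implicitly on any arithmetic operator, so no explicit bsxfun-style wrapper is needed.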


We can verify that this gives us correct results by using your proposed approach of looping through each pair of points and calculating the norm. Let's create S1 and S2:

S1 = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
S2 = [-1 -1 -1; 0 9 8];

More neatly displayed:

>> S1

S1 =

     1     2     3
     4     5     6
     7     8     9
    10    11    12

>> S2

S2 =

    -1    -1    -1
     0     9     8

Using the loop approach, we have this code:

out = zeros(size(S1,1), size(S2,1));
for ii = 1 : size(S1,1)
    for jj = 1 : size(S2,1)
        out(ii,jj) = norm(S1(ii,:) - S2(jj,:));
    end
end

We get this matrix:

>> out

out =

    5.3852    8.6603
   10.4881    6.0000
   15.6525    7.1414
   20.8327   10.9545

Similarly, if we ran the code I wrote, we also get:

>> out = reshape(sqrt(sum(bsxfun(@minus, S1, permute(S2, [3 2 1])).^2, 2)), size(S1,1), size(S2,1))

out =

    5.3852    8.6603
   10.4881    6.0000
   15.6525    7.1414
   20.8327   10.9545

To complete the process, let's find the smallest distances and the corresponding vectors:

>> [dist,ind] = min(out, [], 2);
>> dist

dist =

    5.3852
    6.0000
    7.1414
   10.9545

>> ind

ind =

     1
     2
     2
     2

Therefore, for the first vector in S1, the closest vector in S2 was the first one, with a distance of 5.3852. Similarly, for the second vector in S1, the closest vector in S2 was the second one, with a distance of 6. You can repeat this for the other values and see that it's correct.

rayryeng
  • This doesn't get out of calculating N*M norms, correct? It just gets rid of the for loop? – Matthew Gunn Nov 11 '15 at 22:01
  • @MatthewGunn No it doesn't, but it's faster. – rayryeng Nov 11 '15 at 22:02
  • What I was writing before was all wrong. Let me think about it for a sec if there's a nice way to do it. – Matthew Gunn Nov 11 '15 at 22:24
  • @MatthewGunn no problem at all. I'm interested to see how you can do this with matrix multiplication. I began that way, but couldn't see a way to do it, which is why I pulled out `bsxfun`. – rayryeng Nov 11 '15 at 22:25
  • Some way of iterating, either for loop or bsxfun is the way to go to get the vector difference for each combination. Sorry for my earlier mindfart. – Matthew Gunn Nov 11 '15 at 22:35
  • @MatthewGunn no problem at all. I was actually excited when you came out with your answer because I didn't see how to do it with matrix multiplication... but it was your reasoning that ultimately led me to give up and I had to use `bsxfun`. Either way, it was great to see your attempt. – rayryeng Nov 11 '15 at 22:38
  • @Simon I forgot that you wanted to find the closest vectors. I just added that in as an edit. – rayryeng Nov 11 '15 at 23:00
  • @Simon so did what I write help you? Consider accepting one of our answers if you no longer need help. Good luck! – rayryeng Nov 12 '15 at 00:23
  • @MatthewGunn - I forgot about this post: http://stackoverflow.com/questions/23911670/efficiently-compute-pairwise-squared-euclidean-distance-in-matlab - This is how you'd do it with matrix multiplication, but you need to write a loop to set things up first. It's certainly faster! – rayryeng Nov 13 '15 at 07:53
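The matrix-multiplication approach mentioned in the last comment rests on expanding the squared distance as ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y. A NumPy sketch of that idea (my own formulation, not the code from the linked post):

```python
import numpy as np

S1 = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # N x D
S2 = np.array([[-1., -1, -1], [0, 9, 8]])                        # M x D

# ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once:
# the cross term is a single N x M matrix product.
sq = (S1 ** 2).sum(1)[:, None] + (S2 ** 2).sum(1)[None, :] - 2 * S1 @ S2.T

# Clip tiny negatives caused by floating-point cancellation before the sqrt.
out = np.sqrt(np.maximum(sq, 0))
```

The matrix product makes this variant fast for large N and M, at the cost of slightly worse numerical accuracy for nearly coincident points.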

How about in one line?

[dist, ind] = min(pdist2(S2,S1));

ind is the index of the nearest vector in S2 for each vector in S1, and dist gives the corresponding minimum distance.

Borrowing @rayryeng's example,

S1 = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
S2 = [-1 -1 -1; 0 9 8];

the results are

dist =
   5.385164807134504   6.000000000000000   7.141428428542850  10.954451150103322
ind =
    1     2     2     2
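For completeness, SciPy offers a direct counterpart to pdist2, namely scipy.spatial.distance.cdist; a sketch of the same one-liner in Python (my addition, assuming SciPy is available):

```python
import numpy as np
from scipy.spatial.distance import cdist

S1 = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
S2 = np.array([[-1., -1, -1], [0, 9, 8]])

D = cdist(S2, S1)          # M x N matrix of pairwise Euclidean distances
dist = D.min(axis=0)       # counterpart of min(pdist2(S2,S1)) down each column
ind = D.argmin(axis=0)     # 0-based index of the nearest vector in S2
```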
Luis Mendo