I have a question. I am trying to compute pairwise distances between vectors. Let me first explain the problem: I have two sets of vectors, X and Y. X has three vectors x1, x2 and x3. Y has three vectors y1, y2 and y3. Note that the vectors in X and Y are of length m and n respectively. Let the dataset be represented as this image:
I am trying to compute a similarity matrix such as this:
The colour-coded parts are as follows. The cells marked with 0 need not be computed; I have intentionally set them to 100 (it can be any value). The grey cells have to be computed. The similarity score is computed as the L2 norm of (xi-xj) + the L2 norm of (yi-yj).
More precisely, the entries are
M((x_i,y_j), (x_k,y_l)) := norm(x_i-x_k,2) + norm(y_j-y_l,2)
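One way to read this definition, offered as a sketch rather than a statement about the intended layout: assume both sets hold N vectors (three in the example) and that the pair (x_i, y_j) is placed at row or column index (i-1)N + j, so the y index varies fastest. Under those assumptions the full matrix splits into two Kronecker products. The symbols N, D_X, D_Y and J_N are introduced here for illustration only.

```latex
\begin{aligned}
(D_X)_{ik} &= \lVert x_i - x_k \rVert_2, \qquad
(D_Y)_{jl} = \lVert y_j - y_l \rVert_2, \qquad
J_N = \mathbf{1}_N \mathbf{1}_N^{\top}, \\
M &= D_X \otimes J_N + J_N \otimes D_Y, \\
M_{(i-1)N+j,\;(k-1)N+l} &= (D_X)_{ik} + (D_Y)_{jl}.
\end{aligned}
```

The cells that need not be computed are those with i = k, which under this ordering are exactly the N-by-N blocks on the diagonal.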
I have written some basic code to do this:
clc;clear all;close all;
%% randomly generate data
m=3; n1=4; n2=6;
train_a_mean = rand(m,n1);
train_b_mean = rand(m,n2);
p = size(train_a_mean,1)*size(train_b_mean,1);
score_mean_ab = zeros(p,p);
%% This is to store the index variables
%% This is required for futu
idx1 = score_mean_ab;
idx2 = idx1; idx3 = idx1; idx4 = idx1;
a=1; b=1;
for i=1:size(score_mean_ab,1)
    c = 1; d = 1;
    for j=1:size(score_mean_ab,2)
        if (a==c)
            %% cells that need not be computed get a placeholder value
            score_mean_ab(i,j) = 100;
        else
            %% computing distances between the different modalities and
            %% summing them up
            score_mean_ab(i,j) = norm(train_a_mean(a,:)-train_a_mean(c,:),2) ...
                + norm(train_b_mean(b,:)-train_b_mean(d,:),2);
        end
        %% saving the indices
        idx1(i,j)=a; idx2(i,j)=b; idx3(i,j)=c; idx4(i,j)=d;
        %% updating the values of c and d (d indexes rows of train_b_mean)
        if mod(d,size(train_b_mean,1))==0
            c = c + 1;
            d = 1;
        else
            d = d+1;
        end
    end
    %% updating the values of a and b (b indexes rows of train_b_mean)
    if mod(b,size(train_b_mean,1))==0
        a = a + 1;
        b = 1;
    else
        b = b+1;
    end
end
For a sample run with three vectors per set, I get these results:
score_mean_ab =
100.0000 100.0000 100.0000 0.6700 1.6548 1.5725 0.8154 1.8002 1.7179
100.0000 100.0000 100.0000 1.6548 0.6700 1.5000 1.8002 0.8154 1.6454
100.0000 100.0000 100.0000 1.5725 1.5000 0.6700 1.7179 1.6454 0.8154
0.6700 1.6548 1.5725 100.0000 100.0000 100.0000 1.3174 2.3022 2.2200
1.6548 0.6700 1.5000 100.0000 100.0000 100.0000 2.3022 1.3174 2.1475
1.5725 1.5000 0.6700 100.0000 100.0000 100.0000 2.2200 2.1475 1.3174
0.8154 1.8002 1.7179 1.3174 2.3022 2.2200 100.0000 100.0000 100.0000
1.8002 0.8154 1.6454 2.3022 1.3174 2.1475 100.0000 100.0000 100.0000
1.7179 1.6454 0.8154 2.2200 2.1475 1.3174 100.0000 100.0000 100.0000
However, my code is very slow. I timed a few sample runs and got these results:
m=3; n1=3; n2=3;
Elapsed time is 0.000363 seconds.
m=10; n1=3; n2=3;
Elapsed time is 0.042015 seconds.
m=10; n1=1800; n2=1800;
Elapsed time is 0.230046 seconds.
m=20; n1=1800; n2=1800;
Elapsed time is 4.309134 seconds.
m=30; n1=1800; n2=1800;
Elapsed time is 23.058106 seconds.
My questions:
- Typically I will have values of m~100, n1~2000 and n2~2000. My own code breaks down at this point. Is there an optimised way to do this?
- Can the built-in MATLAB function pdist2 be used for this purpose?
NOTE: The vectors are actually in the form of row vectors, and the values of n1 and n2 may not be equal.