1

I present my simple working Matlab code and will ask questions:

tic

nrand1 = 10000;
nrand2 = 20000;  

% Location matrix 1: [longitude, latitude, w1]
lmat1=[rand(nrand1,1)-75 rand(nrand1,1)+39 round(rand(nrand1,1)*1000)+1];

% Location matrix 2: [longitude, latitude, w2]     
lmat2=[rand(nrand2,1)-75 rand(nrand2,1)+39 round(rand(nrand2,1)*100)+1];

% The number of rows for each matrix = In fact it's nrand1 X nrand2, obviously
nobs1 = size(lmat1,1);
nobs2 = size(lmat2,1);

% The number of pair-wise distances 
% between L1 locations X L2 locations
ndist = nobs1*nobs2;

% Initialization: Distance vector and weight vector
hdist = zeros(ndist,1);
weight = zeros(ndist,1);

% Double for loop -- for calculating the pair-wise distances and weights
k=1;
for i=1:nobs1
 for j=1:nobs2
     % distances in kilometers.
     lonH = sin(0.5*(lmat1(i,1)-lmat2(j,1))*pi/180.0)^2;
     latH = sin(0.5*(lmat1(i,2)-lmat2(j,2))*pi/180.0)^2;
     hdist(k) = 0.001*6372797.560856*2 ...
               *asin(sqrt(latH+(cos(lmat1(i,2)*pi/180.0) ...
               *cos(lmat2(j,2)*pi/180.0))*lonH));
     weight(k) = lmat1(i,3)*lmat2(j,3);
 k=k+1;
 end
end

toc

The code calculates 10000 X 20000 distances and weights.

Elapsed time is 67.124844 seconds. 

Is there a way to vectorize the double-loop processing, or to perform a parallel computing? If there is no room for performance improvement in Matlab, I may have to write the double loops in C and call it from Matlab. I don't know how to call C from matlab, so I will ask a separate question. Thanks!

Bill TP
  • 1,037
  • 2
  • 16
  • 29
  • Pair-wise distances? Look at [`pdist`](http://www.mathworks.com/help/stats/pdist.html) in Matlab, which uses C code under the hood if you use one of the built-in distance metrics. See also [this question](http://stackoverflow.com/questions/17777292/fast-algorithms-for-finding-pairwise-euclidean-distance) and my answer – you may be able to modify either my Matlab code or my mex C code for your purposes. – horchler Dec 05 '14 at 01:31
  • what is `nrandom`? cant execute this code. – Marcin Dec 05 '14 at 01:38
  • I will look through your post. Thank you. The "pdist" options and the custom function specification could differ from my "two" location matrices setting. But I will try it. – Bill TP Dec 05 '14 at 01:43
  • Sorry, Marcin. It should run now. – Bill TP Dec 05 '14 at 01:45
  • Actually, you probably want [`pdist2`](http://www.mathworks.com/help/stats/pdist2.html). – horchler Dec 05 '14 at 02:33

2 Answers2

2

The solution is that your inputs (lmat1 and lmat2) do not need to be matrices like you have them. Each one is really three vectors. Once you've broken out the vectors, you can create arrays that have every permutation of lmat1 and lmat2 together (which is what your double loop is doing). At that point, you can call your math as single, fully-vectorized operations...

%make your vectors
lmat1A = rand(nrand1,1)-75;
lmat1B = rand(nrand1,1)+39;
lmat1C = round(rand(nrand1,1)*1000)+1
lmat2A = rand(nrand2,1)-75;
lmat2B = rand(nrand2,1)+39;
lmat2C = round(rand(nrand2,1)*1000)+1

%make every combination
lmat1A = lmat1A(:)*ones(1,nrand2);
lmat1B = lmat1B(:)*ones(1,nrand2);
lmat1C = lmat1C(:)*ones(1,nrand2);
lmat2A = ones(nrand1,1)*(lmat2A(:)');
lmat2B = ones(nrand1,1)*(lmat2B(:)');
lmat2C = ones(nrand1,1)*(lmat2C(:)');

%do your math
lonH = sin(0.5*(lmat1A-lmat2A)*pi/180.0).^2;
latH = sin(0.5*(lmat1B-lmat2B)*pi/180.0).^2;
hdist = 0.001*6372797.560856*2 ...
    .*asin(sqrt(latH+(cos(lmat1B*pi/180.0) ...
    .*cos(lmat2B*pi/180.0)).*lonH));  %use element-wise multiplication
weight = lmat1C.*lmat2C;

%reshape your output into vectors (not arrays), which is what your original code does
lonH = lonH(:)
latH = latH(:)
hdist = hdist(:);
weight = weight(:);
chipaudette
  • 1,655
  • 10
  • 13
  • I had the following error: Error using * Inner matrix dimensions must agree. Error in test (line 21) lmat2A = ones(nrand1,1)*lmat2A; – Bill TP Dec 05 '14 at 02:41
  • This will work, but it's full of typos and some operators need to be converted to their element-wise versions (e.g., `^2`). It will probably use more memory than other schemes, but if optimized further it may be fastest for smaller datasets (around `100-by-100` maybe, but I'm guessing). – horchler Dec 05 '14 at 03:08
  • Sorry about the typos and thanks for catching the missing element-wise `.^`. I fixed several of the issues. There are probably more still in there (I don't have Matlab here where I am). – chipaudette Dec 05 '14 at 03:23
2

Using bsxfun, you can eliminate the for loops and the need for calculating matrices for each combination (this should reduce memory usage). The following is about six times faster than your original code on my computer using R2014b:

nrand1 = 10000;
nrand2 = 20000;

% Location matrix 1: [longitude, latitude, w1]
lmat1=[rand(nrand1,1)-75 rand(nrand1,1)+39 round(rand(nrand1,1)*1000)+1];

% Location matrix 2: [longitude, latitude, w2]     
lmat2=[rand(nrand2,1)-75 rand(nrand2,1)+39 round(rand(nrand2,1)*100)+1];

p180 = pi/180;
lonH = sin(0.5*bsxfun(@minus,lmat1(:,1).',lmat2(:,1))*p180).^2;
latH = sin(0.5*bsxfun(@minus,lmat1(:,2).',lmat2(:,2))*p180).^2;
hdist = 0.001*6372797.560856*2*asin(sqrt(latH+bsxfun(@times,cos(lmat1(:,2).'*p180),cos(lmat2(:,2)*p180)).*lonH));
hdist1 = hdist(:);
weight1 = bsxfun(@times,lmat1(:,3).',lmat2(:,3));
weight1 = weight1(:);

Note that by using the variable p180, the math is changed slightly so you won't get precisely the same values, but they will be very close.

horchler
  • 18,384
  • 4
  • 37
  • 73