
I'm writing MATLAB code that needs to calculate distances between vectors, and I execute

X = norm(A(:,i)-B(:,j));
%do something with X
%loop over i and j

quite often. It is a relatively small computation, so it is not really suitable for parfor; I thought the best idea would be to implement it with the GPU functions.

I found that pagefun and arrayfun do something like what I want, but they perform element-wise operations rather than operating on whole vectors.

So my question is: is there a cleverer way of calculating norms without for loops? Or, if I actually need to use the GPU, what is the best way to do it?

user27221
  • A norm of something is 4 mathematical operations, so it should be fast. If there are A LOT of them, try `gpuArray` as you mentioned. – Ander Biguri Mar 22 '16 at 15:03
  • Please put example code with the loop into your question. My feeling is that `pdist2` or `bsxfun` solves this very fast on a CPU if you vectorize it. – Daniel Mar 22 '16 at 15:11
  • I execute this millions of times, so it becomes slow. The "code" is basically some simple operations that try to find which point in high-dimensional space is closest to another. According to the profile viewer, this is the slowest part. – user27221 Mar 22 '16 at 15:24
  • `pdist2(A',B','euclidean','Smallest',k)` is the fastest solution so far (on the CPU), but it doesn't run on the GPU. Any suggestions? Also, I am testing this on a toy dataset that is much smaller than my real one, so I am not sure whether it is really faster. – user27221 Mar 22 '16 at 16:05
  • See [`this solution : Approach #3`](http://stackoverflow.com/a/31308903/3293881). It's based on matrix-mul, which must be very fast on a GPU (memory permitting). – Divakar Mar 22 '16 at 16:37
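
For readers who want to see what a matrix-multiplication formulation of the pairwise distances can look like, here is a minimal sketch. It is not taken from the linked answer; it only uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b. The sizes `dim`, `N`, `M` and the names `sqA`, `sqB`, `D2`, `D` are assumptions made for illustration, and the arrays are plain CPU arrays (wrapping `A` and `B` in `gpuArray` should move the same operations to the GPU).

```matlab
% Toy sizes, chosen only for illustration (assumptions, not from the question)
dim = 3; N = 5; M = 4;
A = rand(dim, N);              % columns are points; gpuArray(A) would move it to the GPU
B = rand(dim, M);

% ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b, evaluated for all pairs at once
sqA = sum(A.^2, 1).';          % N-by-1 squared norms of the columns of A
sqB = sum(B.^2, 1);            % 1-by-M squared norms of the columns of B
D2  = bsxfun(@plus, sqA, sqB) - 2 * (A.' * B);   % N-by-M squared distances
D   = sqrt(max(D2, 0));        % clamp tiny negatives caused by round-off

% D(i,j) should agree with norm(A(:,i) - B(:,j)) up to round-off
```

The only large operation is the product `A.' * B`, which is the part that tends to benefit from a GPU. The result is N-by-M, so memory grows with the number of pairs.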

1 Answer


If you need the norm of the difference between every column of A and every column of B, the fastest way is probably this:

N = 1000; % number of vectors (columns)
dim = 3; % number of dimensions
A = rand(dim, N, 'gpuArray'); % dim-by-N
B = rand(dim, 1, N, 'gpuArray'); % dim-by-1-by-N, so bsxfun expands to dim-by-N-by-N
C = sqrt(squeeze(sum(bsxfun(@minus, A, B).^2, 1))); % C(i,j) = norm(A(:,i) - B(:,1,j))

I get an execution time of 0.16 seconds on the CPU and 0.13 seconds on the GPU.
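
Since the question stores B as a dim-by-M matrix and ultimately wants the closest point, here is a hedged sketch of how the same `bsxfun` idea could be adapted. The `permute` step, the sizes, and the names `Bp`, `D`, `dmin`, `idx` are assumptions for illustration. It runs on plain CPU arrays; passing `gpuArray` inputs should work the same way.

```matlab
% Toy sizes, chosen only for illustration (assumptions, not from the question)
dim = 3; N = 6; M = 4;
A = rand(dim, N);                 % candidate points, one per column
B = rand(dim, M);                 % query points, one per column

Bp = permute(B, [1 3 2]);         % dim-by-1-by-M, so bsxfun can expand against A
D  = sqrt(sum(bsxfun(@minus, A, Bp).^2, 1));   % 1-by-N-by-M distances
D  = reshape(D, N, M);            % D(i,j) = norm(A(:,i) - B(:,j))

% For each column of B, the smallest distance and the index of the closest column of A
[dmin, idx] = min(D, [], 1);
```

Using `reshape` instead of `squeeze` keeps the result N-by-M even when N or M is 1. The intermediate array has dim*N*M elements, so for large inputs it may be necessary to process B in chunks.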