I'm trying to do what I think has got to be the most basic GPU operation imaginable in MATLAB, and I can't seem to figure it out.
I have a list of a million 3D coordinates and I want to find each one's distance from a single other point. The core is a simple square-root-of-sum-of-squares function getDist(a,b), where a and b are 3x1 vectors.
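For reference, getDist is nothing fancy; it's essentially this (a sketch, but it captures the idea):

    function d = getDist(a, b)
        % Euclidean distance between two 3x1 points
        d = sqrt(sum((a - b).^2));
    end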
Without the GPU I could brute-force it like this:

    d = zeros(1, 1e6);                      % preallocate
    for x = 1:1e6
        d(x) = getDist(points(:,x), point);
    end
If I didn't mind wasting memory, I could use repmat to tile my single point a million times, and I think that version works on the GPU:
    pointsGPU = gpuArray(points);
    pointGPU  = gpuArray(repmat(point, 1, 1e6));
    d = sqrt(sum((pointsGPU - pointGPU).^2, 1));
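(For what it's worth, I believe bsxfun would avoid materializing the big repmat copy, though I don't know whether it changes the timing; this is just a sketch:)

    % Expand the single 3x1 point against each column without the repmat copy
    d = sqrt(sum(bsxfun(@minus, pointsGPU, gpuArray(point)).^2, 1));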
This is slower than on the CPU. I narrowed the problem down to .^2 being slow on the GPU for some reason. Basically, I broke the line above into its individual parts (the subtraction, the square, the sum, and the sqrt) and timed each one. All were almost 10x faster on the GPU, except .^2, which was slower. Multiplying the array by itself with .* took the same time on the CPU and the GPU.
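In case the methodology matters, my per-operation timings were along these lines (a sketch; I'm assuming gputimeit here, and the variable names are mine):

    delta = pointsGPU - pointGPU;
    sq    = delta.^2;
    s     = sum(sq, 1);
    tSub  = gputimeit(@() pointsGPU - pointGPU);   % ~10x faster on GPU
    tPow  = gputimeit(@() delta.^2);               % slower on GPU, oddly
    tMul  = gputimeit(@() delta.*delta);           % same time as CPU
    tSum  = gputimeit(@() sum(sq, 1));             % ~10x faster on GPU
    tSqrt = gputimeit(@() sqrt(s));                % ~10x faster on GPU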
I tried using arrayfun:
    d = arrayfun(@(x) getDist(points(:,x), point), 1:1e6);
but with the GPU this gives the error "gpuArray output type is not currently implemented". I take this to mean that arrayfun wants the gpuArray to be the array it iterates over (where the 1:1e6 is), not something embedded inside the function. I can't figure out how to formulate it that way, since I want to take columns at a time, not single elements.
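To make that concrete, I imagine it would have to look something like the following, but getDist still pulls whole columns out of a matrix, which is exactly the part I don't know how to express element-wise:

    idx = gpuArray(1:1e6);   % iterate over a gpuArray so arrayfun runs on the GPU
    d = arrayfun(@(x) getDist(points(:,x), point), idx);   % presumably still fails: column indexing isn't element-wise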
This HAS to be solvable. What's a GPU for, if not massively parallel geometry problems like this? Thanks for any help.