I have an embarrassingly parallel (trivially scalable) problem that I'm trying to run in MATLAB on a machine with 40 cores and plenty of memory. Beyond about 10 cores I see no further decrease in computation time; sometimes the computation time even increases.
While investigating this, I put together a simple benchmark script that seems to reproduce the problem:
clc
clear

p = gcp;                        % grab the current pool (starts the default one if needed)
poolSize = p.NumWorkers;
M = 60;                         % loop iterations per worker
N = 500;                        % vector length; vec2*vec1 is an N-by-N matrix

% First benchmark: one outer product plus one sin pass per iteration.
t1 = zeros(1, poolSize);
for n = 1:poolSize
    tic;
    parfor (ImageInd = 1:n*M, n)    % run on at most n workers
        vec1 = rand(1, N);
        vec2 = rand(N, 1);
        d = sin(vec2*vec1);
    end
    t1(n) = toc;
end
figure; plot(1:poolSize, (1:poolSize)*M./t1, 'b')        % measured throughput
hold on; plot(1:poolSize, (1:poolSize)*M./t1(1), 'b--')  % ideal linear scaling

% Second benchmark: same memory footprint, one extra sin pass per iteration.
t2 = zeros(1, poolSize);
for n = 1:poolSize
    tic;
    parfor (ImageInd = 1:n*M, n)
        vec1 = rand(1, N);
        vec2 = rand(N, 1);
        d = sin(sin(vec2*vec1));
    end
    t2(n) = toc;
end
figure; plot(1:poolSize, (1:poolSize)*M./t2, 'g')
hold on
plot(1:poolSize, (1:poolSize)*M./t2(1), 'g--')
The two parfor loops above should involve the same amount of memory but differ in the amount of computation per iteration: the extra sin call adds a second transcendental pass over the full N-by-N matrix on top of the outer product, so substantially more arithmetic for the same memory footprint, if I'm not mistaken. The second, more computationally expensive loop scales very well with additional workers, while the first does not. This behaviour is consistent whether I use temporary variables, as in the example, reduction variables, or sliced variables.

Am I correct in assuming that this is a memory-bandwidth issue? Could the problem be fixed by switching to another programming language or to a different machine architecture? If my arithmetic is right, the largest object per iteration is the N-by-N product, i.e. 500*500*8 bytes ≈ 2 MB per worker, which I would expect to fit in the processors' caches; does MATLAB not utilize the cache well during parallel processing?
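For what it's worth, here is a follow-up experiment I plan to run to test the bandwidth hypothesis (a sketch only; repsList and the other variable names are mine). Chaining extra sin passes over the same matrix should add arithmetic per iteration while the data stays mostly cache-resident, so if the speedup from 1 worker to the full pool climbs with the number of passes, that would point at memory bandwidth rather than parfor overhead:

p = gcp;
M = 60;
N = 500;
repsList = [1 2 4 8 16];               % how many sin passes per iteration
speedup = zeros(size(repsList));
for k = 1:numel(repsList)
    reps = repsList(k);
    t = zeros(1, 2);
    nWorkers = [1, p.NumWorkers];      % one worker vs the full pool
    for j = 1:2
        w = nWorkers(j);
        tic;
        parfor (ImageInd = 1:p.NumWorkers*M, w)
            vec1 = rand(1, N);
            vec2 = rand(N, 1);
            d = vec2*vec1;             % ~2 MB matrix, allocated once per iteration
            for r = 1:reps
                d = sin(d);            % extra arithmetic over the same (cached) data
            end
        end
        t(j) = toc;
    end
    speedup(k) = t(1)/t(2);
end
figure; plot(repsList, speedup, 'o-')
xlabel('sin passes per iteration'); ylabel('speedup, 1 worker vs full pool')

If the machine really is bandwidth-limited, I would expect the measured speedup to rise toward p.NumWorkers as reps grows.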