I'm trying to optimizing the value N to split arrays up for vectorizing an array so it runs the quickest on different machines. I have some test code below
#example use random values
clear all,
t=rand(1,556790);
inner_freq=rand(8193,6);
N=100; # use N chunks
nn = int32(linspace(1, length(t)+1, N+1))
aa_sig_combined=zeros(size(t));
total_time_so_far=0;
for ii=1:N
tic;
ind = nn(ii):nn(ii+1)-1;
aa_sig_combined(ind) = sum(diag(inner_freq(1:end-1,2)) * cos(2 .* pi .* inner_freq(1:end-1,1) * t(ind)) .+ repmat(inner_freq(1:end-1,3),[1 length(ind)]));
toc
total_time_so_far=total_time_so_far+sum(toc)
end
fprintf('- Complete test in %4.4fsec or %4.4fmins\n',total_time_so_far,total_time_so_far/60);
This takes 162.7963sec or 2.7133mins to complete when N=100 on a 16gig i7 machine running ubuntu
Is there a way to find out what value N should be to get this to run the fastest on different machines?
PS: I'm running Octave 3.8.1 on 16gig i7 ubuntu 14.04 but it will also be running on even a 1 gig raspberry pi 2.