Optimizing the value N to split arrays up for vectorizing an array so it runs the quickest

Question

I'm trying to optimize the value N used to split an array into chunks, so that a vectorized computation runs as fast as possible on different machines. My test code is below.

#example use random values
clear all,
t=rand(1,556790);
inner_freq=rand(8193,6);

N=100; # use N chunks
nn = int32(linspace(1, length(t)+1, N+1))
aa_sig_combined=zeros(size(t));
total_time_so_far=0;
for ii=1:N
    tic;
    ind = nn(ii):nn(ii+1)-1;
    aa_sig_combined(ind) = sum(diag(inner_freq(1:end-1,2)) * cos(2 .* pi .* inner_freq(1:end-1,1) * t(ind)) + repmat(inner_freq(1:end-1,3),[1 length(ind)]));
    elapsed = toc  # time for this chunk (printed)
    total_time_so_far = total_time_so_far + elapsed
end
fprintf('- Complete  test in %4.4fsec or %4.4fmins\n',total_time_so_far,total_time_so_far/60);

This takes 162.7963 s (2.7133 min) to complete with N=100 on an i7 machine with 16 GB of RAM running Ubuntu.

Is there a way to find out what value N should be to get this to run the fastest on different machines?

PS: I'm running Octave 3.8.1 on an i7 with 16 GB of RAM under Ubuntu 14.04, but the code will also have to run on a Raspberry Pi 2 with 1 GB of RAM.

If you are so concerned with performance, I'd write the loop in Fortran or C and then parallelize it (perhaps with OpenMP). You might also try precomputing some things outside of your main loop, such as length(ind) or inner_freq(1:end-1,1). — siliconwafer, Apr 17 '15 at 17:22
@krisdestruction I would love to, but your code doesn't seem to work. I included the errors in the section under your answer. I tried it in several places, including http://octave-online.net/, and I get an error about 'A' along with error: operator *: nonconformant arguments (op1 is 8192x8192, op2 is 1x5568) — Rick T, Jun 17 '15 at 16:31

score 2 · Accepted Answer · answered Apr 17 '15 at 22:07

This is the MATLAB test script that I used to time each step. The return exits after the first iteration, since the remaining iterations look similar.

%example use random values
clear all;
t=rand(1,556790);
inner_freq=rand(8193,6);

N=100; % use N chunks
nn = int32( linspace(1, length(t)+1, N+1) );
aa_sig_combined=zeros(size(t));

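A = inner_freq(1:end-1,1);        % frequency column (8192x1), as in the question's cos(...) term
D = diag(inner_freq(1:end-1,2));  % constant, so build it once outside the loop
for ii=1:N
    ind = nn(ii):nn(ii+1)-1;
    tic;
    cosPara = 2 * pi * A * t(ind);
    toc;  % time to build the cos argument
    cosResult = cos( cosPara );
    sumParaA = D * cosResult;
    toc;  % cumulative: adds cos and the multiplication by D
    sumParaB = repmat(inner_freq(1:end-1,3),[1 length(ind)]);
    toc;  % cumulative: adds repmat
    aa_sig_combined(ind) = sum( sumParaA + sumParaB );
    toc;  % cumulative: adds the final sum
    return;  % stop after the first chunk
end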

The output is as follows. Note that I have a slow computer.

Elapsed time is 0.156621 seconds.
Elapsed time is 17.384735 seconds.
Elapsed time is 17.922553 seconds.
Elapsed time is 18.452994 seconds.

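As you can see, almost all of the time falls between the first and second toc, the interval that contains the cos call (together with the multiplication by D). You are running cos on an 8192x5568 matrix (45,613,056 elements), so it makes sense that it takes so long.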
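Because the second toc in the script covers both the cos call and the multiplication by D, the two costs can be separated by timing each step on its own. This is a minimal sketch, assuming small stand-in sizes so it runs quickly; the names f, a, tChunk and the 512x2000 shape are illustrative and not from the original script.

```octave
% Stand-in data (assumed sizes, much smaller than the real problem).
f = rand(512, 1);        % plays the role of inner_freq(1:end-1,1)
a = rand(512, 1);        % plays the role of inner_freq(1:end-1,2)
tChunk = rand(1, 2000);  % one chunk of t
D = diag(a);

% Time each step separately instead of cumulatively.
tic; arg = 2 * pi * f * tChunk;  tArg = toc;  % outer product
tic; c = cos(arg);               tCos = toc;  % element-wise cosine
tic; s = D * c;                  tMul = toc;  % scaling by the diagonal matrix

fprintf('arg: %.4fs  cos: %.4fs  D*c: %.4fs\n', tArg, tCos, tMul);
```

The split may look different in Octave and MATLAB: Octave stores the result of diag as a special diagonal matrix, while MATLAB returns a full matrix, so the cost of D * c is worth measuring on each platform.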

If you wish to improve performance, use parfor as it appears each iteration is independent. Assuming you had 100 cores to run your 100 iterations, your script would be done in 17 seconds + parfor overhead.
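A parfor version could look roughly like the sketch below. It assumes small stand-in sizes, and it collects each chunk in a cell array (the hypothetical variable parts), because parfor in MATLAB does not accept an output indexed by a range computed inside the loop. In MATLAB the loop only runs in parallel with the Parallel Computing Toolbox; Octave accepts the parfor syntax but runs it as an ordinary for loop.

```octave
% Stand-in data (assumed sizes, much smaller than the real problem).
t = rand(1, 5000);
inner_freq = rand(65, 6);
N = 10;                                          % number of chunks
nn = int32(linspace(1, length(t) + 1, N + 1));   % chunk boundaries

f   = inner_freq(1:end-1, 1);        % frequencies
D   = diag(inner_freq(1:end-1, 2));  % built once, outside the loop
off = inner_freq(1:end-1, 3);        % per-row offsets

parts = cell(1, N);                  % one result per chunk
parfor ii = 1:N
    ind = nn(ii):nn(ii+1)-1;
    parts{ii} = sum(D * cos(2 * pi * f * t(ind)) + repmat(off, [1 length(ind)]), 1);
end
aa_sig_combined = [parts{:}];        % stitch the chunks back together
```

Each worker holds its own chunk-sized temporary matrices, so on a memory-constrained machine the number of workers and the chunk size have to be chosen together.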

For the cos calculation itself, you might want to look into whether there is a faster, more parallel way to compute the cosine than the stock function.

Another minor optimization is the line below. It keeps the diag call out of the loop, since the diagonal matrix is constant. You don't want an 8192x8192 diagonal matrix to be generated on every iteration! Storing it outside the loop gives a small performance boost as well.

D = diag(inner_freq(1:end-1,2));
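Going one step further than the original answer, the diagonal matrix can be avoided altogether. Summing the rows of D * C is the same as the row vector a.' * C, and summing the rows of the repmat term is just sum(b), so the whole expression reduces to one matrix-vector style product plus a scalar. The sketch below checks that identity on small stand-in data; the names a, b, f, and tChunk are illustrative.

```octave
% Stand-in data (assumed sizes, much smaller than the real problem).
f = rand(64, 1);         % frequencies, like inner_freq(1:end-1,1)
a = rand(64, 1);         % amplitudes,  like inner_freq(1:end-1,2)
b = rand(64, 1);         % offsets,     like inner_freq(1:end-1,3)
tChunk = rand(1, 300);   % one chunk of t

C = cos(2 * pi * f * tChunk);

% Form used in the question and answer: diagonal matrix plus repmat.
viaDiag = sum(diag(a) * C + repmat(b, [1 length(tChunk)]), 1);

% Equivalent form: weighted column sums plus a scalar offset.
viaDot = a.' * C + sum(b);

maxDiff = max(abs(viaDiag - viaDot));
fprintf('largest difference: %g\n', maxDiff);
```

The two results agree up to floating-point rounding. The cos call still has to run on the full matrix, so this only removes the cost of the multiplication by D and the repmat.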

Note that I didn't use the MATLAB profiler (profile) because it didn't work for me, but you should use it in the future for code that is organized into functions.
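As for choosing N on a given machine, one empirical approach is to time a reduced version of the problem for a few candidate values and keep the fastest. This is only a sketch under assumptions: the candidate list, the reduced sizes, and the variable names are made up for illustration, and the best N on the small problem may not carry over exactly to the full-size one.

```octave
% Reduced stand-in problem (assumed sizes).
t = rand(1, 20000);
inner_freq = rand(129, 6);
f = inner_freq(1:end-1, 1);
D = diag(inner_freq(1:end-1, 2));
b = inner_freq(1:end-1, 3);

candidates = [1 2 5 10 20 50];        % hypothetical values of N to try
elapsed = zeros(size(candidates));

for k = 1:numel(candidates)
    N = candidates(k);
    nn = round(linspace(1, length(t) + 1, N + 1));  % chunk boundaries
    out = zeros(size(t));
    tic;
    for ii = 1:N
        ind = nn(ii):nn(ii+1)-1;
        out(ind) = sum(D * cos(2 * pi * f * t(ind)) + repmat(b, [1 length(ind)]), 1);
    end
    elapsed(k) = toc;                 % total time for this N
end

[minTime, best] = min(elapsed);
bestN = candidates(best);
fprintf('fastest N = %d (%.4f s)\n', bestN, minTime);
```

Timings from a single run are noisy, so repeating each measurement a few times and taking the minimum or median would give a steadier answer.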

I keep getting an error where 'A' is not defined and an error: operator *: nonconformant arguments (op1 is 8192x8192, op2 is 1x5568) error: called from: error: /home/rt/Documents/octave/eq_research/main/transform/test_loop_speed.m at line 18, column 14 — Rick T, Jun 17 '15 at 16:24
Also, I'm using Octave 3.8.1, in which parfor isn't fully supported yet: http://stackoverflow.com/questions/24970519/how-to-use-parallel-for-loop-in-octave-or-scilab — Rick T, Jun 17 '15 at 16:27
