3

In my program in matlab I have several instances where I need to create a matrix, which entries depends on its indices and perform matrix-vector operations with it. I wonder how I can implement this most efficiently.

For example, I need to speed up:

N = 1e4;
x = rand(N,1);

% Option 1
tic
I = 1:N;
J = 1:N;
S = zeros(N,N);
for i = 1:N
    for j = 1:N
        S(i,j) = (i+j)/(abs(i-j)+1);
    end
end
a = x'*S*x
fprintf('Option 1 takes %.4f sec\n',toc)
clearvars -except x N

I try to speed this up, so I have tried the following options:

% Option 2
tic
I = 1:N;
J = 1:N;
Sx = zeros(N,1);
for i = 1:N
    Srow_i = (i+J)./(abs(i-J)+1);
    Sx(i)= Srow_i*x;
end
a = x'*Sx
fprintf('Option 2 takes %.4f sec\n',toc)
clearvars -except x N

and

% Option 3
tic
I = 1:N;
J = 1:N;
S = bsxfun(@plus,I',J)./(abs(bsxfun(@minus,I',J))+1);
a = x'*S*x
fprintf('Option 3 takes %.4f sec\n',toc)
clearvars -except x N

and (thanks to one of the answers)

% options 4
tic
[I , J] = meshgrid(1:N,1:N);
S = (I+J) ./ (abs(I-J) + 1);
a = x' * S * x;
fprintf('Option 4 takes %.4f sec\n',toc)
clearvars -except x N

Otion 2 is the most efficient. Is there a faster option of doing this operation?

Update:

I have also tried the option by Abhinav:

% Option 5 using Tony's Trick
tic
i = 1:N;
j = (1:N)';
I = i(ones(N,1),:);
J = j(:,ones(N,1));
S = (I+J)./(abs(I-J)+1);
a = x'*S*x;
fprintf('Option 5 takes %.4f sec\n',toc)
clearvars -except x N

It seems that the most efficient procedure depends on the size of N. For different N I get the following output:

N = 100:

Option 1 takes 0.00233 sec
Option 2 takes 0.00276 sec
Option 3 takes 0.00183 sec
Option 4 takes 0.00145 sec
Option 5 takes 0.00185 sec

N = 10000:

Option 1 takes 3.29824 sec
Option 2 takes 0.41597 sec
Option 3 takes 0.72224 sec
Option 4 takes 1.23450 sec
Option 5 takes 1.27717 sec

So, for small N, option 2 is the slowest but it becomes the most efficient for larger N. Maybe because of memory? Could somebody explain this?

Optimist
  • 131
  • 4

2 Answers2

2

You can create indices using meshgrid and no need to loop:

N = 1e4;
[I , J] = meshgrid(1:N,1:N);
x = rand(N,1);
S = (I+J) ./ (abs(I-J) + 1);
a = x' * S * x;

Update:

Since @Optimist shown that performance of this code is less than Option2 and Option3 I decided to slightly improve Option2:

N = 1e4;
x = rand(N,1);
Sx = zeros(N,1);
for i = 1:N
    Srow_i = (i+1:i+N)./[i:-1:2,1:N-i+1] ;
    Sx(i)= Srow_i*x;
end
a = x'*Sx;
rahnema1
  • 15,264
  • 3
  • 15
  • 27
  • 1
    Thanks for your answer. This avoids indeed a for loop, but is unfortunately less efficient than both option 2 and 3. – Optimist Aug 31 '16 at 10:21
  • @Optimist usually I think that loop leads to less efficient code, but your example shown that if loop used correctly can lead to increased efficiency. I updated my answer and slightly modify your option2. – rahnema1 Aug 31 '16 at 15:28
1

You should try using the Tony's trick to do vector stacking/tiling in Matlab the fastest way. I have answered a similar question here. Here is the Tony's Trick option.

% Option using Tony's Trick
tic
i = 1:N;
j = (1:N)';
I = i(ones(N,1),:);
J = j(:,ones(N,1));
S = (I+J)./(abs(I-J)+1);

a = x'*S*x
fprintf('Option 1 takes %.4f sec\n',toc)

Edit 1: I ran a few tests and found the following. Up to N=1000, the Tony's trick option is slightly faster than the Option 2. Beyond that, Option 2 again catches up and becomes faster.

Possible Reason : This should be so because, up until the size of the array could fit in the cache, the fully vectorized code (Tony's Trick option) is faster BUT as soon as the array sizes grow (N>1000), it spills into memory caches away from the CPU and then Matlab uses some internal optimization to breakdown the Tony's Trick code into piecemeal code so that it no-longer enjoys the benefit of complete vectorization.

Community
  • 1
  • 1
Abhinav
  • 1,882
  • 1
  • 18
  • 34
  • Thanks for the explanation. I have tried your option and indeed it performs faster than option 2 for smaller N (e.g. N=200). In that case option 4 turns out to be the fastest. For larger N, option 2 outperforms the rest. Your explanation sounds reasonable. – Optimist Aug 31 '16 at 16:24