How to reduce the time of computation in presence of for loops and matrix-vector multiplication

Question

I am writing a program in which the time of computation is really important so I have to write my codes in a way to reduce the time. In the following, I wrote a code but it will be time consuming if the length of my vectors goes high. Is there anyway to produce the same result in a faster way?

K1 = [1 2 3 4 5];  K2 = [6 7 8 9 10];
kt1 = [1.5 3 4.5]; kt2 = [6.5 8 9.5];
numk1 = bsxfun(@minus,K1.',kt1);
denomk1 = bsxfun(@minus, kt1.',kt1)+eye(numel(kt1));
numk2 = bsxfun(@minus,K2.',kt2);
denomk2 = bsxfun(@minus, kt2.', kt2)+eye(numel(kt2));

for j=1:numel(kt1)
    for jj=1:numel(kt2)
        k1_dir = bsxfun(@rdivide,numk1,denomk1(j,:)); k1_dir(:,j)=[];
        k_dir1 = prod(k1_dir,2);
        k2_dir = bsxfun(@rdivide,numk2,denomk2(jj,:)); k2_dir(:,jj)=[];
        k_dir2 = prod(k2_dir,2);
        k1_k2(:,:,j,jj) = k_dir1 * k_dir2';
    end
end

In the above code, as the length of K1and K2increase, the length of kt1and kt2 increase too. So for long vector lengths this code is time consuming.

It seems like you already have an answer, but in the future please be specific when it comes to performance questions. 'Slow when large' is not very relevant, '1 sec when my input is of size (9,1,100)' helps people know what you need. — Dennis Jaheruddin, Apr 14 '15 at 15:40

Divakar · Accepted Answer · 2015-04-14T15:19:46.217

Going full-throttle on vectorization, this could be one approach to replace the loopy portion of the code listed in the problem -

%// Size parameters
M1 = numel(K1);
N1 = numel(kt1);
M2 = numel(K2);
N2 = numel(kt2);

%// Indices to be removed from k1_dir & k2_dir. 
%// In our vectorized version, we will just use these to set 
%// corresponding elements in vectorized versions of k1_dir & k2_dir
%// to ONES, as later on PROD would take care of it.
rm_idx1 = bsxfun(@plus,[1:M1]',[0:N1-1]*(M1*N1+M1)); %//'
rm_idx2 = bsxfun(@plus,[1:M2]',[0:N2-1]*(M2*N2+M2)); %//'

%// Get vectorized version of k1_dir, as k1_dirv
k1_dirv = bsxfun(@rdivide,numk1,permute(denomk1,[3 2 1]));
k1_dirv(rm_idx1) = 1;
k_dir1v = prod(k1_dirv,2);

%// Get vectorized version of k2_dir, as k2_dirv
k2_dirv = bsxfun(@rdivide,numk2,permute(denomk2,[3 2 1]))
k2_dirv(rm_idx2) = 1;
k_dir2v = prod(k2_dirv,2);

%// Get vectorized version of k1_k2, as k1_k2v
k1_k2v = bsxfun(@times,k_dir1v,permute(k_dir2v,[2 1 4 3]));

Quick runtime test:

With the inputs setup like so -

SZ1 = 100;
SZ2 = 100;

K1 = randi(9,1,SZ1);
K2 = randi(9,1,SZ1);

kt1 = randi(9,1,SZ2);
kt2 = randi(9,1,SZ2);

The runtimes for the loopy portion in the original (after adding code for pre-allocation with zeros for a more fair benchmarking) and proposed vectorized approach were -

-------------------------- With Original Loopy Approach
Elapsed time is 1.086666 seconds.
-------------------------- With Proposed Vectorized Approach
Elapsed time is 0.178805 seconds.

Doesn't seem like JIT is showing its magic, at least not when bsxfun is used inside nested loops and also the fact that you need to index into that huge 4D array in each iteration isn't helping you. So, going full-throttle on vectorization in cases like these make more sense !

Can it be modified to work when the dimension of `K1`and `K2` is different? — shaloo shaloo, Apr 14 '15 at 15:07
@SanthanSalai How about [6 dimensions](http://stackoverflow.com/a/26187897/3293881)? :) — Divakar, Apr 15 '15 at 09:01
@SanthanSalai Well the solutions where I have used `permute` can found with [this search](http://stackoverflow.com/search?tab=votes&q=user%3a3293881%20permute). — Divakar, Apr 15 '15 at 10:04

How to reduce the time of computation in presence of for loops and matrix-vector multiplication

1 Answers1