2

I am quite new to MATLAB matrix calculation and don't know how to improve the performance when I have many for-loops. I have tried to figure it out by looking through similar questions, but I still feel puzzled.

Here is my code:

pred = zeros(n_test,1) % vector saving all prediction results
for d = 1:n_test  % n_test test data
    u = user(d); 
    p_locations = zeros(n_location,1); 
    for i = 1:n_location
        for z = 1:n_topic
            for r = 1:n_region
                p_locations(i) = p_locations(i)+para.Pzu(z,u)*para.Pru(r,u)*para.Piz(i,z)*para.Pir(i,r);  % calculate the probability of the location i      
            end
        end
    end
    [val, pos] = max(p_locations); % find the location of the largest probability
    pred(d) = pos;
end

As commented, basically, I want to predict location for each test data. Then I will compare with ground truth.

I have almost 80000 test data and the calculation is really slow. It's already 13 hours and it's still running. So could you teach me how to re-write the code? Thanks a lot!

Divakar
  • 218,885
  • 19
  • 262
  • 358
gladys0313
  • 2,569
  • 6
  • 27
  • 51
  • To avoid loops you always have to vectorize you solution. Matlab is much slower on loops than vector operations. Can you provide more detail, such as what is para.Pzu and para.Pru and etc? What do they do, can you make them accept vectors? – jfish003 Jun 08 '16 at 08:08
  • what are the dimensions of para.Pzu, para.pru, para.piz and para.pir? – ibezito Jun 08 '16 at 08:08
  • have a look at Matlab's tips here: http://nl.mathworks.com/help/matlab/matlab_prog/vectorization.html – Swier Jun 08 '16 at 08:14
  • @jfish003 para.Pzu and similar things are matrices, dimension of `para.Pzu` is n_topic * n_user, `para.Pru` is n_region * n_user, `para.Piz` is n_location * n_topic, `para.Pir` is n_location * n_region. They are parameters after training and I need to find right index of entry to calculate the probability – gladys0313 Jun 08 '16 at 08:24
  • @drorco please refer to my answer to jfish003 – gladys0313 Jun 08 '16 at 08:25

2 Answers2

3

Performing broadcasted operations in parts using bsxfun and the efficient matrix-multiplication, here's a fully vectorized approach -

p1 = bsxfun(@times,Pzu(:,user).',permute(Pru(:,user),[2,3,1]));
p2 = bsxfun(@times,Piz,permute(Pir,[1,3,2]));

[~,m,n] = size(p1);
sums = reshape(p2,[],m*n)*(reshape(p1,[],m*n).');
[~, pred_out] = max(sums,[],1);

Benchmarking

Benchmarking code -

% Setup inputs
n_topic = 70;
n_test = 70;
n_region = 70;
n_location = 70;
user = randi(n_test,n_test,1);

Pzu = rand(n_topic,n_test);
Pru = rand(n_region,n_test);
Piz = rand(n_location,n_topic);
Pir = rand(n_location,n_region);

disp('----------- With original loopy approach')
tic
% ... Original code
toc

disp('----------- With vectorized approach')
tic
% ... Proposed code
toc

max_error = max(abs(pred(:)-pred_out(:)))

Output -

----------- With original loopy approach
Elapsed time is 1.157094 seconds.
----------- With vectorized approach
Elapsed time is 0.016201 seconds.
max_error =
     0
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
0

How about you write the associated matrix operations for this problem. Then you compute the probability of each location by using matrix operations directly not element wise operations. Matlab is much faster to use matrix operations as they use your CPU's vector instructions. Relevant to your use case, bsxfun and potentially sparse matrices, if you can't find a nice dense matrix formulation.

Antoine
  • 11
  • 1