3

In each iteration of a loop, I am calculating a MATLAB matrix. These matrices all must be concatenated together to create one final matrix. I know the dimensions of this final matrix before entering the loop, so I though preallocating the matrix using the 'zeros' function would be faster than initializing an empty array then simply appending the subarrays in each iteration of my loop. Oddly, my program runs MUCH slower when I preallocate. Here is the code (Only the first and last lines differ):

This is slow:

w_cuda = zeros(w_rows, w_cols, f_cols);

for j=0:num_groups-1

    % gets # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);    
    else
        num_rows_sub = max_row_size;
    end;

    % calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;

    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);

    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub,f_sub);

    % Incorporate sub-matrix into final matrix
    w_cuda(start_index:end_index,:,:) = w_cuda_sub;

end

This is fast:

w_cuda = [];

for j=0:num_groups-1

    % gets # of rows & cols in W. The last group is a special
    % case because it may have fewer than max_row_size rows
    if (j == num_groups-1 && mod(w_rows, max_row_size) ~= 0)
        num_rows_sub = w_rows - (max_row_size * j);    
    else
        num_rows_sub = max_row_size;
    end;

    % calculate correct W and f matrices
    start_index = (max_row_size * j) + 1;
    end_index = start_index + num_rows_sub - 1;

    w_sub = W(start_index:end_index,:);
    f_sub = filterBank(start_index:end_index,:);

    % Obtain sub-matrix
    w_cuda_sub = nopack_cu(w_sub,f_sub);

    % Incorporate sub-matrix into final matrix
    w_cuda = [w_cuda; w_cuda_sub];

end

As far as other potentially useful information--my matrix is 3D, and the numbers within it are complex. As always, any insight is appreciated.

nedblorf
  • 5,135
  • 4
  • 27
  • 28
  • Part of your code seems to be missing. The matrix or function "W" and "filterBank" are not defined. – Miebster Mar 02 '11 at 17:52
  • Yes, this is only the part of my code I thought was relevant. Thanks. – nedblorf Mar 02 '11 at 18:08
  • Without knowing what your code is doing, or at least some executable analogy to what your code is doing, how can anyone tell you why it is slow? From your code it is not even clear if W, filterBank, and nopack_cu are matricies or functions. It appears that w_rows, w_cols, f_cols, num_groups, max_row_size relate to each other in some way, but it is not clear. If you can provide an executable example I can look at it further. I suspect that on my machine the top example will execute faster than the bottom example. – Miebster Mar 02 '11 at 18:27
  • Looking at your variable names - are you using CUDA or another parallel extension? – Xodarap Mar 02 '11 at 18:49
  • Yes, Xodarap; I am using MEX to make a CUDA call which subsequently returns the result to w_cuda_sub. The time for this step is the same between the two code samples I provided. – nedblorf Mar 07 '11 at 16:43
  • There is an interesting post here: http://blogs.mathworks.com/steve/2011/05/20/more-about-automatic-array-growth-improvements-in-matlab-r2011a/ – Alessandro Cuttin Mar 24 '16 at 14:41

1 Answers1

8

I have always assumed preallocation is faster for any array size and never actually tested it. So, I did a simple test timing the population of various array sizes from 1x1x3 up to 20x20x3 using 1000 iterations by both appending and preallocation methods. Here's the code:

arraySize = 1:20;
numIteration = 1000;

timeAppend = zeros(length(arraySize), 1);
timePreAllocate = zeros(length(arraySize), 1);

for ii = 1:length(arraySize); 
    w = [];
    tic;
    for jj = 1:numIteration
        w = [w; rand(arraySize(ii), arraySize(ii), 3)];
    end
    timeAppend(ii) = toc;
end; 

for ii = 1:length(arraySize); 
    w = zeros(arraySize(ii) * numIteration, arraySize(ii), 3);
    tic;
    for jj = 1:numIteration
        indexStart = (jj - 1) * arraySize(ii) + 1;
        indexStop = indexStart + arraySize(ii) - 1;
        w(indexStart:indexStop,:,:) = rand(arraySize(ii), arraySize(ii), 3);
    end
    timePreAllocate(ii) = toc;
end; 

figure;
axes;
plot(timeAppend);
hold on;
plot(timePreAllocate, 'r');
legend('Append', 'Preallocate');

And here are the (as expected) results: Comparison of array appending vs. preallocation

b3.
  • 7,094
  • 2
  • 33
  • 48
  • Thank you for this excellent answer. It turns out that I was not running enough iterations to see the performance benefit of preallocating. Still, my program runs slower for 1-5 iterations when I preallocate, which unfortunately is a common use case. – nedblorf Mar 07 '11 at 16:39
  • 1
    I suppose you could add a conditional statement around your preallocation to only do it above a certain threshold of iterations. – Eric Mar 15 '11 at 14:36