2

The essential part of the code in question can be distilled into:

list=rand(1,x); % where x is some arbitrarily large integer
hitlist=[];
for n=1:1:x
    if rand(1) < list(n)
        hitlist=[hitlist n];
    end
end
list(hitlist)=[];

This program is running quite slowly and I suspect this is why, however I'm unaware how to fix it. The length of the hitlist will necessarily vary in a random way, so I can't simply preallocate a 'zeros' of the proper size. I contemplated making the hitlist a zeros the length of my list, but then I would have to remove all the superfluous zeros, and I don't know how to do that without having the same problem.

How can I preallocate an array of random size?

Adriaan
  • 17,741
  • 7
  • 42
  • 75
Gage
  • 21
  • 4

1 Answers1

3

I'm unsure about preallocating 'random size', but you can preallocate in large chunks, e.g. 1e3, or however is useful for your use case:

list=rand(1,x); % where x is some arbitrarily large integer
a = 1e3; % Increment of preallocation
hitlist=zeros(1,a);
k=1; % counter
for n=1:1:x
    if rand(1) < list(n)
        hitlist(k) = n;
        k=k+1;
    end
    if mod(k-1,a)==0 % if a has been reached
        hitlist = [hitlist zeros(1,a)]; % extend
    end
end
hitlist = hitlist(1:k-1); % trim excess
% hitlist(k:end) = []; % alternative trim, but might error
list(hitlist)=[];

This won't be the fastest possible, but at least a whole lot faster than incrementing each iteration. Make sure to choose a suitable; you can even base it somehow on the available amount of RAM using memory, and trim the excess afterwards, that way you don't have to do the in-loop trick at all.


As an aside: MATLAB works column-major, so running through matrices that way is faster. I.e. first the first column, then the second and so on. For a 1D array this doesn't matter, but for matrices it does. Hence I prefer to use list = rand(x,1), i.e. as column.

For this specific case, don't use this looped approach anyway, but use logical indexing:

list = rand(x,1);
list = list(list<rand(size(list)));
Adriaan
  • 17,741
  • 7
  • 42
  • 75
  • 2
    I don’t believe that incrementing the array in chunks is the way to go. All you do is potentially slow down the JIT. When you do `hitlist(end+1)=n`, MATLAB will double the memory assigned to the array (array size ~= allocates size). Doing this chunkwise will not let MATLAB optimize this use case. The problem is doing `hitlist = [hitlist n]`, as there you don’t extend the array but create a new one. I have a Q&A link somewhere that demonstrates this, I’ll go look for it. – Cris Luengo Oct 14 '18 at 18:55
  • 1
    [Here it is](https://stackoverflow.com/q/48351041/7328782). Read both the question and my answer to see the full experiment. – Cris Luengo Oct 14 '18 at 18:57
  • But +1 anyway for the logical indexing recommendation. That it the right approach. – Cris Luengo Oct 14 '18 at 18:59
  • 1
    I'll publicly eat my words here. Your method to increment the array in steps of 1e3 is slightly faster than incrementing the array by `hitlist(end+1)=n`. Either method is an order of magnitude faster than OP's code. Of course, the version at the end, without a loop, is again two orders of magnitude faster than either of the two fast ways of incrementing the array size. – Cris Luengo Oct 15 '18 at 13:17