Vectorizing code - How to reduce MATLAB computational time

Question

I have this piece of code

N=10^4;
for i = 1:N
    [E,X,T] = fffun(); % Stochastic simulation. Returns every time three different vectors (whose length is 10^3).
    X_(i,:)=X;
    T_(i,:)=T;
    GRID=[GRID T];
end
GRID=unique(GRID);
% Second part
for i=1:N
for j=1:(kmax)
    f=find(GRID==T_(i,j) | GRID==T_(i,j+1));
    s=f(1);
    e=f(2)-1;

 counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;
end
end

The code performs N different simulations of a stochastic process (which consists of 10^3 events, occurring at discrete moments (T vector) that depends on the specific simulation. Now (second part) I want to know, as a function of time istant, how many simulations are in a particular state (X assumes value between 1 and 10). The idea I had: create a grid vector with all the moments at which something happens in any simulation. Then, looping over the simulations, loop over the timesteps in which something happens and incrementing all the counter indeces that corresponds to this particular slice of time.

However this second part is very heavy (I mean days of processing on a standard quad-core CPU). And it shouldn't. Are there any ideas (maybe about comparing vectors in a more efficient way) to cut the CPU time?

This is a standalone 'second_part'

N=5000;
counter=zeros(11,length(GRID));

for i=1:N
    disp(['Counting sim #' num2str(i)]);
    for j=1:(kmax)
        f=find(GRID==T_(i,j) | GRID==T_(i,j+1),2);
        s=f(1);
        e=f(2)-1;

        counter(X_(i,j), s:e)=counter(X_(i,j), s:e)+1;

    end
end

counter=counter/N;
stop=find(GRID==Tmin);
stop=stop-1;
plot(counter(:,(stop-500):stop)')

with associated dummy data ( filedropper.com/data_38 ). In the real context the matrix has 2x rows and 10x columns.

If this takes *days* I am almost sure that most of the time is from `fffun`. Try to profile your code with a small `N` — Ander Biguri, Feb 23 '17 at 15:54
@Adriaan unfortunately this is not the case. the first part takes less than a minute. in the second part there's no unallocated variable. :( — Surfer on the fall, Feb 23 '17 at 16:55
@AnderBiguri this is not the case. the first part takes less than a minute. :( — Surfer on the fall, Feb 23 '17 at 16:56
I recommend that you check that the preallocation was done correctly and that `counter` is indeed not growing. Then you could look into vectorizing your double `for` loop. Use `profile` to see where the slowest parts are. Hard to say more without a runnable example (even with dummy data since the first part of the code is not the issue). You can also change the `find` to only look for the first two matches by adding a second argument. — horchler, Feb 23 '17 at 17:09
Why do you need both `GRID` and `T_`? It looks like at the end of the first loop `T_==GRID.'` — EBH, Feb 23 '17 at 18:40
@EBH maybe you're right, but the point is: the performance problem occurs in the last two nested loops. — Surfer on the fall, Feb 23 '17 at 18:56
@Surferonthefall I understand that, but I'm also trying to understand your code, so I can try to reproduce this problem. Nevertheless, writing a clearer code is alway better. — EBH, Feb 23 '17 at 19:00
@horchler this is a working 'second_part' ( http://pastebin.com/0MCgkshY ) with associated dummy data ( http://www.filedropper.com/data_38 ). In the real context the matrix has 2x rows and 10x columns. I'm trying to enhance the code profiling but I've never done things like that. Could you please give me some hints? I'd be extremely grateful. — Surfer on the fall, Feb 23 '17 at 19:39

EBH · Accepted Answer · 2017-08-05T21:46:10.377

Here is what I understand:

T_ is a matrix of time steps from N simulations.
X_ is a matrix of simulation state at T_ in those simulations.

so if you do:

[ut,~,ic]= unique(T_(:));

you get ic which is a vector of indices for all unique elements in T_. Then you can write:

counter = accumarray([ic X_(:)],1);

and get counter with no. of rows as your unique timesteps, and no. of columns as the unique states in X_ (which are all, and must be, integers). Now you can say that for each timestep ut(k) the number of time that the simulation was in state m is counter(k,m).

In your data, the only combination of m and k that has a value greater than 1 is (1,1).

Edit:

From the comments below, I understand that you record all state changes, and the time steps when they occur. Then every time a simulation change a state you want to collect all the states from all simulations and count how many states are from each type.

The main problem here is that your time is continuous, so basically each element in T_ is unique, and you have over a million time steps to loop over. Fully vectorizing such a process will need about 80GB of memory which will probably stuck your computer.

So I looked for a combination of vectorizing and looping through the time steps. We start by finding all unique intervals, and preallocating counter:

ut = unique(T_(:));
stt = 11; % no. of states
counter = zeros(stt,numel(ut));r = 1:size(T_,1);
r = 1:size(T_,1); % we will need that also later

Then we loop over all element in ut, and each time look for the relevant timestep in T_ in all simulations in a vectorized way. And finally we use histcounts to count all the states:

for k = 1:numel(ut)
    temp = T_<=ut(k); % mark all time steps before ut(k)
    s = cumsum(temp,2); % count the columns
    col_ind = s(:,end); % fins the column index for each simulation
    % convert the coulmns to linear indices:
    linind = sub2ind(size(T_),r,col_ind.');
    % count the states:
    counter(:,k) = histcounts(X_(linind),1:stt+1);
end

This takes about 4 seconds at my computer for 1000 simulations, so it adds to a little more than one hour for the whole process. Not very quick...

You can try also one or two of the tweaks below to squeeze run time a little bit more:

As you can read here, accumarray seems to work faster in small arrays then histcouns. So may want to switch to it.
Also, computing linear indices directly is a quicker method than sub2ind, so you may want to try that.

implementing these suggestions in the loop above, we get:

R = size(T_,1);
r = (1:R).';
for k = 1:K
    temp = T_<=ut(k); % mark all time steps before ut(k)
    s = cumsum(temp,2); % count the columns
    col_ind = s(:,end); % fins the column index for each simulation
    % convert the coulmns to linear indices:
    linind = R*(col_ind-1)+r;
    % count the states:
    counter(:,k) = accumarray(X_(linind),1,[stt 1]);
end

In my computer switching to accumarray and or removing sub2ind gain a slight improvement but it was not consistent (using timeit for testing on 100 or 1K elements in ut), so you better test it yourself. However, this still remains very long.

One thing that you may want to consider is trying to discretize your timesteps, so you will have much less unique elements to loop over. In your data about 8% of the time intervals a smaller than 1. If you can assume that this is short enough to be treated as one time step, then you could round your T_ and get only ~12.5K unique elements, which take about a minute to loop over. You can do the same for 0.1 intervals (which are less than 1% of the time intervals), and get 122K elements to loop over, what will take about 8 hours...

Of course, all the timing above are rough estimates using the same algorithm. If you do choose to round the times there may be even better ways to solve this.

Thanks a lot for the help. There's a substantial problem: when I perform the i-th simulation, things change only at timesteps.. so the j-th state keeps the same for all the T comprised between T_(i,j) and T_(i,j+1) (with this border excluded)... If you could solve this issue.. — Surfer on the fall, Feb 23 '17 at 21:53
@Surferonthefall let me see if I get you right: in a specific simulation, you record all state changes, and the time steps when they occur. So the way to read `T` is the from time step `T(k)` to `T(k+1)` the simulation was in state `X(k)`. Now what you want is to align all those time vectors and for each unique **interval** count all the different simulation states. Correct me if I wrong, and let me get back to you with this... — EBH, Feb 24 '17 at 07:32
yes!! exactly! just a note: the interval is **[** T(k); T(k+1) **[** (i.e. at T(k+1) the state is already changed; T(k+1) isn't included).... — Surfer on the fall, Feb 24 '17 at 08:08
@Surferonthefall see my edit. For your exact case, I don't think there is a very fast solution. However, see also me ending note there. You might want to consider some approximation. Hope this helps ;) — EBH, Feb 25 '17 at 21:44

Vectorizing code - How to reduce MATLAB computational time

1 Answers1