11

Is there a vectorised way to do the following? (shown by an example):

input_lengths = [ 1 1 1 4       3     2   1 ]
result =        [ 1 2 3 4 4 4 4 5 5 5 6 6 7 ]

I have spaced out the input_lengths so it is easy to understand how the result is obtained

The resultant vector is of length: sum(lengths). I currently calculate result using the following loop:

result = ones(1, sum(input_lengths ));
counter = 1;
for i = 1:length(input_lengths)
    start_index = counter;
    end_index = counter + input_lengths (i) - 1;

    result(start_index:end_index) = i;
    counter = end_index + 1;
end

EDIT:

I can also do this using arrayfun (although that is not exactly a vectorised function)

cell_result = arrayfun(@(x) repmat(x, 1, input_lengths(x)), 1:length(input_lengths), 'UniformOutput', false);
cell_result : {[1], [2], [3], [4 4 4 4], [5 5 5], [6 6], [7]}

result = [cell_result{:}];
result : [ 1 2 3 4 4 4 4 5 5 5 6 6 7 ]
Shai
  • 111,146
  • 38
  • 238
  • 371
Samuel O'Malley
  • 3,471
  • 1
  • 23
  • 41
  • don't be sure that a "vectorized" function will be faster than a for loop here... you can test with arrayfun and see... also see here http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why – bla May 15 '14 at 06:59
  • Your `for` loop doesn't seem to work. `sum(input_lengths)` is bigger than `length(input_lengths)` so `input_lengths(i)` gives an error for me (Matlab). – David May 15 '14 at 07:02
  • Your `cell_result` code doesn't work either. – Divakar May 15 '14 at 07:02
  • Sorry @David and Divakar I've fixed it. It was supposed to be length not sum. – Samuel O'Malley May 15 '14 at 07:05
  • @SamuelO'Malley Are you finding your loop too slow? It looks pretty good to me. What is your motivation for vectorizing this? – Dan May 15 '14 at 07:12
  • 1
    Hey @Dan, I incorrectly assumed that my for-loop was slowing me down but it was actually elsewhere (due to the massive amount of data being pushed through this function). I also thought that it was an interesting challenge to be able to vectorise, I have certainly seen some elegant solutions to similar problems around the place. – Samuel O'Malley May 15 '14 at 07:33
  • Just a meta-question: why the flag `octave` here? – Bentoy13 May 15 '14 at 08:04

6 Answers6

11

A fully vectorized version:

selector=bsxfun(@le,[1:max(input_lengths)]',input_lengths);
V=repmat([1:size(selector,2)],size(selector,1),1);
result=V(selector);

Downside is, the memory usage is O(numel(input_lengths)*max(input_lengths))

Daniel
  • 36,610
  • 3
  • 36
  • 69
  • 2
    To avoid repmat you can use another `bsxfun` - `V = bsxfun(@times,selector,1:numel(input_lengths)); result = V(V~=0)` – Divakar May 15 '14 at 07:39
  • @Divakar Please post this one as an answer, I think it worths it, and one can make a one-liner from this solution, great! – Bentoy13 May 15 '14 at 07:56
  • 1
    @Bentoy13 And that one-liner would be huge! :) – Divakar May 15 '14 at 08:05
  • @Daniel Made a solution out of the suggestion. Hope that's okay. Let me know otherwise! You deserve all the credit for this elegant solution! – Divakar May 15 '14 at 08:07
  • @Divakar: `repmat` is faster than a second `bsxfun`. – Daniel May 15 '14 at 08:29
  • 2
    Great solution! You could also replace the last two line with `[~, result] = find(selector)`, haven't tested if it gives any performance improvement though – Dan May 15 '14 at 08:45
  • @Daniel First time ever I read that `repmat` is faster than `bsxfun`. I doubt that this fact is true for big vectors ... or is it? – Bentoy13 May 15 '14 at 08:45
  • @Bentoy13: Try it. Unless max(input_lengths) is large, my solution is the fastest. The `find` is close, second bsxfun is slower. For large max(input_lengths) (500+) the "My for loop" solution from David is the fastest. I think bsxfun is slower, because it uses a multiplication instead of a simple logical indexing. – Daniel May 15 '14 at 08:51
  • @Daniel You are right, repmat seems to be better here and might be because `bsxfun` is using `times`. – Divakar May 15 '14 at 10:13
  • 2
    +1 for this approach, and for recognising its downside – Luis Mendo May 15 '14 at 11:01
11

Benchmark of all solutions

Following the previous benchmark, I group all solutions given here in a script and run it a few hours for a benchmark. I've done this because I think it's good to see what is the performance of each proposed solution with the input lenght as parameter - my intention is not here to put down the quality of the previous one, which gives additional information about the effect of JIT. Moreover, and every participant seems to agree with that, quite a good work was done in all answers, so this great post deserves a conclusion post.

I won't post the code of the script here, this is quite long and very uninteresting. The procedure of the benchmark is to run each solution for a set of different lengths of input vectors: 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000. For each input length, I've generated a random input vector based on Poisson law with parameter 0.8 (to avoid big values):

input_lengths = round(-log(1-rand(1,ILen(i)))/poisson_alpha)+1;

Finally, I average the computation times over 100 runs per input length.

I've run the script on my laptop computer (core I7) with Matlab R2013b; JIT is activated.

And here are the plotted results (sorry, color lines), in a log-log scale (x-axis: input length; y-axis: computation time in seconds):

Benchmark 100 trials, all solutions

So Luis Mendo is the clear winner, congrats!

For anyone who wants the numerical results and/or wants to replot them, here they are (cut the table into 2 parts and approximated to 3 digits, for a better display):

N                   10          20          50          100         200         500         1e+03       2e+03
-------------------------------------------------------------------------------------------------------------
OP's for-loop       8.02e-05    0.000133    0.00029     0.00036     0.000581    0.00137     0.00248     0.00542 
OP's arrayfun       0.00072     0.00117     0.00255     0.00326     0.00514     0.0124      0.0222      0.047
Daniel              0.000132    0.000132    0.000148    0.000118    0.000126    0.000325    0.000397    0.000651
Divakar             0.00012     0.000114    0.000132    0.000106    0.000115    0.000292    0.000367    0.000641
David's for-loop    9.15e-05    0.000149    0.000322    0.00041     0.000654    0.00157     0.00275     0.00622
David's arrayfun    0.00052     0.000761    0.00152     0.00188     0.0029      0.00689     0.0122      0.0272
Luis Mendo          4.15e-05    4.37e-05    4.66e-05    3.49e-05    3.36e-05    4.37e-05    5.87e-05    0.000108
Bentoy13's cumsum   0.000104    0.000107    0.000111    7.9e-05     7.19e-05    8.69e-05    0.000102    0.000165
Bentoy13's sparse   8.9e-05     8.82e-05    9.23e-05    6.78e-05    6.44e-05    8.61e-05    0.000114    0.0002
Luis Mendo's optim. 3.99e-05    3.96e-05    4.08e-05    4.3e-05     4.61e-05    5.86e-05    7.66e-05    0.000111

N                   5e+03       1e+04       2e+04       5e+04       1e+05       2e+05       5e+05       1e+06
-------------------------------------------------------------------------------------------------------------
OP's for-loop       0.0138      0.0278      0.0588      0.16        0.264       0.525       1.35        2.73
OP's arrayfun       0.118       0.239       0.533       1.46        2.42        4.83        12.2        24.8
Daniel              0.00105     0.0021      0.00461     0.0138      0.0242      0.0504      0.126       0.264
Divakar             0.00127     0.00284     0.00655     0.0203      0.0335      0.0684      0.185       0.396
David's for-loop    0.015       0.0286      0.065       0.175       0.3         0.605       1.56        3.16
David's arrayfun    0.0668      0.129       0.299       0.803       1.33        2.64        6.76        13.6
Luis Mendo          0.000236    0.000446    0.000863    0.00221     0.0049      0.0118      0.0299      0.0637
Bentoy13's cumsum   0.000318    0.000638    0.00107     0.00261     0.00498     0.0114      0.0283      0.0526
Bentoy13's sparse   0.000414    0.000774    0.00148     0.00451     0.00814     0.0191      0.0441      0.0877
Luis Mendo's optim. 0.000224    0.000413    0.000754    0.00207     0.00353     0.00832     0.0216      0.0441

Ok, I've added another solution to the list ... I could not prevent myself to optimize the best-so-far solution of Luis Mendo. No credit for that, it's just a variant from Luis Mendo's, I'll explain it later.

Clearly, the solutions using arrayfun are very time-consuming. The solutions using an explicit for loop are faster, yet still slow compared with others solutions. So yes, vectorizing is still a major option for optimizing a Matlab script.

Since I've seen a big dispersion on the computing times of the fastest solutions, especially with input lengths between 100 and 10000, I decide to benchmark more precisely. So I've put the slowest apart (sorry), and redo the benchmark over the 6 other solutions which run much faster. The second benchmark over this reduced list of solutions is identical except that I've average over 1000 runs.

Benchmark 1000 trials, best solutions

(No table here, unless you really want to, it's quite the same numbers as before)

As it was remarked, the solution by Daniel is a little faster than the one by Divakar because it seems that the use of bsxfun with @times is slower than using repmat. Still, they are 10 times faster than for-loop solutions: clearly, vectorizing in Matlab is a good thing.

The solutions of Bentoy13 and Luis Mendo are very close; the first one uses more instructions, but the second one uses an extra allocation when concatenating 1 to cumsum(input_lengths(1:end-1)). And that's why we see that Bentoy13's solution tends to be a bit faster with big input lengths (above 5.10^5), because there is no extra allocation. From this consideration, I've made an optimized solution where there is no extra allocation; here is the code (Luis Mendo can put this one in his answer if he wants to :) ):

result = zeros(1,sum(input_lengths));
result(1) = 1;
result(1+cumsum(input_lengths(1:end-1))) = 1;
result = cumsum(result);

Any comment for improvement is welcome.

Community
  • 1
  • 1
Bentoy13
  • 4,886
  • 1
  • 20
  • 33
  • This is amazing, I wish I wasn't colourblind! – Samuel O'Malley May 16 '14 at 00:03
  • Thank you very much, I expand it right now with new results. @SamuelO'Malley I apologize for colorful plotting, I thought it was the best display I can offer here considering that the picture should be a little small. But I put the numbers, so it won't be a problem anymore! – Bentoy13 May 16 '14 at 13:32
  • @Bentoy Thanks. I've added your optimized version in my answer, with due credit :-) – Luis Mendo May 21 '14 at 12:04
10

More of a comment than anything, but I did some tests. I tried a for loop, and an arrayfun, and I tested your for loop and arrayfun version. Your for loop was the fastest. I think this is because it is simple, and allows the JIT compilation to do the most optimisation. I am using Matlab, octave might be different.

And the timing:

Solution:     With JIT   Without JIT  
Sam for       0.74       1.22    
Sam arrayfun  2.85       2.85    
My for        0.62       2.57    
My arrayfun   1.27       3.81    
Divakar       0.26       0.28    
Bentoy        0.07       0.06    
Daniel        0.15       0.16
Luis Mendo    0.07       0.06

So Bentoy's code is really fast, and Luis Mendo's is almost exactly the same speed. And I rely on JIT way too much!


And the code for my attempts

clc,clear
input_lengths = randi(20,[1 10000]);

% My for loop
tic()
C=cumsum(input_lengths);
D=diff(C);
results=zeros(1,C(end));
results(1,1:C(1))=1;
for i=2:length(input_lengths)
    results(1,C(i-1)+1:C(i))=i*ones(1,D(i-1));
end
toc()

tic()
A=arrayfun(@(i) i*ones(1,input_lengths(i)),1:length(input_lengths),'UniformOutput',false);
R=[A{:}];
toc()
David
  • 8,449
  • 1
  • 22
  • 32
  • Just a detail: `D=diff(C);` is useless, it's equal to `input_lengths(2:end)`. – Bentoy13 May 15 '14 at 07:42
  • @David would be interesting to add the `bsxfun`/`repmat` by Daniel/Divakar and the flippy-`cumsum` solution of Bentoy to your benchmark tests – Dan May 15 '14 at 08:37
  • @David I just posted another answer. Would you mind including it in your tests? Thanks in advance! – Luis Mendo May 15 '14 at 10:45
  • @LuisMendo Done :) Your code is pretty quick! I feel dumb now! `:D` – David May 15 '14 at 12:05
  • @David Thanks for the effort of benchmarking! You already had my vote for that – Luis Mendo May 15 '14 at 12:09
  • @all I post a benchmark I've made from all solutions, if anyone interested, under community profile. – Bentoy13 May 15 '14 at 15:31
  • 1
    Thank you so much everyone who contributed. I am blown away by the quality of the answers here and I loved how they all built on each other to find the simplest and best solution. – Samuel O'Malley May 15 '14 at 23:51
9
result = zeros(1,sum(input_lengths));
result(cumsum([1 input_lengths(1:end-1)])) = 1;
result = cumsum(result);

This should be pretty fast. And memory usage is the minimum possible.

An optimized version of the above code, due to Bentoy13 (see his very detailed benchmarking):

result = zeros(1,sum(input_lengths));
result(1) = 1;
result(1+cumsum(input_lengths(1:end-1))) = 1;
result = cumsum(result);
Community
  • 1
  • 1
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • Simpler than mine, I've looked for this more concise way to compute indexes. Following the current benchmark, I think you should be the fastest. – Bentoy13 May 15 '14 at 11:04
  • @Bentoy13 Thanks! I've asked David to include my solution in his tests, so we'll see – Luis Mendo May 15 '14 at 11:12
  • Thanks @LuisMendo, I am accepting your answer because it is the simplest and fastest, I wish I could accept more than one answer because everyone contributed a lot – Samuel O'Malley May 15 '14 at 23:57
  • 1
    @SamuelO'Malley Yes, it was an interesting question, and fun for all of us! – Luis Mendo May 15 '14 at 23:59
  • @LuisMendo - If I wanted to apply a filter (like: `result = result(filter)`) right after this, would I be able to apply it at an earlier stage to save some time? – Samuel O'Malley May 16 '14 at 00:15
  • @SamuelO'Malley If by filter you mean a logical index: I don't think so (with my approach). At least not easily – Luis Mendo May 16 '14 at 00:31
  • 1
    Yes, sorry I meant logical index. Okay thank you for that. I've managed to bring our code down from a three level nested for-loop taking up to 10 minutes, down to less than a 30 seconds by using this method and eliminating the for-loops – Samuel O'Malley May 16 '14 at 00:55
7

This is a slight variant of @Daniel's answer. The crux of this solution is based on that solution. Now this one avoids repmat, so in that way it's little-more "vectorized" maybe. Here's the code -

selector=bsxfun(@le,[1:max(input_lengths)]',input_lengths); %//'
V = bsxfun(@times,selector,1:numel(input_lengths)); 
result = V(V~=0)

For all the desperate one-liner searching people -

result = nonzeros(bsxfun(@times,bsxfun(@le,[1:max(input_lengths)]',input_lengths),1:numel(input_lengths)))
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
6

I search an elegant solution, and I think David's solution is a good start. What I have in mind is that one can generate the indexes where to add one from previous element.

For that, if we compute the cumsum of the input vector, we get:

cumsum(input_lengths)
ans = 1     2     3     7    10    12    13

This is the indexes of the ends of sequences of identical numbers. That is not what we want, so we flip the vector twice to get the beginnings:

fliplr(sum(input_lengths)+1-cumsum(fliplr(input_lengths)))
ans = 1     2     3     4     8    11    13

Here is the trick. You flip the vector, cumsum it to get the ends of the flipped vector, and then flip back; but you must substract the vector from the total length of the output vector (+1 because index starts at 1) because cumsum applies on the flipped vector.

Once you have done this, it's very straightforward, you just have to put 1 at computed indexes and 0 elsewhere, and cumsum it:

idx_begs = fliplr(sum(input_lengths)+1-cumsum(fliplr(input_lengths)));
result = zeros(1,sum(input_lengths));
result(idx_begs) = 1;
result = cumsum(result);

EDIT

First, please have a look at Luis Mendo's solution, it is very close to mine but is more simpler and a bit faster (I won't edit mine even it is very close). I think at this date this is the fastest solution from all.

Second, while looking at others solutions, I've made up another one-liner, a little different from my initial solution and from the other one-liner. Ok, this won't be very readable, so take a breath:

result = cumsum( full(sparse(cumsum([1,input_lengths(1:end-1)]), ...
ones(1,length(input_lengths)), 1, sum(input_lengths),1)) );

I cut it on two lines. Ok now let's explain it.

The similar part is to build the array of the indexes where to increment the value of the current element. I use the solution of Luis Mendo's for that. To build in one line the solution vector, I use here the fact that it is in fact a sparse representation of the binary vector, the one we will cumsum at the very end. This sparse vector is build using our computed index vector as x positions, a vector of 1 as y positions, and 1 as the value to put at these locations. A fourth argument is given to precise the total size of the vector (important if the last element of input_lengths is not 1). Then we get the full representation of this sparse vector (else the result is a sparse vector with no empty element) and we can cumsum.

There is no use of this solution other than to give another solution to this problem. A benchmark can show that it is slower than my original solution, because of a heavier memory load.

Community
  • 1
  • 1
Bentoy13
  • 4,886
  • 1
  • 20
  • 33
  • Great solution and explanation! – Dan May 15 '14 at 08:40
  • 1
    I think this puts 1 too many of the final digit in `result`. It's very fast though! – David May 15 '14 at 09:23
  • @Bentoy13 Looks like David is right about the extra digit, why the `+1` in `result = zeros(1,sum(input_lengths)+1);`? Looks like you get the correct answer without it – Dan May 15 '14 at 09:30
  • @ Dan & David: I correct it; I really don't know why I put this +1 in zeros. Thank you very much! – Bentoy13 May 15 '14 at 09:48
  • isn't idx_begs = [1 cumsum(input_lengths(1:end-1))+1] an equivalent and simpler solution? It is based on the fact that the indexes of beginning of sequences are the indexes of the ending of sequences +1 (except the first and last elements, that are easy to compute). – Ofri Raviv May 15 '14 at 11:33
  • @OfriRaviv Indeed, this is the solution of [Luis Mendo](http://stackoverflow.com/a/23675831/1507014). It's the simplest solution; I passed through this without seeing that point, but Luis (and you) got that. – Bentoy13 May 15 '14 at 11:41