Repeat elements of vector

Question

I have a value vector A containing elements i, for example:

A = [0.1 0.2 0.3 0.4 0.5]; and say r = [5 2 3 2 1];

Now I want to create a new vector Anew containing r(i) repetitions of the values i in A, such that the first r(1)=5 items in Anew have value A(1) and the length of the new vector is sum(r). Thus:

Anew = [0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.3 0.4 0.4 0.5]

I am sure this can be done with an elaborate for-loop combining e.g. repmat, but any chance someone knows how to do this in a smoother way?

score 4 · Accepted Answer · edited May 23 '17 at 12:20

As far as I'm aware, there is no equivalent function to do that in MATLAB, though R has rep that can do that for you.... so jealous.

In any case, the only way I can suggest is to run a for loop with repmat as you suggested. However, you can perhaps do arrayfun instead if you want to do this as a one-liner... well it's technically two to do the post-processing required to get this into a single vector. As such, you can try this:

Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
Anew = vertcat(Anew{:});

This essentially does the for loop and concatenation of the replicated vectors with less code. We go through each pair of values in A and r and spit out replicated vectors. Each of them will be in a cell array, which is why vertcat is required to put it all into one vector.

We get:

Take note that other people have tried something similar to what you're doing in this post: A similar function to R's rep in Matlab. This is essentially mimicking R's way of doing rep, which is what you want to do!

Alternative - Using `for` loops

Because of @Divakar's benchmarking, I'm curious to see how pre-allocating the array, then using an actual for loop to iterate through A and r and populate it by indexing would benchmark. As such, the equivalent code to the above using for loops and indexing would be:

Anew = zeros(sum(r), 1);
counter = 1;
for idx = 1 : numel(r)
    Anew(counter : counter + r(idx) - 1) = A(idx);
    counter = counter + r(idx);
end

We would need a variable that keeps track of where we need to insert elements in the array, which is stored in counter. We offset this by the total number of elements to replicate per number, which is stored in each value of r.

As such, this method completely avoids using repmat and just uses indexing to generate our replicated vectors instead.

Benchmarking (à la Divakar)

Building on top of Divakar's benchmarking code, I actually tried running all of the tests on my machine, in addition to the for loop approach. I simply used his benchmarking code with the same test cases.

These are the timing results I get per algorithm:

Case #1 - `N = 4000`, `max_repeat = 4000`

-------------------  With arrayfun
Elapsed time is 1.202805 seconds.
-------------------  With cumsum
Elapsed time is 1.691591 seconds.
-------------------  With bsxfun
Elapsed time is 0.835201 seconds.
-------------------  With for loop
Elapsed time is 0.136628 seconds.

Case #2 - `N = 10000`, `max_repeat = 1000`

-------------------  With arrayfun
Elapsed time is 2.117631 seconds.
-------------------  With cumsum
Elapsed time is 1.080247 seconds.
-------------------  With bsxfun
Elapsed time is 0.540892 seconds.
-------------------  With for loop
Elapsed time is 0.127728 seconds.

In these cases, cumsum actually beats out arrayfun... which is what I originally expected. bsxfun beats everyone else out, except for the for loop. My guess is with the differing times in arrayfun between myself and Divakar, we are running our code on different architectures. I'm currently running my tests using MATLAB R2013a on a Mac OS X 10.9.5 MacBook Pro machine.

As we can see, the for loop is much quicker. I know for a fact that when it comes to indexing operations in a for loop, the JIT kicks in and gives you better performance.

You could turn it into a one-liner: `cell2mat(arrayfun(@(n) repmat(A(n), [1 r(n)]), 1:numel(r), 'uniformoutput',0))` — Luis Mendo, Sep 26 '14 at 17:06
Guess I spoke too soon! Yours seem to be the best after few quick tests! +1 already! — Divakar, Sep 26 '14 at 17:27
@Divakar I'm rather surprised at the speed! I figured there's a lot of overhead with `arrayfun` and `repmat` that it would be slow. — rayryeng, Sep 26 '14 at 17:34
@rayryeng Could be that `cumsum` is not suited here or not the best possible solution with it is presented here, not sure! — Divakar, Sep 26 '14 at 18:00
@Divakar - I actually tried doing this with the `for` loop and pure indexing and it's faster than everything we've tried. I'll put some benchmarking results in. — rayryeng, Sep 26 '14 at 18:10
@rayryeng Wow, this is awesome! Well done! And OP started with loops right? So he/she is back to it? :) — Divakar, Sep 26 '14 at 18:15
@Divakar - Yup :)... though the OP suggested using `repmat`. I used `for` loops, but decided to use indexing instead. — rayryeng, Sep 26 '14 at 18:16
Well I went till `4000 and 4000` for the first case, as on my system it was throwing out of memory error beyond that point. For the second case, I had to keep the number of elements the same, so `10000 and 1000` seemed like a good bet. So, I think the runtimes would vary between machines depending mainly upon the RAM available. — Divakar, Sep 26 '14 at 19:44
@Divakar - Ah ok. I have an unfair advantage. My Macbook has 16 GB of RAM lol. — rayryeng, Sep 26 '14 at 19:45
If I have some heavy duty work, I will send it to you then! ;) Ah, did I say duty!!? :D — Divakar, Sep 26 '14 at 19:46
@Divakar - Hahah it's the least I can do :) You've already helped me out tons. The GPU on my MacBook is not as powerful as the machine I have at home.... but that one only has 8 GB of RAM. Probably time for an upgrade. — rayryeng, Sep 26 '14 at 19:58
Thanks for all the really helpful suggestions and benchmarking, I think I'm gonna go for the `for` loop but I do need to see about my RAM, although 16 GB must be enough! :) — Fraukje, Sep 29 '14 at 09:02

score 3 · Answer 2 · answered Sep 26 '14 at 16:44

First think of forming an index vector [1 1 1 1 1 2 2 3 3 3 4 4 5]. Noticing the regular increments here makes me think of cumsum: we can get these steps by putting ones at the correct location in a zeros vector: [1 0 0 0 0 1 0 1 0 0 1 0 1]. And that we can get by running another cumsum on the input list. After adjusting for end conditions and 1-based indexing, we get this:

B(cumsum(r) + 1) = 1;
idx = cumsum(B) + 1;
idx(end) = [];
A(idx)

This has to be pretty fast! +1 – Divakar Sep 26 '14 at 16:47 — Divakar, Sep 26 '14 at 16:47

Divakar · Answer 3 · 2014-09-26T17:53:47.373

bsxfun based approach -

A = [0.1 0.2 0.3 0.4 0.5]
r = [5 2 3 2 1]

repeats = bsxfun(@le,[1:max(r)]',r) %//' logical 2D array with ones in each column 
                                    %// same as the repeats for each entry
A1 = A(ones(1,max(r)),:) %// 2D matrix of all entries repeated maximum r times
                         %// and this resembles your repmat 
out = A1(repeats) %// desired output with repeated entries

It could essentially become a two-liner -

A1 = A(ones(1,max(r)),:);
out = A1(bsxfun(@le,[1:max(r)]',r));

Output -

Benchmarking

Some benchmark results could be produced for the solutions presented here thus far.

Benchmarking Code - Case I

%// Parameters and input data
N = 4000;
max_repeat = 4000;
A = rand(1,N);
r = randi(max_repeat,1,N);
num_runs = 10; %// no. of times each solution is repeated for better benchmarking

disp('-------------------  With arrayfun')
tic
for k1 = 1:num_runs
    Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
    Anew = vertcat(Anew{:});
end
toc, clear Anew

disp('-------------------  With cumsum')
tic
for k1 = 1:num_runs
    B(cumsum(r) + 1) = 1;
    idx = cumsum(B) + 1;
    idx(end) = [];
    out1 = A(idx);
end
toc,clear B idx out1

disp('-------------------  With bsxfun')
tic
for k1 = 1:num_runs
    A1 = A(ones(1,max(r)),:);
    out2 = A1(bsxfun(@le,[1:max(r)]',r));
end
toc

Results

-------------------  With arrayfun
Elapsed time is 2.198521 seconds.
-------------------  With cumsum
Elapsed time is 5.360725 seconds.
-------------------  With bsxfun
Elapsed time is 2.896414 seconds.

Benchmarking Code - Case II [Bigger datasize but lesser max of r]

%// Parameters and input data
N = 10000;
max_repeat = 1000;

Results

-------------------  With arrayfun
Elapsed time is 2.641980 seconds.
-------------------  With cumsum
Elapsed time is 3.426921 seconds.
-------------------  With bsxfun
Elapsed time is 1.858007 seconds.

Conclusions from benchmarks

For case I, arrayfun seems like the way to go, while for Case II, bsxfun might be the weapon of choice. So, it seems that the type of data you are dealing with, would really dictate which approach to go with.

@LuisMendo Thanks! I was not really vouching for `bsxfun`, but wow! Actually the `bsxfun` with binary related function handles are really cheap I think. — Divakar, Sep 26 '14 at 17:58
Forgot to vote for you as well. +1 and awesome work on the benchmarks. It's cool how you have to be cognizant of how `r` is structured before you choose which method you want to work with. — rayryeng, Sep 26 '14 at 18:03

Repeat elements of vector

3 Answers3

Alternative - Using `for` loops

Benchmarking (à la Divakar)

Case #1 - `N = 4000`, `max_repeat = 4000`

Case #2 - `N = 10000`, `max_repeat = 1000`

Benchmarking

Conclusions from benchmarks

Linked

Related

Repeat elements of vector

3 Answers3

Alternative - Using for loops

Benchmarking (à la Divakar)

Case #1 - N = 4000, max_repeat = 4000

Case #2 - N = 10000, max_repeat = 1000

Benchmarking

Conclusions from benchmarks

Linked

Related

Alternative - Using `for` loops

Case #1 - `N = 4000`, `max_repeat = 4000`

Case #2 - `N = 10000`, `max_repeat = 1000`