4

UPDATE: I've done some testing, and the solution of Jonas is the fastest for a range of different size input vectors. In particular, as angainor points out, the solution scales up to large sizes incredibly well - an important test as it is usually the large size problems that prompt us to pose these kind of questions on SO. Thanks to both Jonas and tmpearce for your solutions - based on the efficiency of the solution for large size problems I'm giving the answer tick to Jonas.

My Question: I have this column vector:

Vec = [0; 1; 2; -1; -3; 0; 0; 2; 1; -1];

I would like to convert every element greater than one into a sequence of ones that has length equal to the value of the element. Similarly, I want to convert every element less than minus one into a sequence of minus ones. Thus my output vector should look like this:

VecLong = [0; 1; 1; 1; -1; -1; -1; -1; 0; 0; 1; 1; 1; -1];

Note that each 2 has been changed into two 1's, while the -3 has been changed into three -1's. Currently, I solve the problem like this:

VecTemp = Vec;
VecTemp(VecTemp == 0) = 1;
VecLong = NaN(sum(abs(VecTemp)), 1);
c = 1;
for n = 1:length(Vec)
    if abs(Vec(n)) <= 1
        VecLong(c) = Vec(n);
        c = c + 1;
    else
        VecLong(c:c + abs(Vec(n))) = sign(Vec(n));
        c = c + abs(Vec(n));
    end    
end

This doesn't feel very elegant. Can anyone suggest a better method? Note: You can assume that Vec will contain only integer values. Thanks in advance for all suggestions.

Colin T Bowers
  • 18,106
  • 8
  • 61
  • 89

2 Answers2

3

Edit: I thought of another (slightly obscure) but shorter way to do this, and it is faster than the loop you've got.

for rep=1:100000
    #% original loop-based solution
end
toc
Elapsed time is 2.768822 seconds.

#% bsxfun-based indexing alternative
tic;
for rep=1:100000
TempVec=abs(Vec);TempVec(Vec==0)=1;
LongVec = sign(Vec(sum(bsxfun(@gt,1:sum(TempVec),cumsum(TempVec)))+1))
end
toc
Elapsed time is 1.798339 seconds.

This answer scales pretty well too, compared to the original - at least, to a point. There's a performance sweet spot.

Vec = repmat(OrigVec,10,1);
#% test with 100,000 loops
#% loop-based solution:
Elapsed time is 19.005226 seconds.
#% bsxfun-based solution:
Elapsed time is 4.411316 seconds.

Vec = repmat(OrigVer,1000,1);
#% test with 1,000 loops - 100,000 would be horribly slow
#% loop-based solution:
Elapsed time is 18.105728 seconds.
#% bsxfun-based solution:
Elapsed time is 98.699396 seconds.

bsxfun is expanding the vector into a matrix, then collapsing it with sum. With very large vectors this is needlessly memory heavy compared to the loop, so it ends up losing. Before then though, it does quite well.


Original, slow answer:

Here's a one-liner:

out=cell2mat(arrayfun(@(x) repmat(((x>0)*2)-1+(x==0),max(1,abs(x)),1),Vec,'uni',0));
out' =

     0   1   1   1  -1  -1  -1  -1   0   0   1   1   1  -1

What's going on:

((x>0)*2)-1 + (x==0) #% if an integer is >0, make it a 1, <0 becomes -1, 0 stays 0 

max(1,abs(x)) #% figure out how many times to replicate the value  

arrayfun(@(x) (the above stuff), Vec, 'uni', 0) #% apply the function  
 #% to each element in the array, generating a cell array output

cell2mat( (the above stuff) ) #% convert back to a matrix 
tmpearce
  • 12,523
  • 4
  • 42
  • 60
  • 1
    Note: if you use something like this, write some comments in the code explaining what it does! – tmpearce Oct 27 '12 at 04:19
  • The one-liner is admittedly much cleaner looking - but over an order of magnitude slower on my machine! (no doubt attributable to the extra overhead associated with cell-arrays) – Colin T Bowers Oct 27 '12 at 04:37
  • Yup... `arrayfun` is slow too. Loops are often much faster than `arrayfun` or `cellfun`, and sometimes quite competitive with even better vectorized solutions, especially if preallocation (like you've done) is properly done. They just get a bad rap. One-liners are fun though, especially if the speed isn't crucial :) – tmpearce Oct 27 '12 at 05:00
  • Ha! Funny you should say that - if you want a good read on `arrayfun` versus loops, check out the great answer provided by angainor to one of my other SO questions [here](http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why), ps I've +1'ed your answer, but I'm afraid I can't award the answer tick given the speed issue – Colin T Bowers Oct 27 '12 at 05:04
  • I'm afraid the new solution doesn't work. Check element 11, it is a 0 when it should be a 1? – Colin T Bowers Oct 27 '12 at 06:15
  • Doh, you're right. I just fixed it, for future reference (since it sounds like Jonas's answer works well and is quite fast) – tmpearce Oct 27 '12 at 15:25
  • 1
    No worries, thanks for the fix, and thanks for the solutions. I've given the answer tick to Jonas though, as the solution provided by Jonas scales incredibly well to large input vectors. – Colin T Bowers Oct 28 '12 at 00:12
3

You can use the good old cumsum-approach to repeating the entries properly. Note that I'm assigning a few temporary variables that you can get rid of, if you want to put everything into one line.

%# create a list of values to repeat
signVec = sign(Vec);

%# create a list of corresponding indices that repeat
%# as often as the value in signVec has to be repeated

tmp = max(abs(Vec),1); %# max: zeros have to be repeated once
index = zeros(sum(tmp),1);
index([1;cumsum(tmp(1:end-1))+1])=1; %# assign ones a pivots for cumsum
index = cumsum(index); %# create repeating indices

%# repeat
out = signVec(index);
out'
out =

     0     1     1     1    -1    -1    -1    -1     0     0     1     1     1    -1
Jonas
  • 74,690
  • 10
  • 137
  • 177
  • I'm afraid this isn't working for me on R2012b. The first problem is the second argument to the `sign` and `abs` function. Am I missing something here? My understanding is that both those functions only accept one argument (and certainly R2012b throws an error for me with two input arguments). – Colin T Bowers Oct 27 '12 at 08:01
  • @ColinTBowers: My bad - I've copied the line with the wrong parenthesis. Now everything runs just fine. – Jonas Oct 27 '12 at 10:24
  • Ah yes. Sorry, I probably should have been able to spot that myself. Works nicely and runs faster than the loop (+1) - thanks very much Jonas. I'll wait overnight before choosing an answer in case anyone else wants a crack at it. – Colin T Bowers Oct 27 '12 at 10:29
  • This one looks rather perfect. @Colin, try it for a larger system size. It is almost 10 times faster than your loop. Although you could rework your loop a bit.. – angainor Oct 27 '12 at 12:01
  • @angainor Closer to 20 times faster on my machine :-) Definitely worth the answer tick! Unless the solution is trivial, I just like to wait 24 hours before giving the tick so all the time-zones get a chance to have a crack at a problem. – Colin T Bowers Oct 27 '12 at 23:53