There is a similar question, "Element-wise array replication in Matlab", but I'd like to generalize it slightly. The simple case calls for a function 'replicate' that takes a vector, a, and repeats each element N times. E.g.

>> a = [1, 2, 3];
>> replicate(a, 3)
ans = 
  [1, 1, 1, 2, 2, 2, 3, 3, 3]
>>

There are a number of helpful solutions in the question linked above. However, what happens when N is a vector of multiplicities, one for each element? E.g., I would like something like:

>> a = [1, 2, 3];
>> N = [3, 1, 5];
>> replicate(a,N)
ans = 
  [1, 1, 1, 2, 3, 3, 3, 3, 3]
>>

Unfortunately, my MATLAB-index-fu isn't quite at this level, and I can't figure out how to do this without looping over, say, N and using repmat to tile each element of a into a size [N(i),1] vector. For example, below I loop over the rows of the array data and repmat each row using the multiplicity stored in column multcol. The data are steps in an MCMC chain, and the multiplicity of each step is in the last column.

data=[-3.997 4.402 0.000 703.050 -219.900 289.600 2.000 5.700 -49.100 11.100 3;...
-2.476 2.685 0.000 667.800 -220.210 290.000 1.955 5.710 -48.828 11.116 3; ...
-4.658 0.286 0.000 626.370 -220.420 290.380 2.019 5.991 -49.015 11.1210 2];

multcol = 11;

%unwrap the data
in=1;
for i=1:size(data,1)
  data_uw(in:in+data(i,multcol)-1,:) = ...
    repmat(data(i,1:multcol-1),[data(i,multcol) 1]);
  in=in+data(i,multcol);
end

This works, but is relatively slow. The end result, data_uw, contains each row of the input matrix, data, replicated the number of times given in the multiplicity column.

>> data_uw

data_uw =

Columns 1 through 7

-3.9970    4.4020         0  703.0500 -219.9000  289.6000    2.0000
-3.9970    4.4020         0  703.0500 -219.9000  289.6000    2.0000
-3.9970    4.4020         0  703.0500 -219.9000  289.6000    2.0000
-2.4760    2.6850         0  667.8000 -220.2100  290.0000    1.9550
-2.4760    2.6850         0  667.8000 -220.2100  290.0000    1.9550
-2.4760    2.6850         0  667.8000 -220.2100  290.0000    1.9550
-4.6580    0.2860         0  626.3700 -220.4200  290.3800    2.0190
-4.6580    0.2860         0  626.3700 -220.4200  290.3800    2.0190

Columns 8 through 10

  5.7000  -49.1000   11.1000
  5.7000  -49.1000   11.1000
  5.7000  -49.1000   11.1000
  5.7100  -48.8280   11.1160
  5.7100  -48.8280   11.1160
  5.7100  -48.8280   11.1160
  5.9910  -49.0150   11.1210
  5.9910  -49.0150   11.1210

Is there a better way to do this? Maybe there's a way to adapt the answer in the link above, but I'm not getting it.

Update with Answer

I've used the utility rude available at http://www.mathworks.co.uk/matlabcentral/fileexchange/6436-rude-a-pedestrian-run-length-decoder-encoder.

mult = data(:,multcol);
data = data(:,1:multcol-1);
iterations = sum(mult);

%no preallocation needed: rude and the indexing below each return a full array
nstep = size(data,1);
ind = 1:nstep;
%run-length decode: row index i is repeated mult(i) times
ind_uw = rude(mult,ind);
data_uw = data(ind_uw,:);

This seems much faster. Rude makes use of the cumsum technique mentioned in another answer, so that will also work.
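
For readers who do not have rude on their path, the same row index can be built with cumsum alone. The following is only a sketch under the assumption that every multiplicity is a positive integer; the names vals, cs and marks are introduced here for illustration and are not part of the original code.

```matlab
% Same sample data as in the question; the last column holds the multiplicity.
data = [-3.997 4.402 0.000 703.050 -219.900 289.600 2.000 5.700 -49.100 11.100 3; ...
        -2.476 2.685 0.000 667.800 -220.210 290.000 1.955 5.710 -48.828 11.116 3; ...
        -4.658 0.286 0.000 626.370 -220.420 290.380 2.019 5.991 -49.015 11.121 2];
multcol = 11;

mult = data(:, multcol);          % multiplicity of each step
vals = data(:, 1:multcol-1);      % the step values themselves (hypothetical name)

% Build a row index that repeats i exactly mult(i) times.
% Assumption: every entry of mult is a positive integer.
cs = cumsum(mult);
marks = zeros(cs(end), 1);
marks([1; cs(1:end-1) + 1]) = 1;  % flag the first copy of each row
ind_uw = cumsum(marks);           % [1 1 1 2 2 2 3 3]'

data_uw = vals(ind_uw, :);        % 8-by-10 unwrapped chain
```

Indexing with ind_uw copies whole rows in one step, so no loop and no separate preallocation of data_uw is needed.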

chgreer
  • Can you make the last block executable? (Dummy initialization for `data`, `multcol`). It's not completely clear what the desired output is. – Pursuit Apr 11 '13 at 23:16
  • Yeah, it definitely looks like a dupe. Sorry I didn't find it while I was looking. Shoot. And now I see that it's linked in the sidebar of the question I linked to. – chgreer Apr 12 '13 at 19:01

3 Answers

This is run-length decoding, and I suggest using rude(). It is a milestone, and very well written MATLAB code.

>> rude(N,a)
ans =
     1     1     1     2     3     3     3     3     3

In your case, however, the problem is probably the missing pre-allocation. Your code with pre-allocation added and some refactoring:

% Pre-allocate the full output once
out = zeros(sum(data(:,end)),multcol-1);

in = 1; % next free row in out
for i = 1:size(data,1)
    n = data(i,multcol); % multiplicity of row i
    out(in : in+n-1,:) = repmat(data(i,1:end-1),n,1);
    in = in+n;
end
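
As a loop-free alternative to the block above, MATLAB R2015a and later ship a built-in repelem that accepts one repeat count per element or per row. This is a sketch that assumes such a release is available; the small data matrix is a made-up stand-in for the question's data, not the original values.

```matlab
% Small stand-in for the question's data; last column is the multiplicity.
data = [10 11 3; ...
        20 21 1; ...
        30 31 2];
multcol = 3;

% repelem(A, r1, r2): r1 is a vector with one repeat count per row,
% r2 = 1 leaves the columns unrepeated.
out = repelem(data(:, 1:multcol-1), data(:, multcol), 1);

% The vector case from the question works the same way.
a = [1, 2, 3];
N = [3, 1, 5];
b = repelem(a, N);   % [1 1 1 2 3 3 3 3 3]
```

On older releases repelem does not exist, so rude or cumsum-based indexing remains the way to go there.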
Oleg
  • Sorry, I stripped out the preallocation in the example. I was doing that in the actual code. I've switched to rude and the speedup has been great. Just from a quick profiler run, it seems to be by a factor of 10-100. Afraid I can't upvote yet, though. – chgreer Apr 11 '13 at 23:48
  • Ideally, it is possible to vectorize `rude()` to accept: a matrix, a vector with the lengths and a dimension argument in order to replicate row or columns. – Oleg Apr 11 '13 at 23:57
0

Something simple:

a = [1, 2, 3];
N = [3, 1, 5];

result = zeros(1,sum(N)); % mem alloc
k = 1;
for n = 1:numel(N)
    p = k+N(n)-1;         % last position of the n-th run
    result(1,k:p) = a(n);
    k = p+1;              % first position of the next run
end

disp(result);
0

You can use cumsum-based indexing for this kind of thing:

A = [4 5 6];
N = [3 1 5];

cs = cumsum(N);
idx = zeros(1,cs(end));
% mark the first position of each run (assumes every N(i) >= 1)
idx(1+[0 cs(1:end-1)]) = 1; % [1 0 0 1 1 0 0 0 0]
idx = cumsum(idx); % [1 1 1 2 3 3 3 3 3]

B = A(idx); % [4 4 4 5 6 6 6 6 6]
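
The direct assignment of 1 at each start position relies on every count being at least 1: a zero count produces a repeated start position, and the corresponding element is then not skipped. One possible variant that tolerates zeros counts the start positions with accumarray instead. This is a sketch only; the names total, starts and marks are introduced here for illustration.

```matlab
A = [4 5 6 7];
N = [2 0 3 0];   % zero counts drop the corresponding element

cs = cumsum(N);
total = cs(end);
starts = 1 + [0 cs(1:end-1)];        % start position of each run
% Count how many runs begin at each position; the extra slot at total+1
% absorbs trailing zero-length runs.
marks = accumarray(starts(:), 1, [total+1, 1]);
idx = cumsum(marks(1:total)).';      % [1 1 3 3 3]
B = A(idx);                          % [4 4 6 6 6]
```

When all counts are positive, this gives the same idx as the plain cumsum version.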
tmpearce