Create a sequence from vectors of start and end numbers

Question

How is it possible to create a sequence if I have vectors of starting and ending numbers of the subsequences in a vectorized way in Matlab?

Example Input:

A=[12 20 34]  
B=[18 25 37]

I want to get (spacing for clarity):

C=[12 13 14 15 16 17 18    20 21 22 23 24 25    34 35 36 37]

(1) will `C` always be monotonically increasing? (2) What is your motivation for vectorization? (3) what have you tried yourself? — Dan, Jun 04 '15 at 14:10

Dan · Accepted Answer · 2015-06-04T14:47:04.497

7

Assuming that A and B are sorted ascending and that B(i) < A(i+1) holds then:

idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;

C = find(cumsum(idx))

To get around the issue mentioned by Dennis in the comments:

m = min(A)-1;
A = A-m;
B = B-m;

idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;

C = find(cumsum(idx)) + m;

edited Jun 04 '15 at 14:47

answered Jun 04 '15 at 14:02

Dan

45,079
17
88
157

Clever! Nice use of `cumsum`. – Benoit_11 Jun 04 '15 at 14:13
1

Assumes all points are positive integers (and roughly below `1e10`). – Dennis Jaheruddin Jun 04 '15 at 14:20
@DennisJaheruddin Good point. Can get around the positive integer by adding `min(A)+1` at the beginning and then subtracting at the end. If the range of numbers is reasonable, this would actually solve the large integer problem too. – Dan Jun 04 '15 at 14:25
For the modified code, think it might be `m = min(A)-1`? Also, It assumes that `A` and `B` are monotonically increasing, right? – Divakar Jun 04 '15 at 14:43
Yes I said it assumes `A` and `B` are sorted, so it's less general than your or Santhan's answers. Will make the change, thanks. – Dan Jun 04 '15 at 14:46

score 5 · Answer 2 · edited Jun 20 '20 at 09:12

5

cumsum based approach for a generic case (negative numbers or overlaps) -

%// Positions in intended output array at which group shifts
intv = cumsum([1 B-A+1])

%// Values to be put at those places with intention of doing cumsum at the end
put_vals = [A(1) A(2:end) - B(1:end-1)]

%// Get vector of ones and put_vals
id_arr = ones(1,intv(end)-1)
id_arr(intv(1:end-1)) = put_vals

%// Final output with cumsum of id_arr
out = cumsum(id_arr)

Sample run -

>> A,B
A =
    -2    -3     1
B =
     5    -1     3
>> out
out =
    -2    -1     0     1     2     3     4     5    -3    -2    -1     1     2     3

Benchmarking

Here's a runtime test after warming-up tic-toc to compare the various approaches listed to solve the problem for large sized A and B -

%// Create inputs
A = round(linspace(1,400000000,200000));
B = round((A(1:end-1) + A(2:end))/2);
B = [B A(end)+B(1)];

disp('------------------ Divakar Method')
 .... Proposed approach in this solution

disp('------------------ Dan Method')
tic
idx = zeros(1,max(B)+1);
idx(A) = 1;
idx(B+1) = -1;
C = find(cumsum(idx));
toc, clear C idx

disp('------------------ Santhan Method')
tic
In = [A;B];
difIn = diff(In);
out1 = bsxfun(@plus, (0:max(difIn)).',A);            %//'
mask = bsxfun(@le, (1:max(difIn)+1).',difIn+1);     %//'
out1 = out1(mask).';  %//'
toc, clear out1 mask difIn In

disp('------------------ Itamar Method')
tic
C = cell2mat(cellfun(@(a,b){a:b},num2cell(A),num2cell(B)));
toc, clear C

disp('------------------ dlavila Method')
tic
C = cell2mat(arrayfun(@(a,b)a:b, A, B, 'UniformOutput', false));
toc

Runtimes

------------------ Divakar Method
Elapsed time is 0.793758 seconds.
------------------ Dan Method
Elapsed time is 2.640529 seconds.
------------------ Santhan Method
Elapsed time is 1.662889 seconds.
------------------ Itamar Method
Elapsed time is 2.524527 seconds.
------------------ dlavila Method
Elapsed time is 2.096454 seconds.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jun 04 '15 at 14:20

Divakar

218,885
19
262
358

Thanks for the benchmarking, but you will get vastly different results if you use inputs in orders of magnitude closer to those in the question. As it stands my answer is the slowest, but replace `A` with `round(linspace(1,4000,200));` and it becomes the fastest. – Dan Jun 05 '15 at 06:03
1

@Dan For those small datasizes, since the runtimes would be really small, we need to use a number of trials on top of each approach in that case. Did you do that? I did something like that using your suggested input like [`this`](http://pastebin.com/1K3gUu6p) and got [`these results`](http://pastebin.com/aEqkMm7W). Could you double-check these at your end? In the benchmarking code, you could change `num_iter` depending on your hardware, but it would be best to change it such that the lowest approach takes around 1 sec to overcome the tic-toc overheads. – Divakar Jun 05 '15 at 06:45
1

@Divakar ran your code and got similar results (although your computer is a lot faster than mine!). Still makes a very big difference from those in your solution though and really shows that overhead from `cellfun` and `arrayfun` – Dan Jun 05 '15 at 06:59
@Divakar The latest results looks reasonable as `cumsum` is lightweight compared to `arrayfun` with anonymous functions. May be using `timeit()` for the first datasize might give much accurate results? – Santhan Salai Jun 05 '15 at 06:59
@Dan Yeah! It clearly shows that cellfun's and arrayfun's aren't performing that great. Vectorization FTW! :) – Divakar Jun 05 '15 at 07:02
1

@SanthanSalai Well, tic-toc with warming up and being close to 1-sec could be trusted IMHO. – Divakar Jun 05 '15 at 07:03
@Divakar Point noted! :P – Santhan Salai Jun 05 '15 at 07:07

Santhan Salai · Answer 3 · 2015-06-04T16:29:07.267

Alternative using bsxfun

%// Find the difference between Inputs
difIn = B - A;

%// Do colon on A to A+max(difIn)
out = bsxfun(@plus, (0:max(difIn)).',A);            %//'

%// mask out which values you want
mask = bsxfun(@le, (1:max(difIn)+1).',difIn+1);     %//'

%// getting only the masked values
out = out(mask).'

Results:

>> A,B
A =
-2    -4     1

B =
 5    -3     3

>> out

out =

-2    -1     0     1     2     3     4     5    -4    -3     1     2     3

score 0 · Answer 4 · answered Jun 04 '15 at 13:53

0

This is one option:

C = cell2mat(cellfun(@(a,b){a:b},num2cell(A),num2cell(B)));

answered Jun 04 '15 at 13:53

Itamar Katz

9,544
5
42
74

3

Just noting that `cellfun` is technically shorthand for a loop and not really vectorization – Dan Jun 04 '15 at 13:57

score 0 · Answer 5 · answered Jun 04 '15 at 14:07

0

You can do this:

C = cell2mat(arrayfun(@(a,b)a:b, A, B, "UniformOutput", false))

arrayfun(...) create all the subsequences taking pairs of A and B. cell2mat is used to concatenate each subsequence.

answered Jun 04 '15 at 14:07

dlavila

1,204
11
25

3

But once again, `arrayfun` isn't really considered "vectorization" since it's just a wrapper for a loop. I guess it just depends on what the OP's motivations for "vectorizing" are – Dan Jun 04 '15 at 14:08
2

Also to add, you could use in-built `@colon` instead of your anonymous function for significant speedup – Santhan Salai Jun 04 '15 at 15:15

Create a sequence from vectors of start and end numbers

5 Answers5

Benchmarking