Finding indices of similar group elements

Question

I have a vector test2 that contains NaN, 0 and 1 in arbitrary order (we cannot make any assumptions about it).

test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1 ];

I would like to group the consecutive elements equal to 1 and store the first and last index of each group in the separate vectors start and finish.

In this case start and finish should be:

start = [2 14 18];
finish = [4 16 20];

I tried to adapt the code provided here and came up with the solution below, which does not work. Could you help me with the right solution and tell me why the one I tried doesn't work?

a = (test2 ==1);
d = diff(a);
start = find([a(1) d]==1);                        % Start index of each group
finish = find([d - a(end)]==-1);                  % Last index of each group


start =

     2    14    18


finish =

     2     3     5     6     7     8     9    10    11    12    14    15    18    19

I am using MATLAB R2013b running on Windows. I also tried MATLAB R2013a running on Ubuntu.

Yep, that's what I get for `finish` (and `start` is `[2 14 18]`) – Benoit_11, Jun 29 '15 at 17:32
I tried again... I get finish = 2 3 5 6 7 8 9 10 11 12 14 15 18 19 – gabboshow, Jun 29 '15 at 17:33
That's weird... I also tried with Ubuntu... same wrong result. – gabboshow, Jun 29 '15 at 17:36
Hmm, weird indeed, I don't understand. I'm using R2013a on Mac OS X, but that should not make a difference. Hopefully someone will see what's going on – Benoit_11, Jun 29 '15 at 17:37
I get the same output as the OP - MATLAB R2013a - Mac OS X Yosemite 10.10.3. My guess is that it may be the `NaN` that is messing things up. Try replacing `NaN` with something else. – rayryeng, Jun 29 '15 at 17:37
I think you may have a typo with this `d-a(end)`: since `a(end)` is 1, it makes `[d - a(end)]` contain a bunch of -1, which is what you are searching for. – Matt, Jun 29 '15 at 17:53

Daniel · Accepted Answer · 2015-06-29T17:53:41.103

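a = (test2 == 1)            % logical mask: true where the element is 1 (NaN == 1 is false)
d = diff([0 a 0])           % +1 where a run of ones begins, -1 just after it ends
start = find(d == 1)        % first index of each run
finish = find(d == -1) - 1  % last index of each run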

Padding with a zero at the beginning and the end is the easiest approach. The special cases where a group starts at index 1 or ends at the last index then no longer cause problems.
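
As a small check of the two special cases mentioned above, here is a sketch that applies the same padded diff to a vector whose runs of ones touch both ends. The vector test3 is a hypothetical example and does not come from the question.

```matlab
% Hypothetical test vector (not from the question): runs touch both ends.
test3 = [1 1 0 NaN 1 0 1];

a = (test3 == 1);             % logical mask; NaN == 1 evaluates to false
d = diff([0 a 0]);            % padding gives every run a rising and a falling edge
start  = find(d == 1);        % rising edge sits on the first index of each run
finish = find(d == -1) - 1;   % falling edge sits one past the last index of each run

disp([start; finish])         % row 1: starts [1 5 7], row 2: finishes [2 5 7]
```

Without the padding, the run starting at index 1 would have no rising edge and the run ending at index 7 would have no falling edge.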

Full output:

>> test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1 ]

test2 =

  Columns 1 through 16

   NaN     1     1     1     0     0     0   NaN   NaN   NaN     0     0     0     1     1     1

  Columns 17 through 20

     0     1     1     1

>> a = (test2 ==1)

a =

  Columns 1 through 16

     0     1     1     1     0     0     0     0     0     0     0     0     0     1     1     1

  Columns 17 through 20

     0     1     1     1

>> d=diff([0 a 0])

d =

  Columns 1 through 16

     0     1     0     0    -1     0     0     0     0     0     0     0     0     1     0     0

  Columns 17 through 21

    -1     1     0     0    -1

>> start=find(d==1)

start =

     2    14    18

>> finish=find(d==-1)-1

finish =

     4    16    20

>>

edited Jun 29 '15 at 17:53

answered Jun 29 '15 at 17:46

Daniel


Using the solution that you provided I get start = 3 15 19 and finish = 0 2 3 5 6 7 8 9 10 11 12 14 15 18 19 – gabboshow Jun 29 '15 at 17:50
@gabboshow: I added the full output of my command prompt, could you compare it with your output? No idea where the difference occurs. – Daniel Jun 29 '15 at 17:54
Thanks, it works! I think I had modified my solution before. Using your code (different from mine) I get the desired result. – gabboshow Jun 29 '15 at 17:59

Delyle · Answer 2 · 2015-06-29T18:11:16.093

The problem is the line finish = find([d - a(end)]==-1);, in particular that a(end) == 1. Subtracting 1 from every element of d turns each 0 into -1, so the test matches every position where d == 0 instead of the positions where a group ends. There are two steps to correcting this. First, change the problem line to finish = find(d==-1); This tells MATLAB, "Look for the elements where the difference between adjacent elements is -1", in other words, where the vector shifts from 1 to 0 or NaN. If you run the code, you'll get

start =  2    14    18
finish = 4    16

Now, you'll notice the end of the last group isn't detected (i.e. we should get finish(3) == 20). This is because the length of d is one less than the length of test2: the function diff cannot calculate the difference between the last element and the nonexistent last+1 element!
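
The length mismatch described above can be inspected directly. The following is only an illustrative sketch using the vector from the question. The names lenA, lenD and endsInRun are made up for this example.

```matlab
test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1];
a = (test2 == 1);
d = diff(a);               % d(k) = a(k+1) - a(k), so d has one element fewer than a

lenA = numel(a);           % 20
lenD = numel(d);           % 19
endsInRun = a(end);        % true: the final run is still "open" at index 20
finish = find(d == -1);    % [4 16]: no falling edge exists for the final run
```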

To remedy this, we should modify a:

a = [(test2 == 1) 0];

And you will get the right output for start and finish.
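
Putting both corrections together, a complete version of the asker's script might look like the sketch below. The last lines are an extra check, not part of the fix: they confirm that the original expression matched the zeros of d. The names a0, d0, bad and matchesZeros are introduced only for this illustration.

```matlab
test2 = [NaN 1 1 1 0 0 0 NaN NaN NaN 0 0 0 1 1 1 0 1 1 1];

a = [(test2 == 1) 0];            % trailing 0 closes a run that reaches the end
d = diff(a);
start  = find([a(1) d] == 1);    % a(1) in front catches a run starting at index 1
finish = find(d == -1);          % index of the last 1 in each run

% Why the original line failed: with a0(end) == 1, d0 - a0(end) is -1 where d0 == 0.
a0 = (test2 == 1);
d0 = diff(a0);
bad = find((d0 - a0(end)) == -1);
matchesZeros = isequal(bad, find(d0 == 0));   % true
```

This gives start = [2 14 18] and finish = [4 16 20], the values requested in the question.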
