Consolidate Number of Groups Separated by Zeros

Question

I am recording from an analog device. Assume the data resembles an example vector like:

A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2]

I would like to:

1) Count the number of groups of non-zero elements
2) Consolidate the groups that might belong to the same condition

For 1), we might split it into

1 4 2
4 5 8 8 1
4 7 1 9
8 1 2

However, for 2), there is a chance that the values separated by the first, single 0 actually come from the same condition, unlike the values separated by more zeros. In that case the vector should actually be split into

1 4 2 4 5 8 8 1
4 7 1 9
8 1 2

A past solution for (1) can be found here, which is:

count = sum(diff([A 0]==0)==1)
a0 = (A~=0);
d = diff(a0);
start = find([a0(1) d]==1)           % Start index of each group
len = find([d -a0(end)]==-1)-start+1 % Length, number of indexes in each group
finish = find([d -a0(end)]==-1)      % Last index of each group
count = length(start);
B = cell(count,1);
for i = 1:count
    B{i} = A(start(i):finish(i));
end

Since I didn't want to necro an old thread, I was wondering if there is a way to make the grouping more robust so that values that are separated by single or double zeros are not split off into an entirely new group.

Ben, can you give us some more info regarding `A`? Which range of values will it contain? Just integers from 0 to 9? — AlessioX, Apr 04 '16 at 17:11
@Alessiox Sorry, I should've given a more realistic example. They are actually non-integers, like `A = [0.01 0.04 0.02 0.00 0.04 0.05 0.08 0.08 0.01 0.00 0.00 0.00 0.04 0.07 0.01 0.09 0.00 0.00 0.00 0.08 0.01 0.02]` Each actual row contains between 5k-10k values. — BenJHC, Apr 04 '16 at 17:31

AlessioX · Accepted Answer · 2016-04-04T21:22:14.747

The case of integer values in range [0:9]

There is a very elegant one-line solution using regular expressions.
First of all, convert the vector to a string:

A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2];
As=num2str(A);
As(As==' ')=[];

The last line is mandatory because num2str() separates the numbers with blank spaces, which must be removed. As will therefore have the form:

As =

1420458810004719000812

and will be a string.

Now the regexp():

out = regexp(As,'0{2,}','split');

This expression basically says: find every run of two or more consecutive zeros in As and, thanks to 'split', return the non-matching sequences between those runs (i.e. we do not want the zeros, we want the non-zero parts of the sequence).

However, out will be a cell array of strings, because 'split' returns the pieces of the string As. If you want it back to numeric, just add:

out=cellfun(@str2num,out);

in order to convert the cell array of strings into a numeric row vector, with one number per group (the digits of each group concatenated). Indeed, out now has the form:

out =

   142045881        4719         812
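The question also asks for single or double zeros to be tolerated, and the concatenated numbers above hide the individual samples. As a minimal sketch of how the same regexp idea could cover both points, the threshold can be made a parameter and each piece converted back to a vector of digits. The names minZeros, parts and groups are illustrative and not part of the original answer, and the sketch still assumes every entry of A is a single digit from 0 to 9.

```matlab
% Sketch only: minZeros, parts and groups are illustrative names.
% Assumes every entry of A is a single digit (0 to 9).
A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2];
minZeros = 2;                                % zero runs at least this long split groups

As = sprintf('%d', A);                       % digits concatenated, no spaces to remove
pattern = sprintf('0{%d,}', minZeros);       % builds the pattern '0{2,}'
parts = regexp(As, pattern, 'split');        % cell array of digit strings
parts = parts(~cellfun(@isempty, parts));    % drop empty pieces left by leading/trailing zero runs

% Subtracting '0' maps each digit character back to its numeric value
groups = cellfun(@(s) s - '0', parts, 'UniformOutput', false);
```

With minZeros = 2 the first group keeps its isolated zero, so groups{1} is [1 4 2 0 4 5 8 8 1]; something like groups{1}(groups{1} ~= 0) would drop it. Setting minZeros = 3 would also keep groups separated by a double zero together.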

The floating point case

A = [0.01 0.04 0.02 0.00 0.04 0.05 0.08 0.08 0.01 0.00 0.00 0.00 0.04 0.07 0.01 0.09 0.00 0.00 0.00 0.08 0.01 0.02];
As=num2str(A);
As(As==' ')=[];

Now As has the form:

As =

0.010.040.0200.040.050.080.080.010000.040.070.010.090000.080.010.02

The runs of two or more zeros are now harder to find. However, a pattern emerges: each such run has a non-zero digit before it (the last decimal of the previous number) and another zero after it (the leading 0 of the next number, which, being a "normal" zero, is followed by a decimal point):

[sID,eID]=regexp(As,'[1-9]00{2,}');

where sID and eID are the start and end indices of our substring target(s), respectively [1]. Now let's split As thanks to the above indices [2]:

C{1}=As(1:sID(1));
for ii=2:length(sID)
    C{end+1}=As(eID(ii-1):sID(ii));
end
C{end+1}=As(eID(end):end);
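To see why the split uses As(1:sID(1)) and As(eID(end):end), it can help to print what the pattern actually matches. The following is a small sketch for inspection only; As is copied verbatim from the output shown earlier, and the loop variable k is an illustrative choice.

```matlab
% Sketch only: inspect the substrings matched by the pattern.
% As is copied verbatim from the num2str() output shown above.
As = '0.010.040.0200.040.050.080.080.010000.040.070.010.090000.080.010.02';
[sID, eID] = regexp(As, '[1-9]00{2,}');

for k = 1:numel(sID)
    % Each match starts on the last digit of the previous number and
    % ends on the leading 0 of the next number.
    fprintf('match %d: "%s" (characters %d to %d)\n', ...
            k, As(sID(k):eID(k)), sID(k), eID(k));
end
```

Each match begins on the last decimal of the preceding number and ends on the leading 0 of the following one, which is why both boundary characters are kept when slicing. The isolated zero after 0.02 is not matched, because only two zeros follow the digit 2 there.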

The cell array C is still rather messy: num2str() printed every 0.00 as a bare 0, because MATLAB treats 0.00 as simply 0, so an isolated zero inside a chunk takes one character instead of four. We must append a .00 to it in order to rebuild the original sequence:

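for i=1:length(C)
    idx=strfind(C{i},'00');       % an isolated 0 followed by the leading 0 of the next number
    if ~isempty(idx)
        % turn the bare '0' into '0.00' (handles at most one isolated zero per chunk)
        C{i}=[C{i}(1:idx) '.00' C{i}(idx+1:end)];
    end
    C{i}=reshape(C{i},4,[])';     % one four-character number per row
end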

In the above code, we also reshaped the long strings into character matrices with one number per row, so now we can easily convert them into numeric:

C1=cellfun(@str2num,C,'UniformOutput',0);

Now C1 is still a cell array where every cell is a chunk of sequence (in numeric array form). Obviously now we cannot rely on matrices and we are forced to use cell arrays due to the fact that chunks might have different lengths.
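The string round trip depends on how num2str() happens to format the values. One way to sidestep that is to work with indices into A directly, as hinted at in the comments. The following is a minimal sketch of that alternative, not the regexp method described in this answer. The names maxGap, runStart, runEnd, grpStart, grpEnd and groups are illustrative, and the sketch assumes A is a row vector with at least one non-zero entry and that gaps are exact zeros.

```matlab
% Sketch only: index-based grouping, all variable names are illustrative.
% Assumes A is a row vector, has at least one non-zero entry, and gaps are exact zeros.
A = [0.01 0.04 0.02 0.00 0.04 0.05 0.08 0.08 0.01 0.00 0.00 0.00 ...
     0.04 0.07 0.01 0.09 0.00 0.00 0.00 0.08 0.01 0.02];
maxGap = 1;                           % zero runs up to this length do not split a group

isData   = (A ~= 0);
d        = diff([0 isData 0]);        % +1 where a non-zero run starts, -1 just after it ends
runStart = find(d == 1);              % first index of each non-zero run
runEnd   = find(d == -1) - 1;         % last index of each non-zero run

% Number of zeros between consecutive non-zero runs
gapLen  = runStart(2:end) - runEnd(1:end-1) - 1;
isSplit = gapLen > maxGap;            % true where the gap is long enough to separate groups

grpStart = runStart([true isSplit]);  % groups start at the first run and after each long gap
grpEnd   = runEnd([isSplit true]);    % groups end before each long gap and at the last run

groups = cell(numel(grpStart), 1);
for k = 1:numel(grpStart)
    seg = A(grpStart(k):grpEnd(k));
    groups{k} = seg(seg ~= 0);        % drop the short zero gaps inside a merged group
end
```

With maxGap = 1 this yields three groups, the first one merging the runs on either side of the single zero; maxGap = 2 would also merge across double zeros. Because grpStart and grpEnd index into the original vector, the approach does not care about the range or precision of the values.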

Final note

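If A only has numbers in the range {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09} (plus the zeros), you can just as well multiply A by 100 (wrapping the product in round() guards against floating-point error): in that case A will be a vector of integers and you can use the more elegant integer approach. Starting from the cell array of strings returned by regexp(), i.e. before the cellfun(@str2num,out) conversion, you can then convert the groups back to floating point using this little snippet: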

for ii=1:length(out)
    out{ii}=num2str(out{ii});                             %convert to string, so we can enumerate and treat digits separately
    out{ii}=[repmat('0.0',length(out{ii}),1) out{ii}(:)]; %put '0.0' in front of every number
    out{ii}=str2num(out{ii});                             %roll-back to numeric
end

Finally, it is worth noting that, for A as defined at the beginning of The floating point case, both the floating point approach and this scaled integer approach lead to the same results.

[1] suggested by @LuisMendo
[2] improved thanks to @Adiel

Nice approach! But there's a problem if entries of `A` exceed 9, or are negative, or non-integer — Luis Mendo, Apr 04 '16 at 17:09
@LuisMendo, well I've just worked on the provided example. Let's hear some more from the OP — AlessioX, Apr 04 '16 at 17:10
@LuisMendo you were right then, there are floating point numbers. I reckon I'll update my answer asap or feel free to propose your own, I'll remove mine — AlessioX, Apr 04 '16 at 17:36
Maybe you can maintain the regexp approach to get the _indices_ into the original vector — Luis Mendo, Apr 04 '16 at 17:57
@LuisMendo, that's tricky. the `0.00` in `A` (see OP's comment) will be converted to `0` and the `regexp()` will fail. — AlessioX, Apr 04 '16 at 18:30
Very nice, I liked the approach, the implement, and the later fix! BTW, the loop for constructing `C` would be more clean/readable if c{1} would be defined before the loop, and then `for i=2:length(sID)` (and don't use `i` as a variable... :) ) — Adiel, Apr 04 '16 at 20:38
@Adiel, thanks for the feedback. Code improved. I'll never stop using `i` as a variable for the for-loop I'm afraid, it's in my blood now. I reckon that's why Mathworks now has `1i` for the imaginary unit: because people like me can't be cured. — AlessioX, Apr 04 '16 at 20:49
Thanks for the credit about so unnecessary part of that beautiful answer :) After I read the discussion here- http://stackoverflow.com/questions/14790740/using-i-and-j-as-variables-in-matlab about using `i,j`, I tried to stop this practice, and get used very fast to `k` ... — Adiel, Apr 04 '16 at 21:13
@Alessiox Thanks for the help, I really like the approach! I'll also see if I can round the numbers to fewer significant figures and try out that integer method too. — BenJHC, Apr 05 '16 at 08:52
