
I have a matrix whose columns contain only 0s and 1s. I would like an automated way to summarize each column compactly: for every column, produce a result vector (v1, v2, v3, v4, v5 in the example below) that lists, in order, the lengths of the runs of consecutive 1's in that column.

For instance d=

 0 1 1 1 1 
 1 1 0 0 0
 1 1 1 0 1
 0 0 0 0 0
 1 1 0 1 1

And we get v1=[2,1], v2=[3,1], v3=[1,1], v4=[1,1], v5=[1,1,1].

Luis Mendo
user1922730

2 Answers


This works without loops.

The code should be self-explanatory, otherwise ask me. The result variable is a cell array, because the result has a different size for each column of d.

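nrows = size(d,1);
d_neg_cell = num2cell(~d, 1); % one cell per column of ~d
zeros_d = cellfun(@find, d_neg_cell, 'UniformOutput', false); % row indices of the zeros in each column
find_runs = @(v) nonzeros( diff([0; v; nrows+1])-1 ).'; % nonzero gaps between consecutive zeros are the runs of ones
sol = cellfun(find_runs, zeros_d, 'UniformOutput', false);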

For your d matrix this gives:

>> sol{:}
ans =
     2     1
ans =
     3     1
ans =
     1     1
ans =
     1     1
ans =
     1     1     1
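
To see why `find_runs` works, it may help to trace a single column by hand. The following is only a sketch of the idea; the intermediate names `col`, `z`, `padded`, `gaps` and `runs` are illustrative and do not appear in the code above.

```matlab
col = [0; 1; 1; 0; 1];        % first column of the example matrix d
nrows = numel(col);

z = find(~col);               % positions of the zeros: [1; 4]
padded = [0; z; nrows+1];     % virtual zeros before row 1 and after the last row: [0; 1; 4; 6]
gaps = diff(padded) - 1;      % number of ones between consecutive zeros: [0; 2; 1]
runs = nonzeros(gaps).';      % drop the empty gaps, return a row vector: [2 1]
```

A gap of 0 means that two zeros are adjacent (or that the column starts or ends with a zero), which is why `nonzeros` is needed at the end.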
Luis Mendo
  • Suppose some of these columns contain NaN. Is there a solution that ignores these NaN values, which were keyed in where data is not available? Thanks! – user1922730 Aug 17 '13 at 00:32
  • Just add `d(isnan(d)) = 0;` in the beginning (before the current first line). That will transform the NaN's into zeros. – Luis Mendo Aug 17 '13 at 00:37
  • Sorry, I think I misunderstood. If you have ...1 NaN 1... in a column, do you consider that the NaN breaks the sequence of 1's, or not? – Luis Mendo Aug 17 '13 at 00:40
  • Oh I do not have the problem, the issue is only when the series starts at different times across different columns, so comment 2 was perfect! – user1922730 Aug 17 '13 at 00:44
  • Anyway, if you wanted to remove NaN's (not treat them as zeros), it would suffice to add the following right after the current `zeros_d =`... line: `zeros_d = cellfun(@(x) x(~isnan(x)), d_neg_cell, 'UniformOutput', 0);` – Luis Mendo Aug 17 '13 at 00:49
  • @user1922730 If you are avoiding a for loop because of performance, then `cellfun` is not the answer. `cellfun` and `arrayfun` are generally slow functions; see [here](http://stackoverflow.com/questions/12522888/arrayfun-can-be-significantly-slower-than-an-explicit-loop-in-matlab-why) for instance. You can benchmark it easily and see the difference. [My benchmark shows](https://gist.github.com/tuix/6257447) that the for loop solution is 7.7 times faster. – Mohsen Nosratinia Aug 17 '13 at 15:39
  • @MohsenNosratinia thank you very much for the information. I'd always been under the impression that loops would be slower. As a follow-up on a loop solution though, suppose some columns in the matrix contain NaNs; Matlab would flag the solution and not execute because of dimensionality incompatibility. What would be a proper solution then? Thanks. – user1922730 Aug 18 '13 at 13:35
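
The timing comparison discussed in the comments can be reproduced roughly with `tic`/`toc`. This is a minimal sketch, not the linked benchmark: the matrix size, the number of repetitions `nreps`, and the variable names `t_cellfun` and `t_loop` are arbitrary assumptions, and the measured ratio will depend on the MATLAB version and the data.

```matlab
% Hypothetical test data: a random 0/1 matrix (size chosen arbitrarily).
d = double(rand(1000, 200) > 0.5);
nreps = 20;

% cellfun-based approach
tic
for k = 1:nreps
    nrows = size(d, 1);
    zeros_d = cellfun(@find, num2cell(~d, 1), 'UniformOutput', false);
    find_runs = @(v) nonzeros(diff([0; v; nrows+1]) - 1).';
    sol = cellfun(find_runs, zeros_d, 'UniformOutput', false);
end
t_cellfun = toc / nreps;

% loop-based approach (written with a column index and a preallocated cell)
tic
for k = 1:nreps
    v = cell(1, size(d, 2));
    for j = 1:size(d, 2)
        f = diff([0; d(:, j); 0]);
        v{j} = (find(f < 0) - find(f > 0)).';
    end
end
t_loop = toc / nreps;

fprintf('cellfun: %.4g s per run, loop: %.4g s per run\n', t_cellfun, t_loop);
```

Both approaches should produce the same cell array, which can be checked with `isequal(sol, v)` before trusting the timings.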

Iterate over the columns, pad each one with a zero at the beginning and at the end so that edges are detected correctly, take the diff, and use the positive and negative values to locate the rising and falling edges. The differences between those positions give the lengths of the runs. Here is the code:

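v = {};
for e = d % each iteration takes one column of d
    f = diff([0 e' 0]); % +1 marks the start of a run of ones, -1 marks one past its end
    v{end+1} = find(f<0) - find(f>0); % end position minus start position = run length
end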

which returns

>> v{:}
ans =
     2     1
ans =
     3     1
ans =
     1     1
ans =
     1     1
ans =
     1     1     1
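
The edge-detection step can be traced on a single column. This is a small illustrative sketch; the names `padded`, `starts`, `stops` and `run_lengths` are introduced here only for the walkthrough.

```matlab
e = [0; 1; 1; 0; 1];            % first column of the example matrix d

padded = [0 e' 0];              % [0 0 1 1 0 1 0]
f = diff(padded);               % [0 1 0 -1 1 -1]

starts = find(f > 0);           % rising edges:  [2 5]
stops  = find(f < 0);           % falling edges: [4 6]
run_lengths = stops - starts;   % [2 1]
```

The zero padding guarantees that every run has both a rising and a falling edge, even when the column begins or ends with a 1, so `starts` and `stops` always have the same length.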

EDIT in reply to the comment by OP:

In case the columns contain NaN and you want to ignore them, change the line that uses diff so that it is passed the array without the NaN values:

v = {};
for e = d,
    f = diff([0 e(~isnan(e))' 0]);
    v{end+1} = find(f<0) - find(f>0);
end
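
Dropping NaN entries and treating them as zeros agree when the NaNs only pad the start of a column, but they differ when a NaN sits between two 1's. The sketch below illustrates this with two made-up columns; `run_lengths`, `late_start` and `gap_inside` are hypothetical names used only for this example.

```matlab
% Hypothetical helper: run lengths of ones in a 0/1 column vector.
run_lengths = @(e) find(diff([0 e.' 0]) < 0) - find(diff([0 e.' 0]) > 0);

late_start = [NaN; NaN; 1; 1; 0; 1];   % series starts late
gap_inside = [1; NaN; 1; 0; 1];        % missing value inside a run

% Strategy A: drop the NaN entries
a1 = run_lengths(late_start(~isnan(late_start)));   % [2 1]
a2 = run_lengths(gap_inside(~isnan(gap_inside)));   % [2 1]

% Strategy B: treat NaN as 0
t = late_start; t(isnan(t)) = 0;
b1 = run_lengths(t);                                 % [2 1]
t = gap_inside; t(isnan(t)) = 0;
b2 = run_lengths(t);                                 % [1 1 1]
```

For leading NaNs both strategies give the same answer; for an interior NaN, dropping it merges the neighbouring 1's into one run, while zeroing it splits them.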
Mohsen Nosratinia