2

I have 3 columns of data:

time     = [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16];
category = [1;1;1;1;2;2;2;2;3; 3; 3; 3; 4; 4; 4; 4];
data     = [1;1;0;1;2;2;1;2;3; 3; 2; 3; 4; 4; 4; 3];

I am using the following to extract the minimum data values for each category:

groupmin = accumarray(category,data,[],@min)

Which outputs:

groupmin = [0;1;2;3]

However, I would really like to have an output that also tells me which time point the minimums are from, e.g.

timeofgroupmin  = [3;7;11;16]
groupmin        = [0;1; 2; 3]

Alternatively, I would like to have the minimums output in a vector of their own, with NaNs for any row which was not the minimum of its group, e.g.

groupminallrows = [NaN;NaN;0;NaN;NaN;NaN;1;NaN;NaN;NaN;2;NaN;NaN;NaN;NaN;3];

Either approach would solve my problem. As a Matlab novice I'm struggling to know which terms to search for.

Jo.
  • Thanks all - still learning the system. Luis Mendo's suggestion and Amro's both work, but the former seems simpler. – Jo. Jul 08 '14 at 09:28

3 Answers

3

This works if each category occupies a single contiguous run of rows and the categories are sorted, as in your example. If several rows in a category tie for the minimum, all of them are kept.

r = accumarray(category,data,[],@(v) {(min(v)==v)});
r = vertcat(r{:});
groupminallrows = NaN(size(data));
groupminallrows(r) = data(r);
Luis Mendo
  • Wow - this achieves what I need it to in just 4 lines - now to work out what it's actually doing! Thank you! – Jo. Jul 05 '14 at 17:15
  • @Jo Look at the intermediate results, and ask me if you need to. The key is the anonymous function within `accumarray`, which returns a cell containing a vector – Luis Mendo Jul 05 '14 at 18:05
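
To make those intermediate results concrete, here is a step-by-step sketch of the same four lines on the sample data from the question. The variable name `mask` and the final `timeofgroupmin` line are additions for illustration; like the answer, it assumes `category` is sorted.

```matlab
time     = (1:16).';
category = [1;1;1;1;2;2;2;2;3;3;3;3;4;4;4;4];
data     = [1;1;0;1;2;2;1;2;3;3;2;3;4;4;4;3];

% Step 1: one cell per category, each holding a logical vector that is
% true where the value equals that category's minimum.
r = accumarray(category, data, [], @(v) {min(v) == v});
% r{1} is [0;0;1;0] and r{4} is [0;0;0;1]

% Step 2: stack the per-category vectors into one logical column that
% lines up with the rows of data (this is where sorted input matters).
mask = vertcat(r{:});

% Step 3: start from all NaN and copy the minima into place.
groupminallrows = NaN(size(data));
groupminallrows(mask) = data(mask);

% The same mask also gives the time points of the minima.
timeofgroupmin = time(mask);   % [3;7;11;16]
```

Because the mask marks every row that equals its group minimum, ties produce more than one entry per category in `timeofgroupmin`.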
1

Use accumarray with a custom function:

time     = [1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16];
category = [1;1;1;1;2;2;2;2;3; 3; 3; 3; 4; 4; 4; 4];
data     = [1;1;0;1;2;2;1;2;3; 3; 2; 3; 4; 4; 4; 3];

groupmin = accumarray(category, data, [], @min)

This is what you already have. To get the indices of the minima and their times, you need the second output of the min function, and I don't know whether accumarray can return it. The following workaround gets there anyway:

groupidx = accumarray( category, data, [], @(x) find(x == min(x) )).'
occ = cumsum(hist(category,unique(category)))
% add the number of rows preceding each group (also valid for unequal group sizes)
idx = [0, occ(1:end-1)] + groupidx;
timeofgroupmin = time(idx).'
groupmin = data(idx).'

groupmin =

     0     1     2     3

timeofgroupmin =

     3     7    11    16

You can build the desired NaN vector like this:

groupminallrows = NaN(1,numel(data));
groupminallrows(idx) = data(idx)

Regarding your comment:

I assume the reason is that some groups contain multiple minima, in which case find returns an array. To resolve that, you can replace find(x == min(x)) with find(x == min(x),1). But then you get only the first occurrence of the minimum in each group.

If that is not desired I'd say accumarray is generally the wrong way to go.

Robert Seifert
  • Thanks! This works really well with the "test" data I provided for simplicity, and small chunks of my actual dataset, but when I try it with all 820,000 rows of my actual data the command groupidx = accumarray( category, data, [], @(x) find(x == min(x) )).' returns the error message: "Error using accumarray The function '@(x)find(x==min(x))' returned a non-scalar value." I think this is because I have some NaNs in my "data". Any ideas? – Jo. Jul 05 '14 at 15:14
  • Thanks thewaywewalk. I am still getting a "returned a non-scalar value" error even with "find(x == min(x),1)" for my actual dataset... So far Amro's code seems to be working for me... Thank you so much for your help! – Jo. Jul 05 '14 at 16:51
1

Try this solution:

% first we group the data into cell according to the group they belong to
grouped = accumarray(category, data, [], @(x){x});

% find the minimum and corresponding index of each group
[mn,idx] = cellfun(@min, grouped);

% offset each within-group index so that it points into the whole data vector
offset = cumsum([0;cellfun(@numel, grouped)]);
idx = idx + offset(1:end-1);

% result
[mn(:) idx(:)]
assert(isequal(mn, data(idx)))

% build the vector with NaNs
mnAll = nan(size(data));
mnAll(idx) = mn;

The resulting vectors:

>> mn'
ans =
     0     1     2     3
>> idx'
ans =
     3     7    11    16
>> mnAll'
ans =
   NaN   NaN     0   NaN   NaN   NaN     1   NaN   NaN   NaN     2   NaN   NaN   NaN   NaN     3

EDIT:

Here is an alternate solution:

% find the position of min value in each category
idx = accumarray(category, data, [], @minarg);

% fix position in terms of the whole vector
offset = cumsum([0;accumarray(category,1)]);
idx = idx + offset(1:end-1);

% corresponding min values
mn = data(idx);

I'm using the following custom function to extract the second output argument from min:

minarg.m

function idx = minarg(X)
    [~,idx] = min(X);
end

The results are the same as above.
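
A variant that avoids the offset correction altogether, offered as a sketch rather than as part of the answer above: pass the row numbers to accumarray instead of the data, and let the anonymous function look the data up. The names `rows` and `pickmin` are hypothetical. It assumes no category consists only of NaN values, because `find` would then return an empty result and accumarray would reject it.

```matlab
time     = (1:16).';
category = [1;1;1;1;2;2;2;2;3;3;3;3;4;4;4;4];
data     = [1;1;0;1;2;2;1;2;3;3;2;3;4;4;4;3];

% Row numbers of the whole vector; accumarray groups these by category.
rows = (1:numel(data)).';

% For the rows r of one category, return the row whose data value is
% the category minimum (the first one found if several rows tie).
pickmin = @(r) r(find(data(r) == min(data(r)), 1));

idx = accumarray(category, rows, [], pickmin);   % [3;7;11;16]

mn             = data(idx);   % [0;1;2;3]
timeofgroupmin = time(idx);   % [3;7;11;16]
```

Since the function returns a row number of the full vector, the result does not depend on the order in which accumarray hands over the rows, apart from which row wins a tie.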

Amro
  • Thanks Amro, this seems to be working very well with my actual dataset (I'm still reading to understand what the commands are actually doing!). However, the "assertion" command reports back "Assertion failed" every time. Should I be concerned? The data look correct when plotted and are plotting in the right place... – Jo. Jul 05 '14 at 16:49
  • As I understand it, you have lots of NaN values in the data, so try replacing `isequal` above with `isequaln`. If a category contains only NaN values, its minimum is also `NaN`, and NaNs never compare equal to each other, so you need the special `isequaln` or the older `isequalwithequalnans`. – Amro Jul 05 '14 at 16:55
  • Great! Error message gone with isequalwithequalnans - I seem to have an old version of Matlab. Thank you so much for your help! – Jo. Jul 05 '14 at 17:02
  • Glad I could help. One thing I should have mentioned is that `category` *must* be sorted before calling [`accumarray`](http://www.mathworks.com/help/matlab/ref/accumarray.html#example_7). If it is not, sort it yourself first (and don't forget to reorder the corresponding `data` values to match). This is because we depend on the order being deterministic. You can read about this detail here: http://stackoverflow.com/questions/17536558/fun-depending-on-order-of-subs-and-values-in-accumarray – Amro Jul 05 '14 at 17:30
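
The sorting step mentioned in that last comment can be sketched as follows. The small unsorted input and the names `order`, `categorySorted`, `dataSorted` and `timeSorted` are made up for illustration; the rest follows the first solution above.

```matlab
% Hypothetical unsorted input: categories are interleaved.
time     = (1:6).';
category = [2;1;2;1;3;3];
data     = [5;4;3;6;2;1];

% Sort by category and apply the same permutation to data and time.
[categorySorted, order] = sort(category);
dataSorted = data(order);
timeSorted = time(order);

% Group, take min and within-group index, then add each group's offset.
grouped = accumarray(categorySorted, dataSorted, [], @(x) {x});
[mn, idx] = cellfun(@min, grouped);
offset = cumsum([0; cellfun(@numel, grouped)]);
idx = idx + offset(1:end-1);      % index into the sorted vectors

timeofgroupmin = timeSorted(idx); % [2;3;6]
```

Note that `idx` indexes the sorted vectors; `order(idx)` maps it back to rows of the original, unsorted input.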