1

I have a table with Ids and Dates. I would like to retrieve the index of the max date for each Id.

My initial approach is so: varfun(@max, table, 'Grouping Variables', 'Id', 'InputVariables','Date');

This obviously gives me the date rather than the index. I noted that the max function will return both the maxvalue and maxindex when specified: [max_val, max_idx] = max(values);

How can I define an anonymous function using max to retrieve max_idx? I would then use it in the var_fun to get my result.

I'd prefer not to declare a cover function (as opposed to a anon func)over max() as: 1. I'm working in a script and would rather not create another function file 2. I'm unwilling to change my current script to a function

Thanks a million guys,

2 Answers2

3

I'm assuming your Ids are positive integers and and your Dates are numbers.

If you wanted the maximum Date for each Id, it would be a perfect case for accumarray with the max function. In the following I'll use f to denote a generic function passed to accumarray.

The fact that you want the index of the maximum makes it a little trickier (and more interesting!). The problem is that the Dates corresponding to a given Id are passed to f without any reference to their original index. Therefore, an f based on max can't help. But you can make the indices "pass through" accumarray as imaginary parts of the Dates.

So: if you want just one maximizing index (even if there are several) for each Id:

result = accumarray(t.Id,...  %// col vector of Id's
    t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
    [], ... %// default size for output
    @(x) imag(x(find(real(x)==max(real(x))),1))); %// function f

Note that the function f here maximizes the real part and then extracts the imaginary part, which contains the original index.

Or, if you want all maximizing indices for each Id:

result = accumarray(t.Id,...  %// col vector of Id's
    t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
    [], ... %// default size for output
    @(x) {imag(x(find(real(x)==max(real(x)))))}); %// function f

If your Ids are strings: transform them into numeric labels using the third output of unique, and then proceed as above:

[~, ~, NumId] = unique(t.Id);

and then either

result = accumarray(NumId,...  %// col vector of Id's
    t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
    [], ... %// default size for output
    @(x) imag(x(find(real(x)==max(real(x))),1))); % function f

or

result = accumarray(NumId,...  %// col vector of Id's
    t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
    [], ... %// default size for output
    @(x) {imag(x(find(real(x)==max(real(x)))))}); %// function f
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • 1
    I dropped my initial thought of `accumarray` having no idea how to map the local indices to global ones. Even though it doesn't make for an easy reading, using complex numbers it is a great hack to say least. ;-) [Also thank god our universe's time is real. This makes our watchmakers work a lot easier. :-P] – knedlsepp Feb 09 '15 at 22:15
  • @knedlsepp It always gives me satisfaction to use complex numbers for this kind of tricks :-) Their use here is not really close to the true nature of complex numbers, though. ...Which makes it a little more hackish, hahaha. – Luis Mendo Feb 09 '15 at 22:21
  • @LuisMendo My Ids are actually strings. Perhaps I can map them over to integers. – user2353644 Feb 09 '15 at 23:44
  • @user2353644 That can be done easily with the third output of `unique`. Probably `[~, ~, numId] = unique(t.Id);` would suffice, but I can't test right now. I'll come back later – Luis Mendo Feb 10 '15 at 08:58
  • @LuisMendo you sir have added two more tools to my toolbox (accumarray and img num tricks), thanks – user2353644 Feb 10 '15 at 16:21
  • @user2353644 Glad I could help! I have added the code for numeric Ids. See updated answer – Luis Mendo Feb 10 '15 at 16:40
2

I don't think varfun is the right approach here, as

varfun(func,A) applies the function func separately to each variable of the table A.

This would only make sense if you wanted to apply it to multiple columns.

Simple approach:

Simply go with the loop approach: First find the different IDs using unique, then for each ID find the indices of the maximum dates. (This assumes your dates are in a numerical format which can be compared directly using max.) I did rename your variable table to t, as otherwise we would be overwriting the built-in function table.

uniqueIds = unique(t.Id);
for i = 1:numel(uniqueIds)
    equalsCurrentId = t.Id==uniqueIds(i); 
    globalIdxs = find(equalsCurrentId);
    [~, localIdxsOfMax] = max(t.Date(equalsCurrentId));
    maxIdxs{i} = globalIdxs(localIdxsOfMax);
end

As you mentioned your Ids are actually strings instead of numbers, you will have to change the line: equalsCurrentId = t.Id==uniqueIds(i); to

 equalsCurrentId = strcmp(t.Id, uniqueIds{i});

Approach using accumarray:

If you prefer a more compact style, you could use this solution inspired by Luis Mendo's answer, which should work for both numerical and string Ids:

[uniqueIds, ~, global2Unique] = unique(t.Id);
maxDateIdxsOfIdxSubset = @(I) {I(nth_output(2, @max, t.Date(I)))};
maxIdxs = accumarray(global2Unique, 1:length(t.Id), [], maxDateIdxsOfIdxSubset);

This uses nth_output of gnovice's great answer.

Usage:

Both above solutions will yield: A vector uniqueIds with a corresponding cell-array maxIdxs, in a way that maxIdxs{i} are the indices of the maximum dates of uniqueIds(i). If you only want a single index, even though there are multiple entries where the maximum is attained, use the following to strip away the unwanted data:

maxIdxs = cellfun(@(X) X(1), maxIdxs);
Community
  • 1
  • 1
knedlsepp
  • 6,065
  • 3
  • 20
  • 41