
Is there a simple way (ideally without multiple for loops) to group a vector of values according to a set of categories in MATLAB?

I have a data matrix of the form

CATEG_A    CATEG_B   CATEG_C  ...   VALUE

   1          1        1      ...   0.64
   1          2        1      ...   0.86
   1          1        1      ...   0.74
   1          1        2      ...   0.56
  ...

etc.

What I want is an N-dimensional array

 all_VALUE( CATEG_A, CATEG_B, CATEG_C, ..., index ) = VALUE_i

Of course, any number of values may share the same category combination, so the size of the last dimension would be the number of values in the largest group, and the remaining entries would be padded with NaN.

Alternatively, I'd be happy with

 all_VALUE { CATEG_A, CATEG_B, CATEG_C, ... } ( index )

i.e. a cell array of vectors. I suppose it's a bit like creating a pivot table, but with N dimensions and without computing the mean.

I found this function in the help

A = accumarray(subs,val,[],@(x) {x})

but I couldn't fathom how to make it do what I wanted!
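For reference, `accumarray` also accepts a matrix of subscripts with one column per dimension, so the category columns can be passed directly as `subs`; with the `@(x){x}` function the result is then an N-D cell array. A minimal sketch, assuming the categories are positive integers and using the four example rows from the answers below (three category columns):

```matlab
X = [1 1 1 0.64
     1 2 1 0.86
     1 1 1 0.74
     1 1 2 0.56];                        % category columns, then the value
subs = X(:,1:end-1);                      % each row is one N-D subscript
vals = X(:,end);                          % values to group
A = accumarray(subs, vals, [], @(x){x});  % A{a,b,c} holds every value with that combination
```

Combinations with no data come back as empty cells. As far as I know, `accumarray` does not guarantee the order of the values inside each cell when the subscripts are not sorted, so do not rely on it matching the row order.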

Sanjay Manohar

2 Answers


This is also a mess, but it works. It takes the N-D array route.

X = [1        1        1        0.64
     1        2        1        0.86
     1        1        1        0.74
     1        1        2        0.56]; %// data
N = size(X,1); %// number of values
[~, ~, label] = unique(X(:,1:end-1),'rows'); %// unique labels for indices
cumLabel = cumsum(sparse(1:N, label, 1),1); %// used for generating a cumulative count
    %// for each label. The trick here is to put each label in its own column
lastInd = full(cumLabel((1:N).'+(label-1)*N)); %// pick the appropriate values from
    %// cumLabel to generate the cumulative count, which will be used as the last index
    %// of the result array
sizeY = [max(X(:,1:end-1),[],1) max(lastInd)]; %// size of result
Y = NaN(sizeY); %// initialize result with NaNs
ind = mat2cell([X(:,1:end-1) lastInd], N, ones(1,size(X,2))); %// one column of subscripts
    %// per dimension, stored in cells so that ind{:} is a comma-separated list
Y(sub2ind(sizeY, ind{:})) = X(:,end); %// linear indexing of values into Y

The result in your example is the following 4D array:

>> Y
Y(:,:,1,1) =
    0.6400    0.8600
Y(:,:,2,1) =
    0.5600       NaN
Y(:,:,1,2) =
    0.7400       NaN
Y(:,:,2,2) =
   NaN   NaN
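The cumulative-count step is the least obvious part of this answer. Here it is in isolation, as a small sketch with a hypothetical label vector standing in for the third output of `unique`:

```matlab
label = [1; 2; 1; 3];                % hypothetical group labels, one per data row
N = numel(label);
onehot = sparse((1:N).', label, 1);  % row i has a single 1, in column label(i)
counts = cumsum(onehot, 1);          % (i,k) = number of rows 1..i that have label k
% Pick element (i, label(i)) of counts for every row via linear indexing
lastInd = full(counts((1:N).' + (label-1)*N));
% lastInd is [1; 1; 2; 1]: row 3 is the second occurrence of label 1
```

Each entry says how many times that row's label has appeared so far, which is exactly the position the value takes along the last dimension of `Y`.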
Luis Mendo
  • Nice, I tried this at first but got stuck on the cumulative-count-per-label part. @SanjayManohar this is probably the better solution. – Dan Feb 11 '15 at 15:54
  • @Dan Thanks. Your solution is actually more efficient in terms of memory, as it gives the cell array instead of the N-D array – Luis Mendo Feb 11 '15 at 15:58
  • Perfect. Thanks also for introducing me to the third output of `unique`. And it's just lovely how `ind` ends up telling each item "where to go". – Sanjay Manohar Feb 11 '15 at 16:01

It's a mess, but here is one solution:

[U,~,subs] = unique(X(:,1:end-1),'rows');

sz = max(U,[],1);
Uc = mat2cell(U, size(U,1), ones(1,size(U,2)));
%// U is split into a cell array of columns so that Uc{:} gives a comma-separated list, which lets us pass a dynamic number of arguments to functions like sub2ind

I = sub2ind(sz, Uc{:});

G = accumarray(subs, X(:,end),[],@(x){x});

A{prod(max(U))} = [];  %// Pre-assign the correct number of cells to A so we can reshape later
A(I) = G;
A = reshape(A, sz)

On your example data (ignoring the ...s) this returns:

A(:,:,1) = 

    [2x1 double]    [0.8600]


A(:,:,2) = 

    [0.5600]    []

where A{1,1,1} is [0.74; 0.64]
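Note that the values in A{1,1,1} come out in the reverse of their row order. If the original order matters, one option (a sketch, not part of the answer as posted) is to group row indices instead of values and sort them inside the function; the resulting cell array can then be padded with NaN to obtain the N-D array from the other answer. This assumes at least two category columns, since with a single column `accumarray` expects `sz` to be `[n 1]`:

```matlab
X = [1 1 1 0.64
     1 2 1 0.86
     1 1 1 0.74
     1 1 2 0.56];                 % example data
subs = X(:,1:end-1);               % category columns
vals = X(:,end);                   % values
N = size(X,1);
sz = max(subs, [], 1);             % size of the category grid

% Group row indices, then sort them so each cell keeps the original row order
A = accumarray(subs, (1:N).', sz, @(r){vals(sort(r))});

% Optional: pad every group with NaN to a common length and stack the
% groups along a new last dimension, so that Y(a,b,c,k) is the k-th value
maxLen = max(cellfun(@numel, A(:)));
padded = cellfun(@(v)[v; NaN(maxLen-numel(v),1)], A(:), 'UniformOutput', false);
Y = reshape([padded{:}].', [sz maxLen]);
```

With this data, `A{1,1,1}` is `[0.64; 0.74]` and `Y` has the same 1x2x2x2 layout, values and NaNs as the 4-D array shown in the other answer.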

Dan