MATLAB: Contingency Table

Question

I have a cell array of the following form:

 datABC =
           [45]  [67]  'A'
           [34]  [44]  'A'
           [11]  [84]  'A'
           [23]  [68]  'A'
           [34]  [44]  'B'
           [30]  [94]  'B'
           [304]  [414]  'C'
           [78]  [110]  'C'
           [34]  [120]  'C'

Now I need to calculate, for each of the groups A, B and C, the number of observations and the mean of the first and second columns.

Thanks in advance.

This will probably help you: http://stackoverflow.com/questions/8061344/how-to-search-for-a-string-in-cell-array-in-matlab – Daniel, Nov 14 '13 at 01:31
@tmpearce I have thought of the commands `crosstab` and `hist`, but somehow I couldn't get them to work. – user2983722, Nov 14 '13 at 01:31

chappjc · Answer 1 · answered Nov 14 '13 at 19:50

Seeing as the floodgates are opened, I might as well throw in my two cents.

A solution with a loop works well, but you can also eliminate loops, at the expense of readability. First, you can get the unique values in the last column with unique:

stringKeys = unique(datABC(:,3))'

Then you can use an anonymous function and cellfun to count the occurrences of each key:

memberFun = @(x) ismember(datABC(:,3),x);
keyOccurrences = cellfun(@(x) nnz(memberFun(x)),stringKeys)
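A minimal end-to-end sketch of the counting step, assuming the question's data is typed in as the 9-by-3 cell array `datABC` (the same values as the test data at the end of this answer). The counts in the final comment were worked out by hand from that data.

```matlab
% Question data: two numeric columns and a group key in column 3
datABC = {45 67 'A'; 34 44 'A'; 11 84 'A'; 23 68 'A'; ...
          34 44 'B'; 30 94 'B'; ...
          304 414 'C'; 78 110 'C'; 34 120 'C'};

stringKeys = unique(datABC(:,3))';            % sorted keys as a 1-by-3 cell: {'A','B','C'}
memberFun  = @(x) ismember(datABC(:,3), x);   % logical mask of the rows whose key equals x
keyOccurrences = cellfun(@(x) nnz(memberFun(x)), stringKeys)
% keyOccurrences is [4 2 3]: four 'A' rows, two 'B' rows, three 'C' rows
```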

To compute the mean of the corresponding data for each of the first two columns, you can again use cellfun with non-uniform outputs:

colMeanFun = @(x) mean(reshape([datABC{memberFun(x),1:2}],[],2),1);
colMeans = cellfun(colMeanFun,stringKeys,'UniformOutput',false);
colMeans = vertcat(colMeans{:})
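The `reshape` call works because brace-indexing `datABC{mask,1:2}` expands the selected cells in column-major order (all column-1 values first, then all column-2 values), so reshaping to `[],2` rebuilds a k-by-2 matrix for the group. A sketch on the question's data, again assuming it is stored in `datABC`; the means in the comment were computed by hand.

```matlab
datABC = {45 67 'A'; 34 44 'A'; 11 84 'A'; 23 68 'A'; ...
          34 44 'B'; 30 94 'B'; ...
          304 414 'C'; 78 110 'C'; 34 120 'C'};
stringKeys = unique(datABC(:,3))';
memberFun  = @(x) ismember(datABC(:,3), x);

% For key x: gather the k matching rows of columns 1:2 into a k-by-2 matrix,
% then average down each column (dim 1)
colMeanFun = @(x) mean(reshape([datABC{memberFun(x),1:2}], [], 2), 1);
colMeans = cellfun(colMeanFun, stringKeys, 'UniformOutput', false);  % one 1-by-2 row per key
colMeans = vertcat(colMeans{:})
% Rows A, B, C: [28.25 65.75], [32 69], [138.6667 214.6667]
```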

Also have a look at strcmpi, which can be used in place of ismember and ignores case.
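A sketch of the case-insensitive variant. The mixed-case keys below are hypothetical data invented for this example (the question's keys are all upper case); `lower` folds the keys before `unique`, so that 'A' and 'a' end up as one group.

```matlab
% Hypothetical data with mixed-case keys (not from the question)
datABC = {45 67 'A'; 34 44 'a'; 11 84 'B'};

stringKeys  = unique(lower(datABC(:,3)))';    % case-folded keys: {'a','b'}
memberFunCI = @(x) strcmpi(datABC(:,3), x);   % case-insensitive replacement for ismember
keyOccurrences = cellfun(@(x) nnz(memberFunCI(x)), stringKeys)
% keyOccurrences is [2 1]
```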

Test data:

datABC = {[45]  [67]  'A'; [34]  [44]  'A'; [11]  [84]  'A'; ...
          [23]  [68]  'A'; [34]  [44]  'B'; [30]  [94]  'B'; ...
          [304] [414] 'C'; [78]  [110] 'C'; [34]  [120] 'C'}; % 9-by-3

score 2 · Answer 2 · answered Nov 14 '13 at 08:38


Looks like a job for accumarray:

[categories, ~, jj] = unique(datABC(:,3));                 % jj(k) is the group index of row k
num = histc(jj, 1:max(jj));                                % number of observations per group
mean1 = accumarray(jj, cell2mat(datABC(:,1)), [], @mean);  % per-group mean of column 1
mean2 = accumarray(jj, cell2mat(datABC(:,2)), [], @mean);  % per-group mean of column 2

Example:

>> datABC

datABC = 

    [1]    [10]    'A' 
    [2]    [-5]    'B' 
    [3]    [15]    'A' 
    [4]    [40]    'CC'

gives

>> categories

categories = 

    'A'
    'B'
    'CC'

>> num

num =

     2
     1
     1

>> mean1

mean1 =

     2
     2
     4

>> mean2

mean2 =

   12.5000
   -5.0000
   40.0000
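The same calls applied to the question's own data, as a sketch assuming it is stored in `datABC`. Here `accumarray(jj, 1)` is swapped in for `histc` to get the counts: it sums a 1 for every row of each group, which gives the same result. The values in the final comment were worked out by hand.

```matlab
datABC = {45 67 'A'; 34 44 'A'; 11 84 'A'; 23 68 'A'; ...
          34 44 'B'; 30 94 'B'; ...
          304 414 'C'; 78 110 'C'; 34 120 'C'};

[categories, ~, jj] = unique(datABC(:,3));                  % categories = {'A';'B';'C'}
num   = accumarray(jj, 1);                                  % observations per group
mean1 = accumarray(jj, cell2mat(datABC(:,1)), [], @mean);   % group means of column 1
mean2 = accumarray(jj, cell2mat(datABC(:,2)), [], @mean);   % group means of column 2
result = [num mean1 mean2]
% Rows A, B, C: [4 28.25 65.75], [2 32 69], [3 138.6667 214.6667]
```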

Luis Mendo

@chappjc I couldn't restrain myself like you did! – Luis Mendo Nov 14 '13 at 19:23
@chappjc Hey, in the end you didn't either! – Luis Mendo Nov 14 '13 at 19:25
I only partially restrained - I typed my answer, deleted it, and waited for others to answer. :D I figure the question is useful to others, so why not answer after a bit of a wait. – chappjc Nov 14 '13 at 19:25
Seriously, `unique` and `accumarray` work very well together. When you realize that the third output of `unique` essentially provides an integer key for each unique value, the use of `accumarray` follows naturally. +1 – chappjc Nov 14 '13 at 19:47
@chappjc `accumarray` addicts!! – Luis Mendo Nov 14 '13 at 20:15
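In MATLAB R2015b and later, `findgroups` packages the "third output of `unique` as an integer key" idea from the comment above, and `splitapply` plays the role of `accumarray`. This is a swapped-in alternative that does not appear in the thread; a sketch on the question's data, assuming it is stored in `datABC`:

```matlab
datABC = {45 67 'A'; 34 44 'A'; 11 84 'A'; 23 68 'A'; ...
          34 44 'B'; 30 94 'B'; ...
          304 414 'C'; 78 110 'C'; 34 120 'C'};

[G, keys] = findgroups(datABC(:,3));            % G: group number per row; keys: {'A';'B';'C'}
data  = cell2mat(datABC(:,1:2));                % 9-by-2 numeric matrix
num   = splitapply(@numel, data(:,1), G);       % observations per group
means = splitapply(@(v) mean(v, 1), data, G);   % each group's k-by-2 rows -> 1-by-2 column means
```

Passing `1` to `mean` matters here: without it, a group with a single row would be averaged across its two columns and return a scalar.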

score 1 · Answer 3 · answered Nov 14 '13 at 01:56


Get comfortable with logical indexing.

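for x = unique(datABC(:,3))'          % x is a 1-by-1 cell holding one key
    idx = strcmp(x, datABC(:, 3));    % logical index of the rows with that key
    disp([x{1} ': ' num2str(sum(idx)) ' observations'])
    disp(mean(cell2mat(datABC(idx, 1:2)), 1))  % dim 1: column means, even for a one-row group
end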
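A variant of this loop that collects the results into arrays instead of only printing them, as a sketch assuming the question's data in `datABC`. It loops over key indices rather than over a cell row, and passes `1` to `mean` so that a single-row group still yields two column means.

```matlab
datABC = {45 67 'A'; 34 44 'A'; 11 84 'A'; 23 68 'A'; ...
          34 44 'B'; 30 94 'B'; ...
          304 414 'C'; 78 110 'C'; 34 120 'C'};

keys   = unique(datABC(:,3));           % column cell of group labels
counts = zeros(numel(keys), 1);         % observations per group
means  = zeros(numel(keys), 2);         % column means per group
for k = 1:numel(keys)
    idx = strcmp(keys{k}, datABC(:,3));                 % logical mask for group k
    counts(k)   = nnz(idx);
    means(k, :) = mean(cell2mat(datABC(idx, 1:2)), 1);  % dim 1: per-column mean
end
for k = 1:numel(keys)
    fprintf('%s: %d observations, means %.4f and %.4f\n', ...
            keys{k}, counts(k), means(k,1), means(k,2));
end
```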

Prashant Kumar
