1

I have a dataset as below:

Group   Class
  A       1
  A       2
  A       1
  A       1
  B       2
  B       2
  B       2
  B       1
  B       3
  B       1
  C       1
  C       1
  C       1
  C       2
  C       3

I want to aggregate the table by the ‘Group’ column and the value on the ‘Class’ column would be the Class with maximum count. For instance, for Group A, 1 appears three times, so the value for Class is 1. Similarly, for Group 2, 2 appears three times, so the value for Class is 2. The result table should be the following:

Group   Class
  A       1
  B       2
  C       1

I am new to R programming and would appreciate your help in solving this problem. Thanks!

user2797174
  • 167
  • 2
  • 11
  • 6
    `aggregate(Class~Group, df1, function(x) which.max(table(x)))` – Ronak Shah Mar 29 '17 at 12:17
  • Gah, I got identical answer prepared. Beat me by a minute. :) – Roman Luštrik Mar 29 '17 at 12:20
  • More generally, `aggregate(Class~Group, df, function(x) {temp <- table(x); names(temp)[which.max(temp)]})` to extract more complicated values like class names or in the instance where one class is not observed in a group. – lmo Mar 29 '17 at 12:21
  • 1
    `count(df, Group, Class) %>% slice(which.max(n))` straightforward in dplyr, too – talat Mar 29 '17 at 12:26
  • Wow, working like a charm! thanks so much everyone for your quick reply and special thanks @Ronak Shah – user2797174 Mar 29 '17 at 12:33

1 Answers1

1

You could also do this without using aggregate, so using table and max.col instead:

tb <- table(df$Group, df$Class)
data.frame("Group"=rownames(tb), "CLass"=max.col(tb))

#  Group CLass
#1     A     1
#2     B     2
#3     C     1

Which seems to be faster:

library(microbenchmark)

# by Ronak Shah in comments
f1 <- function(df) aggregate(Class~Group, df, function(x) which.max(table(x))) 
# this answer
f2 <- function(df) {tb <- table(df$Group, df$Class);
 data.frame("Group"=rownames(tb), "CLass"=max.col(tb));}

all(f1(df)==f2(df))
# [1] TRUE

microbenchmark(f1(df), f2(df))

# Unit: microseconds
   # expr     min       lq     mean   median      uq      max neval
 # f1(df) 800.153 838.9130 923.6484 870.0115 918.988 1981.901   100
 # f2(df) 298.367 319.0995 353.4915 338.6305 380.246  599.439   100

df <- structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    Class = c(1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 1L, 
    1L, 2L, 3L)), .Names = c("Group", "Class"), class = "data.frame", row.names = c(NA, 
-15L))
989
  • 12,579
  • 5
  • 31
  • 53