How to aggregate data in R with mode (most common) value for each row?

Question

I have a data set, for example:

Data <- data.frame(
  groupname = as.factor(sample(c("a", "b", "c"), 10, replace = TRUE)),
  someuser = sample(c("x", "y", "z"), 10, replace = TRUE))


   groupname someuser
1          a        x
2          b        y
3          a        x
4          a        y
5          c        z
6          b        x
7          b        x
8          c        x
9          c        y
10         c        x

How do I aggregate the data so that I get:

groupname someuser
a         x
b         x
c         x

that is, the most common value of someuser for each groupname.

PS: Given my setup, I am limited to using only two packages: plyr and lubridate.

No, I am automating it through Excel and can't program Excel to install new packages via R. (dsauce, Sep 20 '15 at 23:36)

score 7 · Accepted Answer · edited May 23 '17 at 12:08


You can combine this function for finding the mode with aggregate.

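Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # count each distinct value, return the most frequent
}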

aggregate(someuser ~ groupname, Data, Mode)

  groupname someuser
1         a        x
2         b        x
3         c        x
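To see how Mode() arrives at its answer, here is a sketch that runs its steps one at a time on a small made-up vector (the vector is illustrative only, not from the question's data):

```r
# Step-by-step view of what Mode() does, on a small illustrative vector
x      <- c("x", "y", "x", "z", "x", "y")
ux     <- unique(x)          # distinct values in order of first appearance: "x" "y" "z"
idx    <- match(x, ux)       # position of each element of x within ux: 1 2 1 3 1 2
counts <- tabulate(idx)      # how often each position occurs: 3 2 1
ux[which.max(counts)]        # value with the highest count: "x"
```

match() turns the values into positive integer codes, which is exactly what tabulate() needs to count them.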

Note that in the event of a tie, it returns only the tied value that appears first in the data.
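If you need to see every tied value instead, one possible approach is a variant that keeps all values sharing the maximum count. This is a sketch, not part of the answer above; AllModes and Data2 are hypothetical names, and joining ties with a comma is just one choice of output format:

```r
# Hypothetical variant: return every tied mode, joined into one string
AllModes <- function(x) {
  ux     <- unique(x)
  counts <- tabulate(match(x, ux))                  # count of each distinct value
  paste(ux[counts == max(counts)], collapse = ",")  # keep all values at the max count
}

# Small illustrative data: group "a" has a tie, group "b" does not
Data2 <- data.frame(
  groupname = c("a", "a", "b", "b", "b"),
  someuser  = c("x", "y", "x", "z", "z"),
  stringsAsFactors = FALSE
)
res <- aggregate(someuser ~ groupname, Data2, AllModes)
res
```

Group "a" comes back as "x,y" and group "b" as "z".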


answered Sep 20 '15 at 22:37

Ritchie Sacramento


Answer 2 · answered Sep 20 '15 at 22:44 by Whitebeard

This might work for you, using base R:

set.seed(1)
Data <- data.frame(
  groupname = as.factor(sample(c("a", "b", "c"), 10, replace = TRUE)),
  someuser = sample(c("x", "y", "z"), 10, replace = TRUE))
Data
   groupname someuser
1          a        x
2          b        x
3          b        z
4          c        y
5          a        z
6          c        y
7          c        z
8          b        z
9          b        y
10         a        z

res <- lapply(split(Data, Data$groupname), function(x) 
  data.frame(groupname=x$groupname[1], someuser=names(sort(table(x$someuser),
             decreasing=TRUE))[1]))

do.call(rbind, res)
  groupname someuser
a         a        z
b         b        z
c         c        y
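The core of this answer is names(sort(table(...), decreasing=TRUE))[1]. A sketch of its intermediate steps, using the someuser values of group "b" from the printed data above:

```r
# Group "b" values from the printed data (rows 2, 3, 8, 9)
vals   <- c("x", "z", "z", "y")
tab    <- table(vals)                     # counts per value, names sorted: x=1 y=1 z=2
sorted <- sort(tab, decreasing = TRUE)    # most frequent value moves to the front
names(sorted)[1]                          # "z"
```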

And using ddply from plyr:

library(plyr)
sort_fn2 <- function(x) {names(sort(table(x$someuser), decreasing=TRUE))[1]}
ddply(Data, .(groupname), .fun=sort_fn2)
  groupname V1
1         a  z
2         b  z
3         c  y
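Since the question is limited to plyr, it may help to see the Mode() function from the accepted answer combined with ddply and summarise, which also lets you name the output column someuser instead of V1. This is a sketch assuming plyr is installed; the data frame is a small illustrative example, not the question's data:

```r
library(plyr)

# Mode() as defined in the accepted answer
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Illustrative data
Data <- data.frame(
  groupname = c("a", "a", "b", "b", "b", "c"),
  someuser  = c("x", "x", "y", "z", "z", "y"),
  stringsAsFactors = FALSE
)

# summarise() names the result column explicitly, so it is someuser rather than V1
res <- ddply(Data, .(groupname), summarise, someuser = Mode(someuser))
res
```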

score 1 · Answer 3 · answered Sep 20 '15 at 22:37

There are many options. Here is one that uses table to compute the frequencies and which.max to pick the most frequent value, within the data.table framework:

library(data.table)
setDT(Data)[,list(someuser={
  tt <- table(someuser)
  names(tt)[which.max(tt)]
}),groupname]

Using plyr (nearly the same):

library(plyr)
ddply(Data,.(groupname),summarize,someuser={
  tt <- table(someuser)
  names(tt)[which.max(tt)]
})
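One detail worth knowing about the table plus which.max approach: table() orders its names by sorting the values, and which.max() returns the first maximum, so ties go to the alphabetically first value. Mode() from the accepted answer instead breaks ties by first appearance. A sketch on a small made-up tie:

```r
# "y" and "x" both occur twice; "y" appears first in the data
vals <- c("y", "x", "y", "x")
tt <- table(vals)            # names are sorted: x y
names(tt)[which.max(tt)]     # "x": first maximum in table (sorted) order

# Mode() from the accepted answer breaks the same tie by first appearance
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Mode(vals)                   # "y"
```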
