Most frequent value (mode) by group

Question

I am trying to find the most frequent value by group. In the following example dataframe:

df<-data.frame(a=c(1,1,1,1,2,2,2,3,3),b=c(2,2,1,2,3,3,1,1,2))  
> df  
  a b  
1 1 2  
2 1 2  
3 1 1  
4 1 2  
5 2 3  
6 2 3  
7 2 1  
8 3 1  
9 3 2

I would like to add a column 'c' which has the most occurring value in 'b' when its values are grouped by 'a'. I would like the following output:

I tried using table and tapply but didn't get it right. Is there a fast way to do that?
Thanks!

This is very related to [this](http://stackoverflow.com/questions/2547402/standard-library-function-in-r-for-finding-the-mode) — David Arenburg, Mar 25 '15 at 12:25

score 16 · Accepted Answer · answered Mar 25 '15 at 12:46

Building on Davids comments your solution is the following:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

library(dplyr)
df %>% group_by(a) %>% mutate(c=Mode(b))

Notice though that for the tie when df$a is 3 then the mode for b is 1.

score 8 · Answer 2 · answered Mar 25 '15 at 12:21

8

We could get the 'Mode' of 'b' grouped by 'a' using ave

 Mode <- function(x) {
 ux <- unique(x)
 ux[which.max(tabulate(match(x, ux)))]
}

df$c <-  with(df, ave(b, a, FUN=Mode))
df$c
#[1] 2 2 2 2 3 3 3 1 1

Or using data.table

library(data.table)
setDT(df)[, c:= Mode(b), by=a][]

answered Mar 25 '15 at 12:21

akrun

874,273
37
540
662

1

Thank you @akrun! To exclude NAs from the mode function, change the second line to 'ux <- unique(x[!is.na(x)])'. – WalterB May 05 '20 at 16:57

score 1 · Answer 3 · answered Nov 25 '16 at 18:39

Here is a base R method that uses table to calculate a cross tab, max.col to find the mode per group, and rep together with rle to fill in the mode across groups.

# calculate a cross tab, frequencies by group
myTab <- table(df$a, df$b)
# repeat the mode for each group, as calculated by colnames(myTab)[max.col(myTab)] 
# repeating by the number of times the group ID is observed
df$c <- rep(colnames(myTab)[max.col(myTab)], rle(df$a)$length)

df
  a b c
1 1 2 2
2 1 2 2
3 1 1 2
4 1 2 2
5 2 3 3
6 2 3 3
7 2 1 3
8 3 1 2
9 3 2 2

Note that this assumes the data has been sorted by group. Also, the default of max.col is to break ties (mulitple modes) at random. If you want the first or last value to be the mode, you can set this using the ties.method argument.

Most frequent value (mode) by group

3 Answers3

Linked

Related