Most frequent factor in a group by operation, in R

Question

I have a dataframe like this:

ID_CLI	CHURN
12	0
12	0
25	1
25	1
25	1
27	0

I want to group by ID_CLI and keep the most frequent CHURN value for each ID, so that the output looks like this:

ID_CLI	CHURN
12	0
25	1
27	0

Rui Barradas · Answer 1 · 2021-07-06T09:10:22.687

Here is a dplyr way.

library(dplyr)

df1 %>%
  count(ID_CLI, CHURN) %>%
  group_by(ID_CLI) %>%
  slice_max(order_by = n, n = 1) %>%
  select(-n)
## A tibble: 3 x 2
## Groups:   ID_CLI [3]
#  ID_CLI CHURN
#   <int> <int>
#1     12     0
#2     25     1
#3     27     0
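
A side note on ties, as a sketch rather than part of the original answer: slice_max() keeps all tied rows by default (with_ties = TRUE), so an ID_CLI whose CHURN values are equally frequent would appear more than once. The data frame df_ties below is made up purely for illustration; setting with_ties = FALSE keeps a single row per group.

```r
library(dplyr)

# Hypothetical data: ID 12 has one 0 and one 1 (a tie)
df_ties <- data.frame(ID_CLI = c(12, 12, 25, 25, 25),
                      CHURN  = c(0, 1, 1, 1, 0))

counts <- df_ties %>%
  count(ID_CLI, CHURN) %>%
  group_by(ID_CLI)

# Default behaviour: tied rows are all kept, so ID 12 appears twice
with_ties <- counts %>%
  slice_max(order_by = n, n = 1)

# with_ties = FALSE: exactly one row per ID_CLI
no_ties <- counts %>%
  slice_max(order_by = n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n)

nrow(with_ties)  # 3
nrow(no_ties)    # 2
```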

And a base R way.

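df2 <- local({
  # rows of the contingency table are the ID_CLI values in sorted order
  tbl <- table(df1)
  data.frame(
    # sort() keeps the IDs aligned with the table rows even if df1 is unsorted
    ID_CLI = sort(unique(df1$ID_CLI)),
    # column label with the highest count in each row (a character vector)
    CHURN = colnames(tbl)[apply(tbl, 1, which.max)]
  )
})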
df2
#  ID_CLI CHURN
#1     12     0
#2     25     1
#3     27     0
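
For comparison, a different base R route (not the one used in this answer) is aggregate() with a small anonymous function. This is only a sketch: it assumes CHURN holds integer codes, so the winning label can be converted back with as.integer().

```r
# Same sample data as in the question
df1 <- data.frame(ID_CLI = c(12L, 12L, 25L, 25L, 25L, 27L),
                  CHURN  = c(0L, 0L, 1L, 1L, 1L, 0L))

# For each ID_CLI, tabulate CHURN and keep the label with the highest count
res <- aggregate(
  CHURN ~ ID_CLI,
  data = df1,
  FUN = function(x) as.integer(names(which.max(table(x))))
)
res
```

aggregate() returns the groups sorted by ID_CLI, so no manual alignment between IDs and counts is needed.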

Data

df1 <- read.table(text = "
ID_CLI  CHURN
12  0
12  0
25  1
25  1
25  1
27  0
", header = TRUE)

score 1 · Answer 2 · answered Jul 06 '21 at 09:02

I think I've found an answer:

library(dplyr)

df <- df %>%
  group_by(ID_CLI) %>%
  # names() returns the most frequent value as a character string
  summarize(CHURN = names(which.max(table(CHURN))))
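
One thing to watch with this approach: names() always returns character, so CHURN comes back as "0"/"1" rather than numbers. A minimal sketch of converting it back, assuming CHURN only contains integer codes (the sample data frame is rebuilt here so the snippet runs on its own):

```r
library(dplyr)

df <- data.frame(ID_CLI = c(12, 12, 25, 25, 25, 27),
                 CHURN  = c(0, 0, 1, 1, 1, 0))

res <- df %>%
  group_by(ID_CLI) %>%
  # as.integer() turns the table label back into a number
  summarize(CHURN = as.integer(names(which.max(table(CHURN)))))

is.integer(res$CHURN)  # TRUE
```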

lorenzlorg

score 1 · Answer 3 · answered Jul 06 '21 at 09:12

For the sample data, where each ID_CLI has only one CHURN value, removing duplicate rows with dplyr is enough. Note that distinct() does not select the most frequent value: an ID_CLI with mixed CHURN values would keep one row per distinct combination.

library(dplyr)

df %>% 
  distinct()

where df is given by:

df <- structure(list(ID_CLI = c(12, 12, 25, 25, 25, 27), 
                     CHURN = c(0, 0, 1, 1, 1, 0)),
                class = "data.frame", row.names = c(NA, -6L))

See ?distinct for details on how it works. A quick cheat sheet for getting started with dplyr:

https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
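
To see where distinct() stops matching the requested output, here is a small sketch with made-up data (the data frame mixed is hypothetical) in which one ID_CLI has two different CHURN values:

```r
library(dplyr)

# Hypothetical data: ID 12 has CHURN 0 twice and CHURN 1 once
mixed <- data.frame(ID_CLI = c(12, 12, 12, 25),
                    CHURN  = c(0, 0, 1, 1))

# distinct() only drops exact duplicate rows, so ID 12 keeps two rows
out <- mixed %>%
  distinct()

nrow(out)  # 3, not one row per ID_CLI
```

In that situation a per-group count or mode is needed instead.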

score 1 · Answer 4 · answered Jul 06 '21 at 12:05

You can define a Mode function like the one below and apply it to every group.

library(dplyr)

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df %>%
  group_by(ID_CLI) %>%
  summarize(CHURN = Mode(CHURN))

#  ID_CLI CHURN
#   <int> <int>
#1     12     0
#2     25     1
#3     27     0
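
Two properties of this Mode function are worth knowing. The sketch below uses made-up vectors to illustrate them: the result keeps the type of the input, and on a tie which.max() picks the first maximum, so the value that appears first in x wins.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Type is preserved: numeric in, numeric out
m_num <- Mode(c(0, 0, 1))

# Tie between 1 and 0: the value seen first (1) is returned
m_tie <- Mode(c(1, 0, 0, 1))

# Also works on character vectors
m_chr <- Mode(c("b", "a", "a"))

m_num  # 0
m_tie  # 1
m_chr  # "a"
```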
