I have a dataframe like this:

ID_CLI CHURN
12 0
12 0
25 1
25 1
25 1
27 0

I want to group by ID_CLI and get one row per ID, like this:

ID_CLI CHURN
12 0
25 1
27 0
lorenzlorg

4 Answers

Here is a dplyr way.

library(dplyr)

df1 %>%
  count(ID_CLI, CHURN) %>%
  group_by(ID_CLI) %>%
  slice_max(order_by = n, n = 1) %>%
  select(-n)
## A tibble: 3 x 2
## Groups:   ID_CLI [3]
#  ID_CLI CHURN
#   <int> <int>
#1     12     0
#2     25     1
#3     27     0
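
One thing to watch with this approach: slice_max() keeps ties by default (with_ties = TRUE), so an ID whose CHURN values are equally frequent would come back with more than one row. A minimal sketch, assuming a hypothetical data frame df_tie (not part of the question) in which ID 30 has one 0 and one 1; passing with_ties = FALSE keeps a single row per ID:

```r
library(dplyr)

# Hypothetical data: ID 30 has one 0 and one 1, so its counts tie
df_tie <- data.frame(ID_CLI = c(12, 12, 30, 30),
                     CHURN  = c(0, 0, 0, 1))

res_tie <- df_tie %>%
  count(ID_CLI, CHURN) %>%
  group_by(ID_CLI) %>%
  # with_ties = FALSE returns exactly one row per group, even on a tie
  slice_max(order_by = n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-n)

nrow(res_tie)  # one row per ID
```

Which of the tied values survives is an implementation detail, so if that matters, an explicit tie-breaking rule is safer.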

And a base R way.

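df2 <- local({
  # rows of tbl are the sorted ID_CLI values, columns are the CHURN values
  tbl <- table(df1)
  data.frame(
    # take the IDs from the table so they stay aligned with its rows
    ID_CLI = as.integer(rownames(tbl)),
    # most frequent CHURN per row; colnames() is character, so convert back
    CHURN = as.integer(colnames(tbl)[apply(tbl, 1, which.max)])
  )
})
df2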
#  ID_CLI CHURN
#1     12     0
#2     25     1
#3     27     0

Data

df1 <- read.table(text = "
ID_CLI  CHURN
12  0
12  0
25  1
25  1
25  1
27  0
", header = TRUE)
Rui Barradas

I think I've found an answer:

    df <- df %>%
      group_by(ID_CLI) %>%
      summarize(CHURN = names(which.max(table(CHURN))))
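
Because names() returns character strings, CHURN comes out as a character column with this approach. If a numeric column is needed downstream, the result can be converted back. A small sketch using the sample data from the question (the name res is just for illustration):

```r
library(dplyr)

df <- data.frame(ID_CLI = c(12, 12, 25, 25, 25, 27),
                 CHURN  = c(0, 0, 1, 1, 1, 0))

res <- df %>%
  group_by(ID_CLI) %>%
  # table() labels are character, so convert the winning label back to numeric
  summarize(CHURN = as.numeric(names(which.max(table(CHURN)))))

str(res$CHURN)
```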

lorenzlorg

This is an extremely easy operation. I'd suggest you check out a few beginner packages, starting with dplyr. Nevertheless, here's a ready-to-use answer.

library(dplyr)

df %>% 
  distinct()

where df is given by:

df <- structure(list(ID_CLI = c(12, 12, 25, 25, 25, 27), 
                     CHURN = c(0, 0, 1, 1, 1, 0)),
                class = "data.frame", row.names = c(NA, -6L))

Run ?distinct to see how it works. Here is a quick cheat sheet to get started:

https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
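
Note that distinct() only drops fully duplicated rows, so it reproduces the requested output only when every ID_CLI carries a single CHURN value, as in the sample data. A minimal sketch with hypothetical data (df_mixed is made up for illustration) shows what happens otherwise:

```r
library(dplyr)

# Hypothetical data: ID 25 now has two 1s and one 0
df_mixed <- data.frame(ID_CLI = c(12, 12, 25, 25, 25, 27),
                       CHURN  = c(0, 0, 1, 1, 0, 0))

res_distinct <- df_mixed %>%
  distinct()

# ID 25 appears twice, once per distinct CHURN value
nrow(res_distinct)  # 4
```

If IDs can have mixed values, a per-group summary such as the most frequent value is needed instead.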

Emir Dakin

You can use the Mode function below and apply it to every group.

library(dplyr)

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df %>%
  group_by(ID_CLI) %>%
  summarize(CHURN = Mode(CHURN))

#  ID_CLI CHURN
#   <int> <int>
#1     12     0
#2     25     1
#3     27     0
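
A note on ties: which.max() returns the first maximum, so when two values are equally frequent this Mode returns whichever of them appears first in the input. A quick base R check (the input vectors are made up for illustration):

```r
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]  # first value with the highest count
}

Mode(c(0, 1, 1, 1))  # clear winner: 1
Mode(c(1, 0, 0, 1))  # tie between 1 and 0: 1 appears first, so 1 is returned
Mode(c(0, 1, 1, 0))  # same tie, but 0 appears first, so 0 is returned
```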
Ronak Shah