I have a dataframe like this:
ID_CLI | CHURN |
---|---|
12 | 0 |
12 | 0 |
25 | 1 |
25 | 1 |
25 | 1 |
27 | 0 |
I want to group by ID_CLI and get an output like this:
ID_CLI | CHURN |
---|---|
12 | 0 |
25 | 1 |
27 | 0 |
Here is a dplyr way.
library(dplyr)
df1 %>%
  count(ID_CLI, CHURN) %>%
  group_by(ID_CLI) %>%
  slice_max(order_by = n, n = 1) %>%
  select(-n)
# A tibble: 3 x 2
# Groups: ID_CLI [3]
# ID_CLI CHURN
# <int> <int>
#1 12 0
#2 25 1
#3 27 0
And a base R way.
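df2 <- local({
  tbl <- table(df1)
  data.frame(
    # take the IDs from the table's row names so they line up with its
    # (sorted) rows even when df1 is not ordered by ID_CLI
    ID_CLI = as.integer(rownames(tbl)),
    CHURN = as.integer(colnames(tbl)[apply(tbl, 1, which.max)])
  )
})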
df2
# ID_CLI CHURN
#1 12 0
#2 25 1
#3 27 0
The data used in both examples:
df1 <- read.table(text = "
ID_CLI CHURN
12 0
12 0
25 1
25 1
25 1
27 0
", header = TRUE)
I think I've found an answer:
df <- df %>%
  group_by(ID_CLI) %>%
  summarize(CHURN = names(which.max(table(CHURN))))
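One thing to watch with this approach: names() returns a character vector, so CHURN comes back as text rather than a number. A minimal base R sketch of the conversion, using a made-up vector `churn` for a single customer (not taken from the question's data):

```r
# Hypothetical CHURN values for one customer (illustration only)
churn <- c(1, 1, 0)

# table() counts each value; which.max() picks the most frequent one;
# names() returns its label, which is always a character string
top <- names(which.max(table(churn)))

# Convert back to an integer if a numeric column is needed
top_int <- as.integer(top)
```

Wrapping the expression inside summarize() with as.integer() should have the same effect there.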
This is a simple operation. I'd suggest checking out a few beginner-friendly packages, starting with dplyr. In the meantime, here's a readily usable answer. It works because every ID_CLI in your data has a single CHURN value, so dropping duplicate rows is enough.
library(dplyr)
df %>%
  distinct()
where df is given by:
df <- structure(list(ID_CLI = c(12, 12, 25, 25, 25, 27),
CHURN = c(0, 0, 1, 1, 1, 0)),
class = "data.frame", row.names = c(NA, -6L))
You can run ?distinct to see how it works for future use. A quick cheat sheet to get started:
https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
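Dropping duplicate rows only gives one row per customer when CHURN never varies within an ID_CLI. A rough base R sketch of how you might check that first (the names `n_values`, `all_constant` and `res` are just illustrative, and unique() is used here as the base R counterpart for whole-row deduplication):

```r
df <- data.frame(ID_CLI = c(12, 12, 25, 25, 25, 27),
                 CHURN  = c(0, 0, 1, 1, 1, 0))

# Count how many different CHURN values each customer has
n_values <- tapply(df$CHURN, df$ID_CLI, function(x) length(unique(x)))

# TRUE only if every customer has exactly one CHURN value
all_constant <- all(n_values == 1)

# Remove duplicated rows; with constant CHURN this leaves one row per ID
res <- unique(df)
```

If `all_constant` were FALSE, some IDs would keep more than one row, and a per-group "most frequent value" approach would be the safer choice.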
You can use the Mode function from here and apply it to every group.
library(dplyr)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
df %>%
  group_by(ID_CLI) %>%
  summarize(CHURN = Mode(CHURN))
# ID_CLI CHURN
# <int> <int>
#1 12 0
#2 25 1
#3 27 0
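If a customer can have a tie between 0 and 1, it is worth knowing how the tie is broken. As far as I can tell, Mode() returns whichever tied value appears first in the data, while the names(which.max(table(...))) approach returns the smallest one, because table() sorts its labels. A small sketch with a made-up tied vector (`tied` is not from the question):

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical customer with two 1s and two 0s, 1 appearing first
tied <- c(1, 0, 0, 1)

# unique() keeps order of appearance, so the first value seen wins the tie
mode_first <- Mode(tied)

# table() orders its labels, so the smallest value wins the tie
mode_sorted <- as.numeric(names(which.max(table(tied))))
```

Neither rule is more correct than the other; pick whichever matches what a tie should mean for your data.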