I'm trying to count the number of distinct users in different cohorts. I found a way to do it using dplyr, but I'd like to implement a solution using data.table, both to improve efficiency and as an exercise.
Libraries that I'm using for this example:
library(dplyr)
library(magrittr)
library(data.table)
Let's say that I have this df:
df <- data.frame(V1 = sample(c("a", "b", "c"), 11, TRUE),
                 V2 = sample(c("2016", "2017", "2018"), 11, TRUE),
                 V3 = sample(seq(1:3), 11, TRUE),
                 V4 = sample(seq(1:3), 11, TRUE),
                 Id = sample(seq(1:5), 11, TRUE))
The solution using dplyr would be:
for (grp in c("V1", "V2", "V3", "V4")) {
  col <- paste0(grp, "_user_cnt")
  df %<>%
    group_by_(grp) %>%
    mutate(!!col := n_distinct(Id)) %>%
    ungroup()
}
And my approach with data.table would be something like this:
DT <- data.table(df)
for (grp in c("V1", "V2", "V3", "V4")) {
  col <- paste0(grp, "_user_cnt")
  DT[, (deparse(col)) := n_distinct(Id), by = get(grp)]
}
The problem is that I can't find a way to pass col and grp properly. This version computes everything correctly, but the new column names contain literal quotes, which is ugly and leads to errors. I've tried the techniques suggested here, as well as the answer and comments on this SO question, but none of them seem to work either. What am I doing wrong?
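To make the symptom concrete, here is a minimal sketch that reproduces the quoted column name on a tiny table. The three-row DT below is made up purely for illustration, and I've swapped in data.table's uniqueN() for n_distinct() so the snippet doesn't depend on dplyr; otherwise it follows the same pattern as my loop.

```r
library(data.table)

# Same name construction as in the loop, for a single group column.
col <- paste0("V1", "_user_cnt")
quoted <- deparse(col)  # deparse() wraps the string in literal double quotes

# Tiny illustrative table (not my real data).
DT <- data.table(V1 = c("a", "a", "b"), Id = c(1L, 2L, 2L))

# Assign by reference using the deparsed name, as in my attempt.
DT[, (quoted) := uniqueN(Id), by = V1]

# The new column's name contains the quote characters themselves.
print(names(DT))
print(nchar(col))     # length of the intended name
print(nchar(quoted))  # two characters longer because of the added quotes
```

So the counts themselves look fine, but the name stored in the table is not the plain string I built with paste0().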