Frequency count of a column based on two other columns with data.table

Question

Is there an idiomatic way to do this with the data.table package instead of the following dplyr code, which counts the non-missing entries of Value for each combination of Ticker and Year?

# install.packages("dplyr")  # only needed once
library(dplyr)
# count the non-missing Value entries for each (Ticker, Year) combination
data %>% group_by(Ticker, Year) %>% summarise(count = length(Value[!is.na(Value)]))

Maurits Evers · Accepted Answer · 2018-05-11T10:51:54.323

Do you mean this?

(Note: the sample data below is based on the data you provided in your previous post, so the grouping column is called RANDOM rather than Ticker.)

library(data.table)
# convert df to a data.table in place, then count non-NA Value entries per (RANDOM, Year)
setDT(df)[, .(count = sum(!is.na(Value))), by = list(RANDOM, Year)]
#    RANDOM Year count
# 1:      D 2010     2
# 2:      C 2010     2
# 3:      B 2008     5
# 4:      D 2009     4
# 5:      D 2008     4
# 6:      A 2009     3
# 7:      B 2009     5
# 8:      C 2008     4
# 9:      A 2008     8
#10:      A 2010     2
#11:      B 2010     1
#12:      C 2009     8
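Two closely related data.table idioms are sketched below on a small toy table whose rows are made up for illustration (they are not the asker's data). Filtering in `i` and counting rows with `.N` is a common alternative, but unlike `sum(!is.na(Value))` it drops any group in which every Value is missing. Using `keyby` instead of `by` additionally sorts the result by the grouping columns.

```r
library(data.table)

# Hypothetical toy data, made up for illustration only
dt <- data.table(
  RANDOM = c("A", "A", "A", "B", "B", "C"),
  Year   = c(2008, 2008, 2009, 2008, 2008, 2009),
  Value  = c(0.22, NA, 0.22, NA, NA, 0.22)
)

# Variant 1: keep only non-NA rows in i, then count rows per group with .N.
# The all-NA group B/2008 disappears from this result.
filtered <- dt[!is.na(Value), .N, keyby = .(RANDOM, Year)]

# Variant 2: the answer's approach; every group is kept and
# B/2008 is reported with a count of 0. keyby sorts by RANDOM, Year.
all_groups <- dt[, .(count = sum(!is.na(Value))), keyby = .(RANDOM, Year)]

print(filtered)
print(all_groups)
```

Which variant fits depends on whether groups with only missing values should appear in the result with a count of 0.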

Sample data

set.seed(2017)
RANDOM <- sample(c("A","B","C","D"), size = 100, replace = TRUE)
Year <- sample(c(2008,2009,2010), 100, TRUE)
Value <- sample(c(0.22, NA), 100, TRUE)
df <- data.frame(RANDOM, Year, Value)
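A possible sanity check, assuming only base R and data.table are installed: regenerate the sample data, run the data.table aggregation, and compare it with base R's `aggregate()`, called with `na.action = na.pass` so the NA rows still reach the counting function. Note that `sample()` changed its default algorithm in R 3.6.0, so the exact counts may differ from the printed output depending on the R version. The checks below rely only on properties that hold for any draw.

```r
library(data.table)

# Regenerate the sample data (exact values depend on the R version's RNG)
set.seed(2017)
RANDOM <- sample(c("A","B","C","D"), size = 100, replace = TRUE)
Year <- sample(c(2008,2009,2010), 100, TRUE)
Value <- sample(c(0.22, NA), 100, TRUE)
df <- data.frame(RANDOM, Year, Value)

# data.table count; as.data.table() copies, so df itself is left unchanged.
# keyby sorts the groups by RANDOM, then Year.
dt_counts <- as.data.table(df)[, .(count = sum(!is.na(Value))), keyby = .(RANDOM, Year)]

# Base-R cross-check; na.pass keeps NA rows so FUN can count the non-missing ones
base_counts <- aggregate(Value ~ RANDOM + Year, data = df,
                         FUN = function(v) sum(!is.na(v)), na.action = na.pass)
# aggregate() returns groups in a different order, so sort to match keyby
base_counts <- base_counts[order(base_counts$RANDOM, base_counts$Year), ]

# Per-group counts must add up to the total number of non-missing values
stopifnot(sum(dt_counts$count) == sum(!is.na(df$Value)))
# Both approaches should agree group by group
stopifnot(identical(dt_counts$count, as.integer(base_counts$Value)))

print(dt_counts)
```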
