I have a data.table like:
library(data.table)
widgets <- data.table(serial_no = 1:100,
                      color = rep_len(c("red", "green", "blue", "black"), length.out = 100),
                      style = rep_len(c("round", "pointy", "flat"), length.out = 100),
                      weight = rep_len(1:5, length.out = 100))
Although I am not sure this is the most idiomatic data.table way, I can calculate subgroup frequency by group using table and length in a single step. For example, to answer the question "What percent of red widgets are round?":
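Edit: this code does not give the right answer. table(style) returns its counts in sorted order of style (flat, pointy, round), while unique(style) returns the styles in order of first appearance (round, pointy, flat), so the labels and the percentages end up misaligned.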
# example A
widgets[, list(style = unique(style),
style_pct_of_color_by_count =
as.numeric(table(style)/length(style)) ), by=color]
#    color  style style_pct_of_color_by_count
# 1:   red  round                        0.32
# 2:   red pointy                        0.32
# 3:   red   flat                        0.36
# 4: green pointy                        0.32
# ...
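One way to keep each style label paired with its own count in a single step is to take both the labels and the counts from the same table() result. This is only a minimal sketch, assuming the widgets table defined above; the names res_a, tab and pct are illustrative and not part of the original code:

```r
library(data.table)

# Same sample data as in the question
widgets <- data.table(serial_no = 1:100,
                      color = rep_len(c("red", "green", "blue", "black"), length.out = 100),
                      style = rep_len(c("round", "pointy", "flat"), length.out = 100),
                      weight = rep_len(1:5, length.out = 100))

# For each color: tabulate style once, then read the labels and the counts
# from that one object so they cannot get out of step.
# .N is the number of rows in the current color group.
res_a <- widgets[, {
  tab <- table(style)
  list(style = names(tab), pct = as.numeric(tab) / .N)
}, by = color]

res_a[color == "red"]
# red widgets: flat 0.32, pointy 0.32, round 0.36
```

Within each color the styles come back in the sorted order that table() uses, not in order of first appearance.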
But I can't use that approach to answer questions like "By weight, what percent of red widgets are round?" I can only come up with a two-step approach:
# example B
widgets[, list(cs_weight = sum(weight)), by = list(color, style)
        ][, list(style, style_pct_of_color_by_weight = cs_weight / sum(cs_weight)), by = color]
#    color  style style_pct_of_color_by_weight
# 1:   red  round                    0.3466667
# 2:   red pointy                    0.3466667
# 3:   red   flat                    0.3066667
# 4: green pointy                    0.3333333
# ...
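For comparison, the weighted calculation can be written as a single [ call by grouping on color and then grouping .SD by style inside j. This is a sketch of one possible approach rather than a definitive answer; the names res_b, total and pct are illustrative:

```r
library(data.table)

# Same sample data as in the question
widgets <- data.table(serial_no = 1:100,
                      color = rep_len(c("red", "green", "blue", "black"), length.out = 100),
                      style = rep_len(c("round", "pointy", "flat"), length.out = 100),
                      weight = rep_len(1:5, length.out = 100))

# Outer by = color: j runs once per color, and .SD holds that color's rows.
# Inner by = style: each style's weight is divided by the color total.
res_b <- widgets[, {
  total <- sum(weight)                                # total weight of this color
  .SD[, .(pct = sum(weight) / total), by = style]     # share of that total per style
}, by = color]

res_b[color == "red"]
# red widgets: round 26/75, pointy 26/75, flat 23/75
```

The inner grouping still runs as a second grouped operation per color, so this is a single expression rather than a single pass over the data. For red widgets it gives the same shares as example B.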
I'm looking for a single-step approach to B, and to A if it can be improved, with an explanation that deepens my understanding of data.table syntax for by-group operations. Please note that this question is different from "Weighted sum of variables by groups with data.table" because mine involves subgroups and avoiding multiple steps. Thank you very much.