0

How can I sum multiple columns by group please?

For the following data I can get the expected result using base r split-apply functions:

library(data.table)
set.seed(1234)
df1 = data.table(grp = sample(0:2, 10, TRUE), 
                 a=sample(0:1, 10, TRUE), b=sample(0:1, 10, TRUE), 
                 c=sample(0:1, 10, TRUE), d=sample(0:1, 10, TRUE))
df1
#     grp a b c d
#  1:   1 1 1 1 0
#  2:   1 1 0 0 0
#  3:   0 0 1 1 0
#  4:   2 1 1 1 0
#  5:   0 1 1 0 1
#  6:   0 1 0 1 1
#  7:   1 1 1 0 1
#  8:   1 1 0 1 1
#  9:   2 1 0 1 1
# 10:   1 1 0 1 0

rbindlist(lapply(split(df1, df1$grp), 
                    function(x) as.data.table(t(colSums(x[,-1])))), idcol="grp")
#    grp a b c d
# 1:   0 2 2 2 2
# 2:   1 5 2 3 2
# 3:   2 2 1 2 1

How to do this with data.table syntax?

These attempts do not give the expected outcome:

df1[, .(s = lapply(.SD, sum)), .SDcols=a:d, by=grp]
df1[, .(s = colSums(.SD)), .SDcols=a:d, by=grp]
user2957945
  • 2,353
  • 2
  • 21
  • 40

1 Answers1

2
df1[, lapply(.SD, sum), by = .(grp), .SDcols = c("a", "b", "c", "d")]
#    grp a b c d
# 1:   1 5 2 3 2
# 2:   0 2 2 2 2
# 3:   2 2 1 2 1

I used the programmatic mode of columns determination (c(...)) with the guess that you might need to produce them dynamically, e.g., setdiff(colnames(df1), "grp"). If that's not needed, you can use RonakShah's recommendation of a:d as well.

r2evans
  • 141,215
  • 6
  • 77
  • 149