sum over multiple columns by group

Question

How can I sum multiple columns by group please?

For the following data I can get the expected result using base r split-apply functions:

library(data.table)
set.seed(1234)
df1 = data.table(grp = sample(0:2, 10, TRUE), 
                 a=sample(0:1, 10, TRUE), b=sample(0:1, 10, TRUE), 
                 c=sample(0:1, 10, TRUE), d=sample(0:1, 10, TRUE))
df1
#     grp a b c d
#  1:   1 1 1 1 0
#  2:   1 1 0 0 0
#  3:   0 0 1 1 0
#  4:   2 1 1 1 0
#  5:   0 1 1 0 1
#  6:   0 1 0 1 1
#  7:   1 1 1 0 1
#  8:   1 1 0 1 1
#  9:   2 1 0 1 1
# 10:   1 1 0 1 0

rbindlist(lapply(split(df1, df1$grp), 
                    function(x) as.data.table(t(colSums(x[,-1])))), idcol="grp")
#    grp a b c d
# 1:   0 2 2 2 2
# 2:   1 5 2 3 2
# 3:   2 2 1 2 1

How to do this with data.table syntax?

These attempts do not give the expected outcome:

df1[, .(s = lapply(.SD, sum)), .SDcols=a:d, by=grp]
df1[, .(s = colSums(.SD)), .SDcols=a:d, by=grp]

sheesh, thank Ronak; don't know what I was doing by assigning a name — user2957945, Nov 17 '20 at 12:53

r2evans · Accepted Answer · 2020-11-17T12:58:09.693

2

df1[, lapply(.SD, sum), by = .(grp), .SDcols = c("a", "b", "c", "d")]
#    grp a b c d
# 1:   1 5 2 3 2
# 2:   0 2 2 2 2
# 3:   2 2 1 2 1

I used the programmatic mode of columns determination (c(...)) with the guess that you might need to produce them dynamically, e.g., setdiff(colnames(df1), "grp"). If that's not needed, you can use RonakShah's recommendation of a:d as well.

edited Nov 17 '20 at 12:58

answered Nov 17 '20 at 12:52

r2evans

141,215
6
77
149

sum over multiple columns by group

1 Answers1