Summing many columns with data.table in R, remove NA

Question

This is really two questions I guess. I'm trying to use the data.table package to summarize a large dataset. Say my original large dataset is df1 and unfortunately df1 has 50 columns (y0... y49) that I want the sum of by 3 fields (segmentfield1, segmentfield2, segmentfield3). Is there a simpler way to do this than typing every y0...y49 column out? Related to this, is there a generic na.rm=T for the data.table instead of typing that with each sum too?

dt1 <- data.table(df1)
setkey(dt1, segmentfield1, segmentfield2, segmentfield3)
dt2 <- dt1[,list( y0=sum(y0,na.rm=T), y1=sum(y1,na.rm=T), y2=sum(y2,na.rm=T), ... 
            y49=sum(y49,na.rm=T) ),
            by=list(segmentfield1, segmentfield2, segmentfield3)]

@rcs, not quite a duplicate, but similar – Ricardo Saporta, Sep 23 '13 at 14:36

score 7 · Accepted Answer · answered Sep 23 '13 at 14:38

First, store the column names in variables:

colsToSum <- paste0("y", 0:49)          # the columns to sum (y0 ... y49); adjust as needed
summedNms <- paste0(colsToSum, "_sum")  # names for the summed columns, distinct from the originals

If you'd like to copy the result to a new data.table:

dt2 <- dt1[, lapply(.SD, sum, na.rm=TRUE), .SDcols=colsToSum]
setnames(dt2, summedNms)

Alternatively, if you'd like to append the columns to the original:

dt1[, c(summedNms) := lapply(.SD, sum, na.rm=TRUE), .SDcols=colsToSum]
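
The question also groups by three fields, which the snippets above leave out. Here is a minimal sketch of the grouped version, assuming a toy table with three value columns instead of 50. The data and the name valueCols are made up for illustration.

```r
library(data.table)

# Toy stand-in for dt1: three segment fields and three value columns (y0..y2).
dt1 <- data.table(
  segmentfield1 = c("a", "a", "b", "b"),
  segmentfield2 = c("x", "x", "y", "y"),
  segmentfield3 = c(1L, 1L, 2L, 2L),
  y0 = c(1, NA, 3, 4),
  y1 = c(NA, 2, NA, NA),
  y2 = c(5, 6, 7, NA)
)

# Pick the value columns by pattern rather than typing each one out.
valueCols <- grep("^y[0-9]+$", names(dt1), value = TRUE)

# Sum every value column within each segment combination, ignoring NAs.
dt2 <- dt1[, lapply(.SD, sum, na.rm = TRUE),
           by = .(segmentfield1, segmentfield2, segmentfield3),
           .SDcols = valueCols]
print(dt2)
```

The result keeps the original column names, so no setnames call is needed here. Note that a group whose values are all NA sums to 0 under na.rm = TRUE.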

As for a general na.rm mechanism, there is none specific to data.table, but have a look at ?na.omit and ?na.exclude.

You can use `function(x) fun(na.omit(x))` inside lapply(.SD, ...) for functions that don't have an na.rm option. — Dean MacGregor, Sep 23 '13 at 14:56
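
For a function without an na.rm argument, the per-column wrapper can call na.omit on its own argument before passing the values on. This is a sketch only: geoMean is a hypothetical helper and the data are made up.

```r
library(data.table)

# Hypothetical helper with no na.rm argument: geometric mean.
geoMean <- function(x) exp(mean(log(x)))

dt <- data.table(
  g  = c("a", "a", "b", "b"),
  y0 = c(1, NA, 4, 4),
  y1 = c(2, 8, NA, 9)
)

# Drop the NAs from each column, within each group, before calling the helper.
res <- dt[, lapply(.SD, function(x) geoMean(na.omit(x))),
          by = g, .SDcols = c("y0", "y1")]
print(res)
```

Unlike na.omit applied to a whole table, this drops NAs column by column, so a missing value in y0 does not discard the matching row of y1.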
