I am trying to implement the rowSums solution proposed here: Getting rowSums in a data table in R. Basically, I want a variable with the sum of top15, top16 and top17 for each row. The code below produces an answer, but it's clearly not right, and I am not sure I understand what is happening.

I am looking for a data.table solution - I am running this on millions of cases

library(data.table)
d <- structure(list(top15 = c(1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1), top16 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0), top17 = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)), class = c("data.table", 
"data.frame"), row.names = c(NA, -20L))

d[ , tops:=lapply(.SD,sum), .SDcols=c(paste0("top", 15:17))]
MatthewR

1 Answer

We can use rowSums on the Subset of Data.table (.SD); its na.rm argument also takes care of any NA elements. The original lapply(.SD, sum) sums each column, not each row, which is why its result looks wrong:

nm1 <- paste0("top", 15:17)
d[, tops := rowSums(.SD, na.rm = TRUE), .SDcols = nm1]
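
To make the difference concrete, here is a minimal sketch on a hypothetical three-row table (toy, cols and col_totals are illustrative names, not part of the question's data). It contrasts the per-column totals that lapply(.SD, sum) returns with the per-row totals from rowSums:

```r
library(data.table)

# Hypothetical toy data, assumed for illustration only
toy <- data.table(top15 = c(1, 0, 1),
                  top16 = c(0, 1, 1),
                  top17 = c(1, NA, 0))
cols <- paste0("top", 15:17)

# lapply(.SD, sum) yields one total per COLUMN (a one-row result)
col_totals <- toy[, lapply(.SD, sum, na.rm = TRUE), .SDcols = cols]
# col_totals: top15 = 2, top16 = 2, top17 = 1

# rowSums(.SD) yields one total per ROW; na.rm = TRUE skips the NA
toy[, tops := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
# toy$tops: 2 1 2
```

Because .SDcols names only the three top columns, rerunning the assignment does not fold the new tops column into the sum.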

Or, if there are no NA elements, add the columns element-wise with Reduce and +:

d[, tops := Reduce(`+`, .SD), .SDcols = nm1]
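
The caveat about NA matters because + propagates missing values. A small sketch, again on hypothetical data (toy, tops_reduce and tops_rowsums are assumed names), shows the two approaches diverging as soon as a single NA appears:

```r
library(data.table)

# Hypothetical toy data with one missing value in the second row
toy <- data.table(top15 = c(1, 0),
                  top16 = c(0, 1),
                  top17 = c(1, NA))
cols <- paste0("top", 15:17)

# Reduce(`+`, .SD) adds the columns pairwise; any NA makes that row's sum NA
toy[, tops_reduce := Reduce(`+`, .SD), .SDcols = cols]
# toy$tops_reduce: 2 NA

# rowSums with na.rm = TRUE drops the NA before summing
toy[, tops_rowsums := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
# toy$tops_rowsums: 2 1
```

One reason to still consider Reduce on millions of rows is that it works on the columns directly, whereas rowSums first converts .SD to a matrix; whether that difference is noticeable would need to be measured on the real data.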
akrun