I have a data.table like this:
DT <- data.table::data.table(
ref = rep(3L, 4L),
nb = 12:15,
i1 = c(3.1e-05, 0.044495, 0.82244, 0.322291),
i2 = c(0.000183, 0.155732, 0.873416, 0.648545),
i3 = c(0.000824, 0.533939, 0.838542, 0.990648),
i4 = c(0.044495, 0.82244, 0.322291, 0.393595)
)
DT
# ref nb i1 i2 i3 i4
# 1: 3 12 0.000031 0.000183 0.000824 0.044495
# 2: 3 13 0.044495 0.155732 0.533939 0.822440
# 3: 3 14 0.822440 0.873416 0.838542 0.322291
# 4: 3 15 0.322291 0.648545 0.990648 0.393595
Now I want to calculate row sums, but only over the columns whose names start with "i" ("i1", "i2", etc.).
I have used grep to create a vector of the column names to be summed:
listCol <- colnames(DT)[grep("^i", colnames(DT))]
listCol
# [1] "i1" "i2" "i3" "i4"
Then I looped over the columns:
DT$sum <- rep.int(0, nrow(DT))
for (i in listCol) {
  DT$sum <- DT$sum + DT[, get(i)]
}
...which gives the desired output:
DT
# ref nb i1 i2 i3 i4 sum
# 1: 3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2: 3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3: 3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4: 3 15 0.322291 0.648545 0.990648 0.393595 2.355079
How can I improve my code?
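One direction that might replace the loop, sketched under the assumption that all the "i" columns are numeric: restrict `.SD` to the selected columns with `.SDcols` and let `rowSums()` do the summing. The name `i_cols` is introduced only for this illustration, and the anchored pattern `"^i"` assumes that "starts with i" is the intended rule.

```r
library(data.table)

DT <- data.table(
  ref = rep(3L, 4L),
  nb  = 12:15,
  i1  = c(3.1e-05, 0.044495, 0.82244, 0.322291),
  i2  = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3  = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4  = c(0.044495, 0.82244, 0.322291, 0.393595)
)

# Names of the columns that start with "i" (hypothetical helper variable)
i_cols <- grep("^i", names(DT), value = TRUE)

# Add the row sums by reference; .SD only contains the columns in .SDcols
DT[, sum := rowSums(.SD), .SDcols = i_cols]
```

This avoids both the explicit loop and the repeated copies made by `DT$sum <- ...`, since `:=` updates the table in place.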
Sub-question:
This sub-question partially contains the answer to the previous one:
How can I avoid this kind of strange notation?
myrowMeans = function (x){
rowMeans(x, na.rm = TRUE)
}
DT[ , var := myrowMeans(.SD-myrowMeans(.SD)^2), .SDcols = grep("i", colnames(DT))]