For this example I would use:
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
Two alternatives (based on my answer here) with a for loop:
# alternative 1 with 'set'
for (col in cols) set(DT, j = col, value = as.numeric(DT[[col]]))
# alternative 2 with ':='
for (col in cols) DT[, (col) := as.numeric(DT[[col]])]
None of these three approaches is necessarily better than the others. They all share the same advantage: they update DT by reference, so no copy is made and nothing has to be assigned back.
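To make "by reference" concrete, here is a minimal sketch on a small toy table. The names toy and toy_cols and the values are made up for illustration and are not part of the benchmark data. It checks that the memory address of the data.table is the same before and after the conversion:

```r
library(data.table)

# Small illustrative table (hypothetical data, not the benchmark data)
toy <- data.table(id = 1:3,
                  a = c("1.5", "2", "3.25"),
                  b = c("10", "20", "30"))
toy_cols <- c("a", "b")

addr_before <- address(toy)

# Convert in place; the result is not assigned back to 'toy'
toy[, (toy_cols) := lapply(.SD, as.numeric), .SDcols = toy_cols]

addr_after <- address(toy)

# Same object, and the columns are now numeric
identical(addr_before, addr_after)
sapply(toy, class)
```

The same check should hold for the two for-loop alternatives, since set and := both modify the existing table instead of creating a new one.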
Comparing the different approaches with a benchmark:
microbenchmark(
  vasily_get    = {inpDT <- copy(DT); cast_num_get(inpDT, cols)},
  vasily_b      = {inpDT <- copy(DT); inpDT <- cast_num_b(inpDT, cols)},
  jaap_lapply   = {inpDT <- copy(DT); inpDT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]},
  jaap_for_set1 = {inpDT <- copy(DT); for (col in cols) set(inpDT, j = col, value = as.numeric(inpDT[[col]]))},
  jaap_for_set2 = {inpDT <- copy(DT); for (col in cols) inpDT[, (col) := as.numeric(inpDT[[col]])]},
  times = 100)
gives:
Unit: milliseconds
          expr      min       lq     mean   median       uq      max
    vasily_get 399.0723 414.2708 530.3024 429.5070 663.3513 1194.827
      vasily_b 388.7294 408.0004 528.4039 418.9236 664.5881 1441.941
   jaap_lapply 401.8001 424.1902 562.9259 453.5073 668.3900 1376.654
 jaap_for_set1 399.2213 433.9918 568.7211 628.4220 668.1248 1198.950
 jaap_for_set2 395.1966 405.5584 510.2038 421.3801 652.1263 1097.931
None of the approaches stands out with regard to speed. However, the cast_num_b approach has one big disadvantage: to make the change permanent, you have to assign the result of that function back to the input data.table.
When you run the following code:
inpDT <- copy(DT)
address(inpDT)
inpDT <- cast_num_b(inpDT, cols)
address(inpDT)
you get:
> inpDT <- copy(DT)
> address(inpDT)
[1] "0x145eb6a00"
> inpDT <- cast_num_b(inpDT, cols)
> address(inpDT)
[1] "0x12a632ce8"
As you can see, the location in the computer's memory has changed: the function returns a new object instead of updating the original one in place. It can therefore be considered the less efficient approach.
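The difference can be reproduced with two small stand-in functions. Note that cast_copy and cast_inplace below are hypothetical helpers I made up for illustration; they are not the actual cast_num_b or cast_num_get from the other answer, only a sketch of the two behaviours (returning a new object versus updating by reference):

```r
library(data.table)

# Hypothetical stand-in: works on a copy and returns a new object
cast_copy <- function(dt, cols) {
  out <- copy(dt)
  out[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  out
}

# Hypothetical stand-in: updates the input table by reference
cast_inplace <- function(dt, cols) {
  dt[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
  invisible(dt)
}

# In-place version: no assignment needed, address stays the same
toy1 <- data.table(a = c("1", "2"), b = c("3", "4"))
addr1_before <- address(toy1)
cast_inplace(toy1, c("a", "b"))
addr1_after <- address(toy1)

# Copying version: the result must be assigned back, address changes
toy2 <- data.table(a = c("1", "2"), b = c("3", "4"))
addr2_before <- address(toy2)
toy2 <- cast_copy(toy2, c("a", "b"))
addr2_after <- address(toy2)

identical(addr1_before, addr1_after)  # in place: same object
identical(addr2_before, addr2_after)  # copy: different object
```

With the copying version, forgetting the assignment leaves the original table unchanged, which is the practical downside described above.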
Used data:
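library(data.table)

# 1e6 rows: one letter column and four numeric columns stored as character
DT <- data.table(lets = sample(LETTERS, 1e6, TRUE),
                 V1 = as.character(rnorm(1e6)),
                 V2 = as.character(rnorm(1e6)),
                 V3 = as.character(rnorm(1e6)),
                 V4 = as.character(rnorm(1e6)))

# columns to convert: V1 to V4
cols <- names(DT)[2:5]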