Related to "How to use data.table within functions and loops?": is there a better way to write the functions shown below, specifically using data.table?
Note: all the code below works, but it is slow.
(I used simple "cleaning" steps just to demonstrate the problem).
The objective is to write a function that (1) efficiently (2) replaces (3) some values in a data.table, so that it can then be used in a loop to clean a large number of data sets.
In C++, this would be done using pointers and call by reference, as below:
void cleanDT(dataTable* dt); cleanDT(&dt222);
In R, however, we copy entire data sets (data.tables) back and forth every time we call a function.
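To illustrate the copying, here is a minimal sketch on a toy table. The helper names floorByCopy and floorByReference are hypothetical, made up for this illustration, and the sketch assumes that data.table's set() updates a column in place, which is how I read its documentation:

```r
library(data.table)

# Hypothetical helpers, for illustration only.
floorByCopy <- function(dt) {
  # Base-style assignment: R modifies a local copy, the caller's table is untouched.
  dt[["x"]] <- floor(dt[["x"]])
  dt
}

floorByReference <- function(dt) {
  # data.table::set() replaces the column in the caller's table, without copying it.
  set(dt, j = "x", value = floor(dt[["x"]]))
  invisible(dt)
}

dtDemo <- data.table(x = c(1.7, 2.3))

discarded <- floorByCopy(dtDemo)              # result not assigned back to dtDemo
unchanged <- identical(dtDemo$x, c(1.7, 2.3)) # dtDemo still holds the original values

floorByReference(dtDemo)                      # no reassignment needed
changed <- identical(dtDemo$x, c(1, 2))       # dtDemo itself was updated
```

With the first helper, the result has to be assigned back (dtDemo <- floorByCopy(dtDemo)) for the change to be visible, which is exactly the back-and-forth copying I would like to avoid.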
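library(data.table)

# Version 1: returns a cleaned copy; the caller must reassign the result.
cleanDT <- function (dt) {
  strNames <- names(dt); nCols <- seq_along(strNames)
  for (i in nCols) {
    strCol <- strNames[i]
    # identical() avoids a length > 1 condition for multi-class columns (e.g. ordered factors)
    if ( identical(class(dt[[strCol]]), "numeric") )
      dt[[strCol]] <- floor(dt[[strCol]])
    else
      dt[[strCol]] <- gsub("I", "i", dt[[strCol]])
  }
  return(dt)
}

# Version 2: emulates call by reference by overwriting the caller's variable.
cleanDTByReference <- function (dt) {
  dtCleaned <- dt
  strNames <- names(dt); nCols <- seq_along(strNames)
  for (i in nCols) {
    strCol <- strNames[i]
    if ( identical(class(dt[[strCol]]), "numeric") )
      dtCleaned[[strCol]] <- floor(dt[[strCol]])
    else
      dtCleaned[[strCol]] <- gsub("I", "i", dt[[strCol]])
  }
  # Assign the cleaned table to the variable that was passed in, in the calling frame
  eval.parent(substitute(dt <- dtCleaned))
}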
dt222 <- data.table(ggplot2::diamonds); dt222[1:2]
dt222 <- cleanDT(dt222); dt222[1:2]
dt222 <- data.table(ggplot2::diamonds); dt222[1:2]
# carat cut color clarity depth table price x y z
#1: 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
#2: 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
cleanDTByReference(dt222); dt222[1:2]
# carat cut color clarity depth table price x y z
#1: 0 ideal E Si2 61 55 326 3 3 2
#2: 0 Premium E Si1 59 61 326 3 3 2
Then we would use this function to clean a list of data.tables in a loop like this one:
dt333 <- data.table(datasets::mtcars)
listDt <- list(dt222, dt333)
for (dt in listDt) {
  print(dt[1:2])
  cleanDTByReference(dt); print(dt[1:2])
}
Ideally, I would like all my data.tables to be "cleaned" this way, using a function. But at the moment, without the use of references, the code above DOES NOT actually change listDt, nor dt222 or dt333.
Can you advise how to achieve that?
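One direction that might achieve this, sketched on toy tables: rewrite the cleaning function around data.table's set(). The name cleanDTInPlace and the tables dtA, dtB and listDemo are hypothetical. The sketch assumes a recent R version, where list() and the for loop do not deep-copy the tables they hold, so every name keeps pointing at the same object. It also uses is.numeric(), which, unlike the class check above, treats integer columns as numeric.

```r
library(data.table)

# Hypothetical name; similar cleaning rules to cleanDT, but every
# replacement goes through set(), which updates the table in place.
cleanDTInPlace <- function(dt) {
  for (strCol in names(dt)) {
    if (is.numeric(dt[[strCol]])) {
      set(dt, j = strCol, value = floor(dt[[strCol]]))
    } else {
      set(dt, j = strCol, value = gsub("I", "i", dt[[strCol]]))
    }
  }
  invisible(dt)  # returned only for convenience; the input is already modified
}

# Toy stand-ins for dt222 and dt333
dtA <- data.table(x = c(1.7, 2.3), label = c("Ideal", "SI2"))
dtB <- data.table(y = c(9.9, 0.5))
listDemo <- list(dtA, dtB)

# No reassignment inside the loop: each call modifies the table it receives
for (dt in listDemo) cleanDTInPlace(dt)

print(listDemo[[1]])
print(dtB)
```

If the assumptions hold, both the list elements and the original variables show the cleaned values after the loop, without any table being copied back to the caller.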