I have a function that (1) loads some large CSV files, (2) processes those datasets, and (3) puts them into a list and returns the list object. It looks something like this:
library(data.table)
load_data <- function() {
  # Load the data
  foo <- data.table(iris) # really this: foo <- fread("foo.csv")
  bar <- data.table(mtcars) # really this: bar <- fread("bar.csv")

  # Clean the data
  foo[, Foo := mean(Sepal.Length) + median(bar$carb)]
  # ... lots of code here

  # Put datasets into a list
  datasets <- list(foo = foo[], bar = bar[])

  # Return result
  return(datasets)
}
My concern is that, when I build the list object, I am doubling the required memory because I'm basically creating a duplicate copy of each dataset.
- Is my assumption correct? (See the check sketched at the end of this post.)
- If my assumption is correct, is it possible to assign my objects to a list without duplicating them? One possible solution is to load these objects into a list from the get-go (e.g. datasets <- list(foo = fread("foo.csv"), bar = fread("bar.csv"))), but this is undesirable because the code becomes lengthy and messy, constantly using datasets$foo and datasets$bar.
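
For reference, here is a minimal check I put together to test the doubling assumption, using base R's tracemem() and data.table's address() to compare memory addresses (with the toy foo from above standing in for the real fread() result); I may well be misreading the output:

library(data.table)

foo <- data.table(iris)   # stand-in for fread("foo.csv"), as above

tracemem(foo)             # base R: prints a message whenever foo is duplicated

datasets <- list(foo = foo)

# list() printed no tracemem message, and the addresses match, which I read
# as the list holding a reference to foo rather than a second copy of it:
identical(address(foo), address(datasets$foo))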