I just want to calculate the maximum value of each column separately. A simple sapply call
runs out of memory:
# dt is my data.table object
res <- sapply(dt, max, na.rm = TRUE) # fails due to memory problems
It is a sparse table of 1 million rows and 1000 columns, with an overall size of 11 GB.
I am working with the file train_date.csv and load it with the following lines of code:
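library(data.table)
library(pryr)
# filePath is the directory prefix of the file, defined elsewhere
dtDate <- fread(paste0(filePath, "train_date.csv"))
dim(dtDate)          # number of rows and columns
object_size(dtDate)  # size of the table in memory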
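To make the problem easier to reproduce without the 11 GB file, here is a small stand-in table. This is only a sketch: the names dtSmall, resSmall and resLoop, the dimensions, and the roughly 10% fill rate are my assumptions and not properties of train_date.csv. It runs the same sapply call as above, plus a column-by-column variant that I would expect to give the same result; I have not checked whether that variant avoids the memory problem on the full table.

```r
library(data.table)

set.seed(1)
n_rows <- 1000L  # assumption: far fewer than the real 1 million rows
n_cols <- 20L    # assumption: far fewer than the real 1000 columns

# Build a mostly-NA numeric matrix to mimic the sparsity of the real file
m <- matrix(NA_real_, nrow = n_rows, ncol = n_cols)
filled <- sample(length(m), size = length(m) %/% 10L)
m[filled] <- runif(length(filled))
dtSmall <- as.data.table(m)

# Same call as on the full table
resSmall <- sapply(dtSmall, max, na.rm = TRUE)

# Column-by-column variant: extract one column at a time with [[ ]]
resLoop <- vapply(
  names(dtSmall),
  function(col) max(dtSmall[[col]], na.rm = TRUE),
  numeric(1)
)
```

Both results are named numeric vectors with one entry per column. Note that a column consisting only of NA would give -Inf with a warning in either version.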