See if this works.
It reads one column at a time using fread. By default fread creates a data.table; however, data.tables use external pointers, which can be a problem, so we use the data.table=FALSE argument to get a plain data frame instead. After reading a column in, it immediately writes it back out as an RDS file. After all columns have been written out as RDS files, it reads the RDS files back in and writes out a final RDS file that combines them. We use the 6 row input in the Note at the end as an example.
If fread with select= still takes up too much memory, use the xsv utility (not an R program) to ensure that only the column of interest is read in. xsv is available for various platforms; once it is installed, use the commented-out line instead of the line following it. (Alternatively, use cut, sed or awk for the same purpose.)
You can also try interspersing the code lines with gc() to trigger garbage collection.
Also try replacing as.data.frame in the last line with setDT, which converts the list to a data.table by reference rather than copying it.
library(data.table)
File <- "BOD.csv"

# fread wrapper that returns a plain data frame rather than a data.table
freadDF <- function(..., data.table = FALSE) fread(..., data.table = data.table)

# read only the header to get the column names
L <- as.list(freadDF(File, nrows = 0))
nms <- names(L)

# write each column to its own RDS file
fmt <- "xsv select %s %s"
# for(nm in nms) saveRDS(freadDF(cmd = sprintf(fmt, nm, File))[[1]], paste0(nm, ".rds"))
for(nm in nms) saveRDS(freadDF(File, select = nm)[[1]], paste0(nm, ".rds"))

# read the columns back in and write the combined RDS file
for(nm in names(L)) L[[nm]] <- readRDS(paste0(nm, ".rds"))
saveRDS(as.data.frame(L), sub("\\.csv$", ".rds", File))
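The gc() and setDT suggestions could be combined roughly as follows. This is only a sketch of one way to arrange it, not a tested fix for any particular memory problem: it calls gc() after every column (how often to call it is a guess) and it saves a data.table rather than a data frame. It recreates the 6 row BOD.csv input so that it runs on its own.

```r
library(data.table)

# Recreate the small example input so this block is self-contained.
write.csv(BOD, "BOD.csv", quote = FALSE, row.names = FALSE)
File <- "BOD.csv"

# Read only the header to get the column names.
nms <- names(fread(File, nrows = 0, data.table = FALSE))

# Write each column to its own RDS file, collecting garbage after each one.
for (nm in nms) {
  saveRDS(fread(File, select = nm, data.table = FALSE)[[1]], paste0(nm, ".rds"))
  gc()
}

# Read the columns back into a named list.
L <- setNames(vector("list", length(nms)), nms)
for (nm in nms) L[[nm]] <- readRDS(paste0(nm, ".rds"))

# setDT converts the list to a data.table in place instead of copying it.
setDT(L)
saveRDS(L, sub("\\.csv$", ".rds", File))
```

Note that with setDT the object stored in BOD.rds is a data.table, so whoever reads it back needs data.table loaded to use it as such.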
Note
write.csv(BOD, "BOD.csv", quote = FALSE, row.names = FALSE)
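For the cut alternative mentioned above, something like the following might work. It is a sketch under several assumptions: a Unix-like system with cut on the PATH, a comma-separated file with a header row, and no quoted fields that themselves contain commas (cut splits on every comma). Columns are selected by position because cut does not know about column names.

```r
library(data.table)

# Recreate the small example input so this block is self-contained.
write.csv(BOD, "BOD.csv", quote = FALSE, row.names = FALSE)
File <- "BOD.csv"

# Read only the header to get the column names.
nms <- names(fread(File, nrows = 0, data.table = FALSE))

# For the i-th column, let cut extract field i so that fread only ever
# sees a single column; then write that column to its own RDS file.
for (i in seq_along(nms)) {
  cmd <- sprintf("cut -d, -f%d %s", i, shQuote(File))
  saveRDS(fread(cmd = cmd, data.table = FALSE)[[1]], paste0(nms[i], ".rds"))
}
```

The per-column RDS files can then be read back in and combined exactly as in the main code.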