Reading a large file in chunks: memory issues

Question

I have a dataset of several GB that I want to read into R in chunks, transform, and export to Vowpal Wabbit format. To do that, I read a chunk into a data.table DT with fread, call a couple of functions, set DT <- NULL, call the garbage collector with gc(), and repeat the process. However, memory usage still stays close to its maximum, which makes the process slow (a couple of hours).

Based on Tricks to manage the available memory in an R session, I wonder if there is a way to update DT from fread in place (not DT <- fread(), but via a := statement), so that there would not be a sequence of objects. Or do you have any other suggestions?

I would just restart R after reading-in and manipulating each chunk... Although I have no idea what Vowpal Wabbit format is. Is restarting R rather than calling `gc()` an option? — konvas, Jul 17 '14 at 14:49
Yes, this is an option, although it requires a non-trivial effort to store variables. — Love-R, Jul 17 '14 at 14:55
Maybe try importing your data into SQLite without R (`sqlite> .import `), then do as much of the transformation as possible from R using the packages `RSQLite` and `RSQLite.extfuns`. — fxi, Jul 17 '14 at 14:57
There's a good chance [dplyr](http://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html) can help here, esp if you can do a quick transform of your data into SQLite or another database (it has to be tabular since you're using `fread`, so this should not be too difficult). `dplyr` will offload some of the data grouping to the database layer and is designed (like `data.table`) with memory efficiency in mind. — hrbrmstr, Jul 17 '14 at 14:57
Are you sure you're not copying objects and not removing them? Double check that you don't have any objects that keep growing as the problem you're describing should not really happen. — eddi, Jul 17 '14 at 15:51
You might look at the LaF package. I just used it to open a flat file. Apparently there is functionality in the package to read in files by piece for the reasons you are talking about. I have not tried it, so I can't say anything for sure, or provide specific syntax. — Mark Danese, Jul 18 '14 at 06:05
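
As an illustration of the chunked workflow discussed in the question and comments, here is a minimal sketch in base R. It is not the asker's code: `process_in_chunks`, the column names `y` and `x`, and the one-feature output line are all hypothetical, and the header parsing assumes a plain comma-separated file without quoted fields. It keeps a single connection open, so each chunk is read once and only one chunk lives in memory at a time, and it writes each chunk out immediately instead of accumulating results (the growth eddi warns about).

```r
# Hypothetical helper (not from the question): stream a CSV through a
# callback, one chunk of rows at a time, over a single open connection.
# Assumes a simple comma-separated file with an unquoted header line.
process_in_chunks <- function(path, chunk_rows, handle_chunk) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- strsplit(readLines(con, n = 1L), ",", fixed = TRUE)[[1]]
  total <- 0L
  repeat {
    lines <- readLines(con, n = chunk_rows)   # continues where the last read stopped
    if (length(lines) == 0L) break            # end of file
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      stringsAsFactors = FALSE)
    handle_chunk(chunk)                       # transform and export; keep nothing
    total <- total + nrow(chunk)
  }
  total
}

# Small demo on a temporary file; y (label) and x (feature) are made-up columns.
input  <- tempfile(fileext = ".csv")
output <- tempfile(fileext = ".vw")
write.csv(data.frame(y = rep(c(1, -1), 5), x = 1:10), input,
          row.names = FALSE, quote = FALSE)

chunk_sizes <- integer(0)
out <- file(output, open = "w")
n <- process_in_chunks(input, 4L, function(chunk) {
  chunk_sizes <<- c(chunk_sizes, nrow(chunk))
  # Assumed minimal "label | name:value" line; adapt to the real feature layout.
  writeLines(sprintf("%s | x:%s", chunk$y, chunk$x), out)
})
close(out)
```

Because the chunk is only ever bound inside the loop body and the callback, each iteration overwrites the previous chunk rather than building a sequence of objects. Whether this is faster than repeated `fread` calls on a given dataset would need to be measured.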


0 Answers