0

I need to read a huge 20gb csv file. I don't need all the rows so I could filter some rows to speed up their loading. For this I read that the read.csv.sql function from the sqldf package is ideal for this task. so use the following code:

library(sqldf)

data <- read.csv.sql("20gbfile.csv", 
                     sql = "select * from file where `var` ='x'")

After a while it throws the following error:

Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) : 
  RS_sqlite_getline could not realloc
In addition: Warning message:
In file.remove(dbname) :
  cannot remove file 'C:\Users\cordo\AppData\Local\Temp\Rtmp4EXpEN\file3ac82415bc0', reason 'Permission denied'

I don't know what could be happening, I would appreciate any help. Needless to say, I am open to other types of solutions.

MatCordTo
  • 223
  • 1
  • 7
  • 1
    read.csv.sql no longer works as RSQLite made non-backward compatible changes which cause errors. The changes are major so it will likely not be feasible to fix that in sqldf. If you use a sufficiently old version RSQLite and possibly DBI it would work. Otherwise use a different approach. Suggest using command line utility such as csvfix, xsv, miller, sed, gawk, mawk or the command line sqlite program that comes with sqlite to preprocess your input. Could also consider the R vroom package. – G. Grothendieck Dec 29 '21 at 14:22
  • Can also use vroom backend in `readr` now with `read_csv('big.csv', lazy = TRUE)` – IceCreamToucan Dec 29 '21 at 14:41
  • Have you thought about cleaning the data outside of R to reduce the file size? Or you could even split it up, then perform the select, filter operations on each split file in a loop. 20GB could give you memory problems all at once. Obviously you'd want to avoid any filters that depend on other field values or calculations. https://stackoverflow.com/questions/31786287/how-to-split-large-text-file-in-windows –  Dec 29 '21 at 15:59
  • Thanks for your suggestions. at the end I divided the data set to pre-process it. – MatCordTo Jan 11 '22 at 11:12

0 Answers0