This does not use fread, but it does read the file into an SQLite database in a single pass and then sends only the desired rows into R, and no others. It first reads the file into an SQLite database that it creates for you on the fly, and it does so without going through R, so R's memory limitations do not apply to that step. It then reads only the indicated rows into R, so as long as the selected rows (as opposed to the entire file) fit into R you should be OK. At the end the database is automatically destroyed.
Below, you may need other arguments to read.csv.sql, such as header and sep, depending on the file. See ?read.csv.sql and ?sqldf, and note that the available arguments differ somewhat from read.table's arguments.
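For instance, if your file had no header row and used semicolons as separators, the call might look like the sketch below (the file name and contents here are made up purely to illustrate passing header and sep through to read.csv.sql):

```r
library(sqldf)

# Hypothetical headerless, semicolon-separated file (illustration only)
writeLines(c("1;2", "3;4"), "nohdr.csv")

# header and sep are forwarded so the file is parsed correctly;
# with header = FALSE the columns get default names (V1, V2, ...)
DF <- read.csv.sql("nohdr.csv", sql = "select * from file",
                   header = FALSE, sep = ";")
```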
library(sqldf)
from <- seq(350, 19350, 1000)
where <- paste(sprintf("(rowid between %d and %d)", from, from + 500), collapse = " or ")
fn$read.csv.sql("myfile.csv", sql = "select * from file where $where")
Note 1: You can check the SQL statement like this:
sql <- fn$identity("select * from file where $where") # sql holds text of SQL command
read.csv.sql("myfile.csv", sql = sql)
Note 2: Here is a self-contained example:
library(sqldf)
# write test file
write.table(data.frame(a = 1:20000), "myfile.csv", row.names = FALSE)
from <- seq(350, 19350, 1000)
where <- paste(sprintf("(rowid between %d and %d)", from, from + 500), collapse = " or ")
DF <- fn$read.csv.sql("myfile.csv", sql = "select * from file where $where")
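As a quick sanity check on that example: each "rowid between" window spans 501 rows, the 20 window starts are 1000 apart so the windows do not overlap, and the last window (19350 to 19850) still lies within the 20000-row file, so the result should hold 20 * 501 = 10020 rows. A minimal sketch, rerunning the example and checking the count:

```r
library(sqldf)

# Recreate the test file from the example above
write.table(data.frame(a = 1:20000), "myfile.csv", row.names = FALSE)

from <- seq(350, 19350, 1000)
where <- paste(sprintf("(rowid between %d and %d)", from, from + 500), collapse = " or ")
DF <- fn$read.csv.sql("myfile.csv", sql = "select * from file where $where")

# 20 non-overlapping windows of 501 rows each
nrow(DF)
```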