
I have a very large data file (my_file.dat) containing 31191984 rows of several variables. I would like to programmatically import this dataset into R in small parts, e.g. as data frames of 1 million rows each. At this link, it is suggested to use read.table() with the nrows option. It works for the first round of 1 million rows using this command:

  my_data <- read.table("path_to_my_file.dat", nrows = 1e+06)

How do I automate this procedure for the next rounds of 1 million rows until all parts are imported as R data frames? I am aware that one option could be to store the data in an SQL database and let R talk to SQL. However, I am looking for an R-specific solution only.

khajlk
  • You could use the `skip` parameter along with `nrows` – SmitM Oct 24 '18 at 18:27
  • See the answer I just gave [here](https://stackoverflow.com/a/52972430/2789863) for a very similar question. – tblznbits Oct 24 '18 at 18:31
  • What about `fread` from `data.table` also with `nrows`? Shouldn't that be faster than read.table? – SeGa Oct 24 '18 at 19:16 (see the sketch below)
  • Possible duplicate of [Reading a CSV file, looping through the rows, using connections](https://stackoverflow.com/questions/52972229/reading-a-csv-file-looping-through-the-rows-using-connections) – user2554330 Oct 24 '18 at 20:15
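A minimal sketch of the `fread()` idea from the comments, assuming the file has no header row and a separator that `data.table::fread()` can detect automatically:

  library(data.table)

  chunk_size <- 1e+06
  n_chunks   <- ceiling(31191984 / chunk_size)   # total row count given in the question

  chunks <- vector("list", n_chunks)
  for (i in seq_len(n_chunks)) {
    chunks[[i]] <- fread("path_to_my_file.dat",
                         skip   = (i - 1) * chunk_size,   # rows already read in earlier chunks
                         nrows  = chunk_size,
                         header = FALSE)                   # assumption: no header row
  }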

1 Answer


You can use `skip`. With `n` chunks of 1 million rows, store each chunk so it is not discarded:

  chunks <- vector("list", n)
  for (i in 1:n) {
    # skip (i - 1) million rows so the first chunk starts at the top of the file
    chunks[[i]] <- read.table("file.txt", skip = (i - 1) * 1e+06, nrows = 1e+06)
  }

As mentioned, for example, here.
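One caveat: with `skip`, every pass re-reads the file from the beginning, so later chunks get progressively slower. A minimal sketch of an alternative that scans the file only once, using an open file connection (read.table() resumes from the connection's current position); the chunk size and total row count are taken from the question, and the file is assumed to have no header row:

  con        <- file("path_to_my_file.dat", open = "r")
  chunk_size <- 1e+06
  total_rows <- 31191984                        # row count given in the question
  n_chunks   <- ceiling(total_rows / chunk_size)

  chunks <- vector("list", n_chunks)
  for (i in seq_len(n_chunks)) {
    # reading from an open connection continues where the previous call stopped,
    # so no skip is needed; the final chunk is simply shorter
    chunks[[i]] <- read.table(con, nrows = chunk_size, header = FALSE)
  }
  close(con)

Each element of `chunks` is then a data frame of at most 1 million rows; `do.call(rbind, chunks)` would reassemble the full table if it fits in memory.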

gaut