
I have a very large data file (my_file.dat) containing 31191984 rows of several variables. I would like to programmatically import this dataset into R in small parts, e.g. as data frames of 1 million rows each. At this link, it is suggested to use read.table() with the nrows option. It works for the first round of 1 million rows using this command:

  my_data <- read.table("path_to_my_file.dat", nrows = 1e+06)

How do I automate this procedure for the next rounds of 1 million rows until all parts are imported as R data frames? I am aware that one option could be to store the data in an SQL database and let R talk to SQL. However, I am looking for an R-specific solution only.

khajlk
  • You could use the `skip` parameter along with `nrows` – SmitM Oct 24 '18 at 18:27
  • See the answer I just gave [here](https://stackoverflow.com/a/52972430/2789863) for a very similar question. – tblznbits Oct 24 '18 at 18:31
  • What about `fread` from `data.table` also with `nrows`? Shouldn't that be faster than read.table? – SeGa Oct 24 '18 at 19:16 (see the sketch below)
  • Possible duplicate of [Reading a CSV file, looping through the rows, using connections](https://stackoverflow.com/questions/52972229/reading-a-csv-file-looping-through-the-rows-using-connections) – user2554330 Oct 24 '18 at 20:15
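A minimal sketch of the `fread()` idea from the comments, assuming the file has no header row and a separator that `data.table::fread()` can detect automatically:

  library(data.table)

  chunk_size <- 1e+06
  n_chunks   <- ceiling(31191984 / chunk_size)   # total row count given in the question

  chunks <- vector("list", n_chunks)
  for (i in seq_len(n_chunks)) {
    chunks[[i]] <- fread("path_to_my_file.dat",
                         skip   = (i - 1) * chunk_size,   # rows already read in earlier chunks
                         nrows  = chunk_size,
                         header = FALSE)                   # assumption: no header row
  }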

1 Answer


You can use `skip`. With `n` chunks of 1 million rows, store each chunk so it is not discarded:

  chunks <- vector("list", n)
  for (i in 1:n) {
    # skip (i - 1) million rows so the first chunk starts at the top of the file
    chunks[[i]] <- read.table("file.txt", skip = (i - 1) * 1e+06, nrows = 1e+06)
  }

As mentioned, for example, here.
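One caveat: with `skip`, every pass re-reads the file from the beginning, so later chunks get progressively slower. A minimal sketch of an alternative that scans the file only once, using an open file connection (read.table() resumes from the connection's current position); the chunk size and total row count are taken from the question, and the file is assumed to have no header row:

  con        <- file("path_to_my_file.dat", open = "r")
  chunk_size <- 1e+06
  total_rows <- 31191984                        # row count given in the question
  n_chunks   <- ceiling(total_rows / chunk_size)

  chunks <- vector("list", n_chunks)
  for (i in seq_len(n_chunks)) {
    # reading from an open connection continues where the previous call stopped,
    # so no skip is needed; the final chunk is simply shorter
    chunks[[i]] <- read.table(con, nrows = chunk_size, header = FALSE)
  }
  close(con)

Each element of `chunks` is then a data frame of at most 1 million rows; `do.call(rbind, chunks)` would reassemble the full table if it fits in memory.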

gaut