
I am trying to read certain columns from a 2.5 GB .csv file into RStudio using the `fread` function from the data.table package. I am choosing the columns I need with the `select` argument, because without it I get the error `Error: cannot allocate vector of size 105.7 Mb`. My commands look like this:

```r
library(data.table)

setwd("<set local working directory>")

# Read only the needed columns (by position) to keep memory use down
dat <- fread("File.csv", select = c(1, 3, 5:6, 9:10, 13:14, 19:20))
```
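For reference, here is a minimal sketch of how `select` behaves, run on a tiny throwaway file rather than the real `File.csv` (the file contents and column names below are invented for illustration). It shows that `select` accepts either column positions or column names, and that only those columns end up in the result.

```r
library(data.table)

# Hypothetical stand-in for File.csv, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,a,b,c",
             "1,x,10,TRUE",
             "2,y,20,FALSE"), tmp)

# Select by position: columns 1 and 3
by_pos <- fread(tmp, select = c(1, 3))

# Select by name: the same two columns
by_name <- fread(tmp, select = c("id", "b"))

print(names(by_pos))
print(names(by_name))
```

Selecting by name can be less fragile than by position if the column order of the file ever changes.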

When I run this code, I get a warning telling me that fread stopped early: `In fread("File.csv", select = c(1, 3, 5:6, 9:10, 13:14, 19:20)) : Stopped early on line 8691.`

Is there a way to force fread to not stop early and to read in all rows for the selected columns?
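fread typically stops early when a line does not contain the number of fields it expects (for example, an extra delimiter or a stray quote). One way to look for such lines is base R's `count.fields()`, which reports how many fields it finds on each line. The sketch below assumes a comma-separated file with `"` as the quote character; `find_irregular_lines()` is a hypothetical helper name, not part of data.table. On a 2.5 GB file it scans every line, so it may take a while.

```r
# Hypothetical helper: report lines whose field count differs from the header's.
# Assumes a comma-separated file with double quotes as the quote character.
find_irregular_lines <- function(path, sep = ",") {
  n <- count.fields(path, sep = sep, quote = "\"",
                    comment.char = "", blank.lines.skip = FALSE)
  expected <- n[1]  # number of fields on the header line
  # NA marks the start of a quoted field that spans several lines
  bad <- which(is.na(n) | n != expected)
  data.frame(line = bad, fields = n[bad])
}

# Usage on a small throwaway file with one malformed row (line 3)
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6,7",
             "8,9,10"), tmp)
res <- find_irregular_lines(tmp)
print(res)
```

With `blank.lines.skip = FALSE`, the reported positions should line up with physical line numbers as long as no quoted field contains embedded newlines.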

tassones
  • Probably there is some irregularity on line 8691. Perhaps you can examine that line and figure out what is going on. A command line tool might be the easiest way to see that line without opening the whole file, [e.g., like this](https://stackoverflow.com/q/6022384/903061). I'd suggest looking at the lines above and below to compare and see what's different. – Gregor Thomas May 13 '20 at 16:08
  • What is on line 8691? One less expensive way to look at that row (and the ones before and after it) is `system("sed -ne '8690,8692p' File.csv")`. That may provide insight into what on that row is causing problems. (Unless you have many, many columns, I would not expect that row to be the boundary between "enough memory" and "too big".) – r2evans May 13 '20 at 16:11
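If `sed` is not available (it often is not on a stock Windows install), a pure-R sketch of the same idea is below. `peek_lines()` is a hypothetical helper: it reads only the first `to` lines through a connection, so it never loads the rest of the file, and returns lines `from` through `to`.

```r
# Hypothetical helper: return lines `from`..`to` of a text file
# without reading past line `to`.
peek_lines <- function(path, from, to) {
  con <- file(path, open = "r")
  on.exit(close(con))
  lines <- readLines(con, n = to)  # stops after `to` lines
  if (length(lines) < from) return(character(0))
  lines[from:length(lines)]
}

# Usage on a small throwaway file
tmp <- tempfile(fileext = ".csv")
writeLines(sprintf("row%d", 1:10), tmp)
print(peek_lines(tmp, 4, 6))
```

For the file in the question, the call would be `peek_lines("File.csv", 8690, 8692)`.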
