I'm trying to read a 7.7GB file using fread, but I get an error that suggests that it stopped reading after only getting partway through the file:

cdr <- fread('/path/to/data.csv')
Read 1687 rows and 610989 (of 610989) columns from 4.000 GB file in 00:02:37
Warning message:
In fread("/path/to/data.csv") :
 Stopped reading at empty line 1688 but text exists afterwards (discarded)

cdr <- fread('/path/to/data.csv', nrows = 2000)
Read 0.0% of 2000 rows
Error in fread("/path/to/data.csv", nrows = 2000) : 
 Expected sep (',') but new line or EOF ends field 500054 on line 1688 when reading data

Note that the output says the file is 4.000 GB in size, but it's actually 7.7 GB. Similarly, it reports 1687 rows, but the file actually has 3378 rows.

I double-checked, and I can confirm that there is no empty line in this file (thanks @MrFlick for the suggestion).

R is running on a 64-bit Ubuntu instance, and per https://stackoverflow.com/a/18091755/ I checked .Machine$sizeof.pointer and got 8 (I believe that indicates I'm running R in 64 bits).

todofixthis
  • The error message says there is an empty line in your input file that's causing it to stop. Is that the case? Did you expect a blank line in your input? – MrFlick Mar 23 '17 at 15:39
  • Good question. That's a negative; I verified that there is no blank line in the file. – todofixthis Mar 23 '17 at 15:41
  • You can try `blank.lines.skip=TRUE` to check if it works – s.brunel Mar 23 '17 at 15:41
  • How did you verify there were no blank lines? This is really going to be a lot of guess work without a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The 4.00 number could be just a coincidence. – MrFlick Mar 23 '17 at 15:43
  • I inspected lines 1685-1689 manually in vim. No empty lines. Just to be safe, I also ran an `egrep '^$'`, and it turned up nothing, either. – todofixthis Mar 23 '17 at 15:45

1 Answer

Well, this is embarrassing. It turns out that I was running fread against the wrong file: one that just happened to be exactly 4 GB because it was truncated during a failed unzip operation.

The real file was actually in a different location, but the two files were named the same and had similar paths, so I got them mixed up.

When I fread'ed the real 7.7GB file, everything worked as expected.
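
In hindsight, a quick sanity check on the path before calling fread would have caught the mix-up. Here is a minimal sketch in base R; `describe_file` is a hypothetical helper of my own, not part of data.table, and the size conversion assumes 1024^3 bytes per unit, which may or may not match the unit fread prints.

```r
# Hypothetical helper (not part of data.table): summarise the file that a
# path actually resolves to, so it can be compared with what fread() reports.
describe_file <- function(path) {
  stopifnot(file.exists(path))
  size_bytes <- file.size(path)
  list(
    resolved_path = normalizePath(path),   # full path, to spot look-alike locations
    size_bytes    = size_bytes,
    size_gib      = size_bytes / 1024^3    # assumption: 1024^3 bytes per unit
  )
}

# Small self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), tmp)
info <- describe_file(tmp)
print(info)
```

Running `describe_file('/path/to/data.csv')` first and comparing the resolved path and size against what you expect (7.7 GB in my case) makes a truncated or misplaced copy obvious before spending minutes on the read.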

todofixthis