I have installed and loaded data.table (which provides fread) like so:

install.packages('data.table')
require(data.table)

It loads the first half of my 3GB TSV file very quickly with the following code:

> train <- fread("avito_train.tsv")
Read 55.6% of 3995803 rows
Error in fread("avito_train.tsv") :

However, when it gets to 55.6%, it reports an error but doesn't say what the error is. How can I see the error message? Is it possible to just skip the rows that cause the error?

Note: The data is available on Kaggle if you wish to try it yourself: http://www.kaggle.com/c/avito-prohibited-content/data

Arun
user1477388
  • Obvious question: you have 3GB of RAM available for R to use, right? In particular, are you in a 64-bit environment running 64-bit R? – Joe Jun 26 '14 at 15:01
  • Also, it would be useful to know whether you are familiar with reading in Unicode text. There's a not insignificant chance this is a code-page error (i.e., it ran into a letter it didn't know how to transcode), given this is Russian text. What's your default code page? Are your OS and R installed in English or some other language? – Joe Jun 26 '14 at 15:04
  • Same question, same file in fact, here http://stackoverflow.com/questions/24424928/fatal-error-using-fread-in-rstudio. Although I don't see why you aren't seeing the error on your screen. We'll need more info about whether you're using RStudio and whether Windows/Unix/Mac etc. – Matt Dowle Jun 26 '14 at 15:18
  • @Joe I think you are correct, though I am not sure how to handle this. I don't know the encoding of the Russian text document. To your other point, I should have 7GB of RAM on this machine. – user1477388 Jun 26 '14 at 15:36
  • @MattDowle I am using RStudio on Windows 7. – user1477388 Jun 26 '14 at 15:38
  • Just marked this one as dup because the other one came first, just to keep it in one place. If you want to follow up on why the error message didn't appear, my guess is that it's something to do with RStudio and `\r` on Windows, especially if the pesky character in the file was trying to be printed as part of the error message. Something like that. Try outside RStudio (at the command line) to confirm that. – Matt Dowle Jun 26 '14 at 16:06
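
For anyone following up on the missing error message: one way to see it without depending on how the console renders it is to catch the condition and write its text to a file. This is only a minimal sketch, not a confirmed fix for this file. The helper name `read_or_report` and the output file `fread_error.txt` are made up for illustration and are not part of data.table.

```r
library(data.table)

# Hypothetical helper (not part of data.table): run fread() and return either
# the table or the text of the error, so the message can be inspected directly
# instead of relying on the console to display it.
read_or_report <- function(path, ...) {
  tryCatch(
    list(data = fread(path, ...), error = NULL),
    error = function(e) list(data = NULL, error = conditionMessage(e))
  )
}

res <- read_or_report("avito_train.tsv")

if (!is.null(res$error)) {
  # Write the message to a UTF-8 file so any non-ASCII characters in it
  # survive, whatever the console does with them.
  con <- file("fread_error.txt", open = "w", encoding = "UTF-8")
  writeLines(res$error, con)
  close(con)
}
```

Saving this as a script and running it with `Rscript` from the command line, rather than inside RStudio, would also test the suggestion in the last comment. It does not skip the offending rows; it only makes the error text visible.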
