I want to import a big CSV file into R (approximately 14 million rows and 13 columns), so I tried to use fread with the following code:

library(data.table)

my_data <- fread(my_file,
                 sep = ";",
                 header = TRUE,
                 na.strings = c("", " ", "NA"),
                 quote = "",
                 fill = TRUE,
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

However, I got the following error:

Error in fread(path_alertes_profil, sep = ";", header = TRUE, na.strings = c("",  : 
  Expecting 13 cols, but line 18533 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=';' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

I then tried to import the file with the read_delim function from the readr package, using the same parameters. It worked in the sense that the data appeared in the global environment (I'm working with RStudio), but it contained only 741629 rows instead of the 14+ million.

How can I solve this problem? I looked for a solution to the fread() error but didn't find any useful resource.

MBB
  • Did you try without specifying a separator? Regardless, I don't like that `my_data <- test <- as.data.frame(`. Why would you run `as.data.frame`, and why would you copy the same data to two data sets? – David Arenburg Jul 31 '17 at 08:32
  • My bad, especially with the `my_data <- test <- as.data.frame(`. I modified the question (it isn't like that in my code; I made an error while writing the question). I did try without specifying a separator and got the same error message. – MBB Jul 31 '17 at 08:37
  • Well, did you check out the row from the error message? – Roland Jul 31 '17 at 09:04
  • I think this could help: https://stackoverflow.com/questions/44714323/r-error-text-after-processing-all-cols-in-fread-data-table/44715042#44715042 – NpT Jul 31 '17 at 09:10
  • Yes; it is due to some special characters that make the number of columns in the corresponding rows go from 13 to 14 (for instance, in line 18533 I think the characters `` are the reason I get the error message). However, not all errors seem to be caused by the same characters, which complicates things for me. `read_delim` was pretty useful since it reported where the parsing failed, so I could delete the problematic rows afterwards; I don't really care about a few hundred errors in a file of 14+ million rows. – MBB Jul 31 '17 at 09:20
  • You could add a header name so that the header has 14 columns, and then use `fill=TRUE` within `fread`. Then you just delete the 14th column. – NpT Jul 31 '17 at 09:25
  • Try perhaps printing this line and check it out in console. See [this](https://stackoverflow.com/questions/191364/quick-unix-command-to-display-specific-lines-in-the-middle-of-a-file), for instance. – David Arenburg Jul 31 '17 at 10:33
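
The two suggestions in the comments (look at the offending lines, then widen the header and read with `fill=TRUE`) could be sketched roughly as below. This is only an illustration under assumptions: the toy file, the column names and the extra header name `extra` are all hypothetical stand-ins for the real 13-column, 14-million-row file, and rewriting the header through `readLines`/`writeLines` is only practical here because the toy file is tiny.

```r
library(data.table)

# Hypothetical toy file: 3 columns expected, but the third data row
# contains a stray ';' and therefore has 4 fields.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;name;value",
             "1;a;10",
             "2;b;20",
             "3;c;x;30",
             "4;d;40"), tmp)

# Step 1: count the fields on every line to locate malformed rows.
# blank.lines.skip = FALSE keeps result positions aligned with file lines.
n_fields <- count.fields(tmp, sep = ";", quote = "", comment.char = "",
                         blank.lines.skip = FALSE)
expected  <- n_fields[1]                  # field count of the header line
bad_lines <- which(n_fields != expected)  # file line numbers, header = line 1
print(readLines(tmp)[bad_lines])          # inspect the offending text

# Step 2: add one extra (hypothetical) header name so that the header
# is as wide as the widest row, then let fread pad the shorter rows.
lines <- readLines(tmp)
lines[1] <- paste0(lines[1], ";extra")
tmp_wide <- tempfile(fileext = ".csv")
writeLines(lines, tmp_wide)

dt <- fread(tmp_wide, sep = ";", header = TRUE, quote = "", fill = TRUE)

# Step 3: rows that spilled into the extra column are the malformed ones.
# Drop them, then drop the helper column itself.
dt_clean <- dt[is.na(extra)]
dt_clean[, extra := NULL]
print(dt_clean)
```

Filtering on `is.na(extra)` assumes that well-formed rows never have a value in the added column; if only the extra column should go and the shifted rows should stay, the filter step can be skipped.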
