I can't read this CSV with fread:

➜  Downloads cat t2.csv 
47616412|76-398-12||||7639812
47616413|53-1696-18||||53169618

I think it's because of these characters, which I see in vim:

47616412|76-398-12||^@||7639812

and fread puts a line break at column 4. How can I deal with that?

edit 1

Note that the standard console output does not show those characters:

➜  Downloads cat t2.csv 
47616412|76-398-12||||7639812
47616413|53-1696-18||||53169618

I only see them in vim:

47616412|76-398-12||^@||7639812
47616413|53-1696-18||^@||53169618
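
The `^@` that vim displays is caret notation for a NUL byte (`\0`), which a terminal normally prints as nothing. A minimal sketch for confirming this from the shell, assuming a hypothetical `sample.csv` that recreates the two lines above with a NUL in the fourth field (it stands in for `t2.csv`):

```shell
# Recreate the two sample lines with an embedded NUL in the fourth field.
# sample.csv is a hypothetical stand-in for t2.csv.
printf '47616412|76-398-12||\000||7639812\n47616413|53-1696-18||\000||53169618\n' > sample.csv

# cat -v prints non-printing bytes in caret notation, so NUL appears as ^@.
cat -v sample.csv

# Count NUL bytes: delete everything except NUL, then count what is left.
nul_count=$(tr -cd '\000' < sample.csv | wc -c | tr -d ' ')
echo "NUL bytes: $nul_count"
```

Plain `cat` writes the NUL bytes to the terminal unchanged, which is why they are invisible there.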

edit 2

Note also that read.csv with skipNul=TRUE works:

> read.csv("t2.csv", sep="|",header=FALSE,skipNul=TRUE)
        V1         V2 V3 V4 V5       V6
1 47616412  76-398-12 NA NA NA  7639812
2 47616413 53-1696-18 NA NA NA 53169618

edit 3

Here is the file: dropbox download

Florian Oswald

1 Answer

This has just been fixed in dev 1.12.3 (see NEWS):

  1. fread() now skips embedded NUL (\0), #3400. Thanks to Marcus Davy for reporting with examples, and Roy Storey for the initial PR.

I checked that the file attached to your question does indeed fail with 1.12.2 on CRAN but works in dev.

> library(data.table)   # v1.12.2 on CRAN 07 Apr 2019
> fread("~/Downloads/t2.csv")
Empty data.table (0 rows and 1 cols): 47616412|76-398-12||
Warning message:
In fread("~/Downloads/t2.csv") :
  Stopped early on line 2. Expected 1 fields but found 1. Consider fill=TRUE
  and comment.char=. First discarded non-empty line: <<>>

But in dev 1.12.3 it now works:

> library(data.table)   # v1.12.3 in development as of 17 Apr 2019
> fread("~/Downloads/t2.csv")
         V1         V2     V3     V4     V5       V6
      <int>     <char> <lgcl> <lgcl> <lgcl>    <int>
1: 47616412  76-398-12     NA     NA     NA  7639812
2: 47616413 53-1696-18     NA     NA     NA 53169618
>
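
For a version without the fix, one possible workaround is to strip the NUL bytes before reading the file. This is a sketch under assumptions, not part of the fix described above: it assumes the NUL bytes carry no information, and `sample.csv` and `sample_clean.csv` are hypothetical file names standing in for `t2.csv` and its cleaned copy.

```shell
# Recreate the two sample lines with an embedded NUL in the fourth field.
# sample.csv is a hypothetical stand-in for t2.csv.
printf '47616412|76-398-12||\000||7639812\n47616413|53-1696-18||\000||53169618\n' > sample.csv

# Delete every NUL byte and write the result to a new file,
# leaving the original untouched.
tr -d '\000' < sample.csv > sample_clean.csv

# The cleaned file now has six plain pipe-separated fields per line.
cat sample_clean.csv
```

The cleaned copy can then be passed to fread in place of the original.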
Matt Dowle