I have a problem with fread() reading a column of directory paths that use "\" as the directory separator: a trailing directory separator inside a quoted field causes fread() to throw an error.

For the example CSV file below,

file,size
"windows\user",123

both fread() and read.csv() agree, and both show the \ as \\ when printed:

> fread("example.csv")
            file size
1: windows\\user  123
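
The doubled backslash in that output is how R prints a single backslash; the stored value is unchanged. A minimal base R sketch of this, where the variable name `x` is only illustrative:

```r
# In R source code, "\\" denotes ONE backslash character.
x <- "windows\\user"

print(x)        # escaped form, as it appears in printed data frames
writeLines(x)   # raw characters, with a single backslash
nchar(x)        # 12: seven letters, one backslash, four letters
```

So the value read from the file still holds one backslash between `windows` and `user`.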

However, for the following example fread() gives an error while read.csv() is fine.

file,size
"windows\user\",123

read.csv() gives

> read.csv("example.csv")
             file size
1 windows\\user\\  123

While the fread() error looks like this

> fread("example.csv",verbose=TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000 GB
File is opened and mapped ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 2 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 2 columns
First row with 2 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 2
Subtracted 1 for last eol and any trailing empty lines, leaving 1 data rows
Error in fread("example.csv", verbose = TRUE) : 
' ends field 1 on line 1 when detecting types: "windows\user\",123

I would really like to avoid doing

DT = data.table(read.csv("example.csv"))

if at all possible.

  • As it happens, I've just been fixing that along with `\n` inside quoted fields. Will add answer when it's ready to try from [GitHub](https://github.com/Rdatatable/data.table/). – Matt Dowle Jun 23 '14 at 23:18
  • It does make one wonder what would be the "right" fix, since it is due to the well-documented behavior of `scan` and, contrary to this questioner's claims, the example is NOT fine with read.csv(). `file,size "windows\user\",123` throws an error. – IRTFM Jun 23 '14 at 23:31
  • @BondedDust `read.csv` seems to read it fine for me, agreeing with the asker. I looked in `?scan` - where do you mean? – Matt Dowle Jun 24 '14 at 00:50
  • `scan` interprets the '\user' as ctrl-u followed by 'ser'. `read.csv(text="windows\user\",123", sep=",")` returns: Error: '\u' used without hex digits in character string starting ""windows\u". Mac 10.8.5, R 3.1.0 – IRTFM Jun 24 '14 at 01:01
  • @BondedDust That's not `read.csv`, that's the parser. Try typing `"windows\user\",123"` at the console on its own and you get the same error. To parse you need to double the \. When reading from a file with contents as shown by asker, `read.csv(filename)` works. – Matt Dowle Jun 24 '14 at 01:14
  • Right. I erred by assuming that `read.csv(text=.)` would fully duplicate disk access. – IRTFM Jun 24 '14 at 01:23
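
The distinction settled in the comments above, between R's parser and the file reader, can be reproduced with base R alone. This is a rough sketch rather than anything from the thread: the temporary file stands in for the asker's example.csv, and the names `path`, `from_file`, and `from_text` are illustrative.

```r
# Recreate the asker's second example in a temporary file (stand-in path).
# In R source each backslash is doubled and inner quotes are escaped, so the
# second line written to disk is exactly: "windows\user\",123
path <- tempfile(fileext = ".csv")
writeLines(c("file,size", "\"windows\\user\\\",123"), path)
writeLines(readLines(path))   # show the raw file contents

# Reading from the file: no R string-literal parsing touches the contents.
from_file <- read.csv(path, stringsAsFactors = FALSE)

# Reading the same characters via text= works once the literal is escaped.
from_text <- read.csv(text = "file,size\n\"windows\\user\\\",123",
                      stringsAsFactors = FALSE)

identical(from_file, from_text)   # compare the two results
nchar(from_file$file)             # each backslash counts as one character
unlink(path)
```

The undoubled form fails before `read.csv()` is ever called, because the R parser rejects `\u` without hex digits in a string literal.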

1 Answer

Now fixed in v1.9.3 on GitHub.

  • fread() now accepts trailing backslash in quoted fields. Thanks to user2970844 for highlighting.
$ cat example.csv
file,size
"windows\user\",123

> require(data.table)
> fread("example.csv")
              file size
1: windows\\user\\  123
> read.csv("example.csv")
             file size
1 windows\\user\\  123
> 
Matt Dowle
  • That's two fixes I've seen in the last couple of days. It's really nice to see things get done that fast. Thank you. – Rich Scriven Jun 24 '14 at 03:09
  • I just compiled the github version and it seems to work for my actual data, but for my toy example fread gives me an empty data.table. I'm sure this is just a corner case, but try `fread("file,size\n\"windows\\user\\\",123\n")` – Jacob Colvin Jun 24 '14 at 17:55
  • @user2970844 I included your example in the test suite (tests 1337 & 1338) so should be ok. Could you run `test.data.table()` and let me know the last line please. I pasted that command as well and works for me. I'm at `"All 1341.3 tests ... completed ok"` – Matt Dowle Jun 24 '14 at 18:03
  • Test 1010.1, 1011 ran without errors but failed check that x equals y: – Jacob Colvin Jun 24 '14 at 18:20
  • @JacobColvin Thanks but what's the last line (how many tests were run)? That's the easiest way to establish which commit you have. – Matt Dowle Jun 24 '14 at 18:22
  • I'm running Windows 8, FYI. Test 1010.1, 1011, 1328, 1332, 1335, 1337, 1338, 1341.1, 1341.2, 1341.3 ran without errors but failed check that x equals y: Test 1324, 1333 didn't produce correct error : Test 1336 Error in fread(f) : Expected sep (',') but new line, EOF (or other non printing character) ends field 5 on line 3 when detecting types: "TX",77406,"business analyst\\\\\\\","the boeing co","","" – Jacob Colvin Jun 24 '14 at 18:27
  • Error in eval(expr, envir, enclos) : 13 errors out of 1341.3 in inst/tests/tests.Rraw on Tue Jun 24 11:17:30 2014. Search tests.Rraw for test numbers: 1010.1, 1011, 1324, 1328, 1332, 1333, 1335, 1336, 1337, 1338, 1341.1, 1341.2, 1341.3. The test() function is defined at the top of tests.Rraw and contains usage info. – Jacob Colvin Jun 24 '14 at 18:28
  • Thanks. Could you copy the full output somewhere like www.copy.com, or email it to me at `maintainer("data.table")`. I ran on winbuilder and only 1010.1 and 1011 fail (looks like a \r issue). So maybe it is Win8. Also your `sessionInfo()` please. – Matt Dowle Jun 24 '14 at 18:35
  • Sorry... I reinstalled from github and noticed a warning about a dll not being copied. I think it was because I had an existing R session already opened. Now everything is working as expected, and I only get the two errors (1010.1, 1011) you previously mentioned from winbuilder. Thank you very much @MattDowle. – Jacob Colvin Jun 24 '14 at 19:04
  • @JacobColvin That's a relief! Thanks for the update. – Matt Dowle Jun 24 '14 at 19:09