I have many large tables in which the first cell may contain multiple quote characters, like this:

test.txt:

"abc"" xyz""",123
mno,456

Now fread("test.txt", sep=",", header=F) throws an error:

Error in fread("123.txt", sep = ",", header = F) : 
  Unexpected character (" nxy) ending field 1 of line 1

It does read this file, test2.txt:

qwe,999
"abc"" nxyz""",123
mno,456

without any error when I call fread("test2.txt", sep=",", header=F).

I need it to read files like test.txt as well, where the quoted field is on the first line. Are there any solutions?

@Arun, I tried installing data.table v1.9.3 from GitHub but got the following error. Any ideas? Thanks.

> remove.packages("data.table")
Removing package from ‘C:/Users/sidpat/Documents/R/win-library/3.0’
(as ‘lib’ is unspecified)
> install_github("Rdatatable/data.table")
Installing github repo data.table/master from Rdatatable
Downloading master.zip from https://github.com/Rdatatable/data.table/archive/master.zip
Installing package from C:\Users\SIDPAT\AppData\Local\Temp\RtmpUBzt2K/master.zip
Installing data.table
"C:/PROGRA~1/R/R-30~1.3/bin/x64/R" --vanilla CMD build  \
  "C:\Users\sidpat\AppData\Local\Temp\RtmpUBzt2K\devtoolsc241bb5e2e\data.table-master"  \
  --no-manual --no-resave-data 

* checking for file 'C:\Users\sidpat\AppData\Local\Temp\RtmpUBzt2K\devtoolsc241bb5e2e\data.table-master/DESCRIPTION' ... OK
* preparing 'data.table':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
Warning: running command '"C:/PROGRA~1/R/R-30~1.3/bin/x64/Rcmd.exe" INSTALL -l "C:\Users\SIDPAT\AppData\Local\Temp\RtmpKyohpy\Rinst1ee42c4a1653" --no-multiarch "C:/Users/sidpat/AppData/Local/Temp/RtmpKyohpy/Rbuild1ee425057481/data.table"' had status 1
      -----------------------------------
* installing *source* package 'data.table' ...
** libs
Warning: running command 'make -f "Makevars" -f "C:/PROGRA~1/R/R-30~1.3/etc/x64/Makeconf" -f "C:/PROGRA~1/R/R-30~1.3/share/make/winshlib.mk" SHLIB="data.table.dll" WIN=64 TCLBIN=64 OBJECTS="assign.o bmerge.o chmatch.o dogroups.o fastmean.o fastradixdouble.o fastradixint.o fcast.o fmelt.o forder.o frank.o fread.o gsumm.o ijoin.o init.o rbindlist.o reorder.o uniqlist.o vecseq.o wrappers.o"' had status 127
ERROR: compilation failed for package 'data.table'
* removing 'C:/Users/SIDPAT/AppData/Local/Temp/RtmpKyohpy/Rinst1ee42c4a1653/data.table'
      -----------------------------------
ERROR: package installation failed
Error: Command failed (1)
sidpat
  • I imagine it's upset with the inconsistent `sep` and terrible quoting – Rich Scriven Sep 02 '14 at 19:56
  • There are null entries too. What should the dimensions of your examples be? – Rich Scriven Sep 02 '14 at 20:09
  • More: It looks like you should transpose the data. I looked at it with `verbose = TRUE` and there's good advice in the output. – Rich Scriven Sep 02 '14 at 20:21
  • @RichardScriven, my dataset is 100,000 rows by 10 columns. I receive the data from an external source that I have no control over, so I can't transpose it before reading it in. Can you explain the advice you mentioned from the `verbose = TRUE` output? – sidpat Sep 03 '14 at 05:58
  • @510947, could you try the development version 1.9.3 from [here](github.com/Rdatatable/data.table). It works for me. – Arun Sep 07 '14 at 20:40
  • @Arun, I tried installing data.table v1.9.3 but got an error during installation. I have edited the question above. – sidpat Sep 08 '14 at 15:28
  • @510947, please read the [README](https://github.com/Rdatatable/data.table/blob/master/README.md) file... There are some pointers regarding installation. – Arun Sep 08 '14 at 15:29
  • @Arun, I tried all of it and got the same error. It might be because I am using my company's laptop (though I have permission to install packages). I will try it on my own PC later. – sidpat Sep 08 '14 at 15:33
  • Does it have Rtools installed? – Arun Sep 08 '14 at 15:46
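On the Rtools question: in the build log above, the `make` command "had status 127", which is the status R's `system()` conventionally reports when a command could not be run at all. A minimal sketch for checking whether the build tools are visible to R, assuming `make` and `gcc` are the tools a source install needs (that tool list is an assumption, not something the log states):

```r
# Sys.which() returns "" for any program it cannot find on the PATH.
# The tool names below are an assumption about what a source build needs.
tools <- Sys.which(c("make", "gcc"))

missing_tools <- names(tools)[tools == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Build tools found: ", paste(tools, collapse = ", "))
}
```

If `make` is reported as missing, installing Rtools (or adding its `bin` directory to the PATH) is the likely next step before retrying the GitHub install.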

1 Answer

Did you try df <- read.csv("test.txt", sep=",", header=F, quote="")? You can then convert the resulting data frame into a data table with dt <- data.table(df).

There's a discussion about fread and unbalanced quotes here: data.table::fread and Unbalanced "

I'm assuming test2.txt works because the first field on its first line doesn't begin with a quote.
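A minimal sketch of the read.csv fallback, assuming a temporary file with the same two lines as test.txt in the question. It compares `quote=""`, which switches quote handling off and leaves the quote characters in the data, with read.csv's default quoting, which treats a doubled quote inside a quoted field as one literal quote. The data.table conversion only runs if the package is installed.

```r
# Recreate the two-line example from the question in a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c('"abc"" xyz""",123', 'mno,456'), path)

# quote = "" disables quote handling: the raw quote characters are kept.
raw <- read.csv(path, sep = ",", header = FALSE, quote = "",
                stringsAsFactors = FALSE)

# Default quoting: the outer quotes are stripped and "" becomes one ".
parsed <- read.csv(path, sep = ",", header = FALSE,
                   stringsAsFactors = FALSE)

print(raw$V1[1])
print(parsed$V1[1])

# Convert to a data.table only when the package is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(parsed)
}
```

Which variant is appropriate depends on whether the quote characters are meant to be part of the values.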

Community
  • I have a 100MB+ dataset, and read.csv does work but is very slow; hence I want to use fread to read it. – sidpat Sep 03 '14 at 05:46