When I try to read a csv file using data.table::fread(fn, sep='\t', header=T), it fails with an "Unbalanced " observed on this line" error. The data has 3 integer variables and 1 string variable. The strings in the csv file are not enclosed in ", and some lines do contain " within the string variable, where the " characters are not paired.

Is it possible to make fread ignore the unpaired " in the variable and continue reading the data? Thanks.

Here is the sample data (just one record):

N_ID    VISIT_DATE  REQ_URL REQType
175931  2013-3-8 23:40:30   http://aaa.com/rest/api2.do?api=getSetMobileSession&data={"imei":"60893ZTE-CN13cd","appkey":"android_client","content":"Z0JiRA0qPFtWM3BYVltmcx5MWF9ZS0YLdW1ydXoqPycuJS8idXdlY3R0TGBtU   1
baidao
  • Can you please add the first lines of your file to the question? Note that fread is still under development and embedded quotes ("\"" and """") have problems... – agstudy Apr 18 '13 at 22:24
  • Without reproducing your error there's little we can do to help (unless someone has experienced the exact problem you're facing). – Arun Apr 18 '13 at 22:42
  • I have added the sample record. Please verify. Thanks – baidao Apr 19 '13 at 10:05

1 Answer

UPDATE: Now implemented in v1.8.11

From NEWS :

fread now accepts quotes (both ' and ") in the middle of fields, whether the field starts with " or not, rather than the 'unbalanced quotes' error, #2694. Thanks to baidao for reporting. It was known and documented at the top of ?fread (text now removed). If a field starts with " it must end with " (necessary if the field separator itself is in the field contents). Embedded quotes can be in column names too. Newlines (\n) still can't be in quoted fields or quoted column names, yet.
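
For anyone who wants to check this on their own installation, here is a minimal sketch rather than an official test case. It assumes a recent data.table in which fread has a quote argument (that argument is not mentioned in the NEWS item above and arrived in a later release), and it uses a hypothetical, shortened version of the record from the question. Passing quote="" asks fread to treat " as an ordinary character, so the unpaired quotes are read literally:

```r
library(data.table)

# Hypothetical, shortened version of the REQ_URL value from the question.
# It contains an odd number of " characters, i.e. one is unpaired.
url <- 'http://aaa.com/rest/api2.do?api=getSetMobileSession&data={"imei":"60893ZTE-CN13cd","content":"Z0JiRA0q'

# Build a tab-separated header line and one record.
header <- paste(c("N_ID", "VISIT_DATE", "REQ_URL", "REQType"), collapse = "\t")
record <- paste("175931", "2013-3-8 23:40:30", url, "1", sep = "\t")

# Write both lines to a temporary file (the file name is arbitrary).
fn <- tempfile(fileext = ".tsv")
writeLines(c(header, record), fn)

# quote = "" disables quote processing (assumes a data.table version
# whose fread supports the quote argument).
dt <- fread(fn, sep = "\t", header = TRUE, quote = "")
print(dt$REQ_URL)

unlink(fn)
```

If your version of fread does not have the quote argument, drop it and rely on the behaviour described in the NEWS item instead.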


Yes, as @agstudy said, embedded quotes are a known, documented limitation: support for them hasn't been implemented yet because fread is new. Strictly speaking, though, I suppose the quotes in your example aren't embedded, because the string doesn't start with a quote.

Anyway, I've filed this as a bug report so it doesn't get forgotten. To be done in the next release. Thanks for highlighting.

#2694 : Strings including quotes but not starting with quote in fread

Matt Dowle
  • Has this been fixed? I'm having a similar issue processing tweets, I believe the tweet_text fields have \n characters that should be ignored. – ZacharyST Feb 05 '16 at 18:27
  • @ZacharyST Did you search README and did you test? If still a problem please find and +1 (or raise a new) GitHub issue. – Matt Dowle Feb 05 '16 at 20:00