
I am having difficulty getting R to read a .txt or .csv file that contains apostrophes.

Some of my columns contain descriptive text, such as "Attends to customers' needs" or "Sheriff's deputy". My file opens correctly in Excel (that is, all the data appear in the correct cells; there are 3 columns and about 8000 rows, and there is no missing data). But when I ask R to read the file, this is what happens:

data <- read.table("datafile.csv", sep=",", header=TRUE)
  Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 520 did not have 3 elements

(Line 520 is the first line that contains an apostrophe.)

If I go into the .txt or .csv file and manually remove all the apostrophes, then R reads the file correctly. However, I'd rather keep the apostrophes if I can.

I am new to R and would be grateful for any help.

Joshua Ulrich
user1257313
  • I'm upvoting 'cause even though I basically knew this, I once got "gotcha'd" when reading in a csv file generated by a data acquisition machine. The problem was that, inside a rather large header block, the file had some fields w/ apostrophes (an unexpected occurrence). Sometimes you have to take a careful look at the crapola in the source file. – Carl Witthoft Mar 08 '12 at 18:46

3 Answers


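By default, read.table treats both single and double quotes as quoting characters, so an unmatched apostrophe starts a quoted string and the line ends up with the wrong number of fields. You need to add quote="\"" to your read.table call. Or, you could just use read.csv, which only treats double quotes as quoting characters by default.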
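
As a rough illustration of the two options, here is a minimal sketch. The file contents, column names, and temporary path are made up for the example and are not the asker's data; stringsAsFactors = FALSE is added only so the result is the same on older R versions.

```r
# Hypothetical three-column sample that mimics the kind of text in the question
lines <- c(
  "id,job,task",
  "1,Sheriff's deputy,Attends to customers' needs",
  "2,Clerk,Files reports"
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

# Option 1: read.table, with only the double quote treated as a quoting character
d1 <- read.table(path, sep = ",", header = TRUE, quote = "\"",
                 stringsAsFactors = FALSE)

# Option 2: read.csv, whose default is already quote = "\""
d2 <- read.csv(path, stringsAsFactors = FALSE)

# Both data frames keep the apostrophes as ordinary characters
print(d1$job[1])
print(d2$task[1])
```

If the real file also uses double quotes as ordinary text, this setting would not be enough on its own, since double quotes are still treated as quoting characters here.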

Joshua Ulrich
  • Thanks for your help. Interestingly, I couldn't get any of the options you or DWin suggested for read.table to work... but read.csv does the trick! – user1257313 Mar 09 '12 at 19:25
  • The other difference with `read.csv` is that it defaults to `fill = TRUE` – IRTFM Feb 12 '15 at 03:00

Thoroughly studying the options in ?read.table will pay off in the long run. The default value for the quoting characters is quote = "\"'", which is really only two characters after R parses that expression: single-quote and double-quote. You can remove them both from consideration by disabling quoting altogether with quote = "" (note that the argument is named quote, not quotes). It's sometimes necessary to also disable the comment character, which defaults to "#", by setting comment.char = "", and in R versions before 4.0.0 it may be helpful to set as.is = TRUE to prevent strings from getting converted to factors.
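
A small sketch of those three options together, assuming a made-up file in which apostrophes, double quotes, and a # all appear as ordinary text (the contents and column names are hypothetical):

```r
# Hypothetical sample: every punctuation mark here is meant to be read literally
lines <- c(
  "id,job,note",
  "1,Sheriff's deputy,Shift #2",
  "2,Clerk,Said \"hello\""
)
path <- tempfile(fileext = ".csv")
writeLines(lines, path)

d <- read.table(path, sep = ",", header = TRUE,
                quote = "",         # disable quoting altogether
                comment.char = "",  # do not treat # as the start of a comment
                as.is = TRUE)       # keep strings as character vectors

print(d)
```

With quoting disabled, a comma inside a text field can no longer be protected by quotes, so this approach only suits files whose fields never contain the separator.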

IRTFM
  • I believe that should now be `quote=NULL` not `quotes=NA` as of the most recent version. – JayCo Feb 12 '15 at 17:26
  • I had the same issue, but was trying to import a list whose quotation marks had to remain in the imported list. Using quote=NULL worked for me (as did the answer below, quote="\\"). I appreciate actual answers instead of "read the help manual and hopefully you can find it," so thanks. In my case I used variable <- read.table("datafile.txt", quote=NULL) and the quotes came through nicely – jeramy townsley Feb 19 '15 at 05:17

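Setting the parameter quote="\\" in read.table should do the trick. This makes the backslash the only quoting character, so apostrophes and double quotes are read as ordinary text; it assumes the data contain no backslashes.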