I have a problem similar to this one: read.csv warning 'EOF within quoted string' prevents complete reading of file

That is, when I load a CSV file, R says:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
EOF within quoted string

I can get rid of this warning by passing `quote = ""` to `read.csv`.

But the main problem remains: only 22,111 of the 689,233 rows are read into R. I would like to try removing all special characters from the CSV to see if that fixes the problem.

I found this related question: How to remove specific special characters in R

But is there a way to do it in `read.csv`, that is, while the file is being read in?

ElinaJ
  • Are you certain that your input file is well-formed, meaning that all 689,233 rows have the same number of columns? `read.csv` (which is a wrapper around `read.table`) is somewhat sensitive and can die for bad input files. – Tim Biegeleisen Jul 10 '15 at 01:48
  • I don't think you can do it within `read.csv`! I believe it is even better here not to use R and to use something like `awk` or other Linux text-processing commands instead. – agstudy Jul 10 '15 at 01:49
  • @ElinaJ Could you post the first 2 rows along with rows 22111 and 22112 from your input csv file? – Tim Biegeleisen Jul 10 '15 at 01:51
  • I'm afraid it's sensitive data and it's not possible to post... I tried deleting rows 21611-22111 and now I got 230,168 rows to load... – ElinaJ Jul 10 '15 at 02:24
  • You can likely solve it by using read.table with option `encoding`. – daniel Jul 10 '15 at 02:24
  • What should I set the encoding to? – ElinaJ Jul 10 '15 at 02:26
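
One way to check the well-formedness question raised in the comments is to count the fields on each line before parsing. The sketch below is only an illustration: the temporary file and its contents are made up, and it assumes a comma separator with quoting disabled, so a quoted field that contains a comma would be over-counted.

```r
# Hypothetical sample data: the second data row is one field short.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,alice,10", "2,bob", "3,carol,30"), path)

# Count comma-separated fields on every line, with quoting and
# comment handling switched off so that every character is literal.
n_fields <- count.fields(path, sep = ",", quote = "", comment.char = "")

# Line numbers whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

If `bad_lines` is empty but rows are still missing, the cause is more likely a stray quote or control character than a malformed row.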

3 Answers

Did you try `fread` from data.table? It is fast and can likely deal with some common issues. As you haven't provided any data, I'm giving a silly example:

> fread('col1,col2\n5,"4\n3"')
   col1 col2
1:    5 4\n3
daniel

It was indeed a special character. There was a → (arrow, hexadecimal value 0x1A) on line 22,112. After deleting the arrow, the data loads normally!
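
For anyone who would rather strip the character in R than edit the file by hand, here is a minimal sketch. Byte 0x1A is the ASCII SUB control character, which Windows text mode has traditionally treated as an end-of-file marker, and that may be why reading stopped at that line. The sample file and its contents below are hypothetical; the idea is to read the file as raw bytes, drop every 0x1A byte, and only then hand the text to `read.csv`. It assumes the file fits in memory and contains no embedded NUL bytes.

```r
# Hypothetical sample file with a stray 0x1A byte inside the second data row.
path <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("id,name\n1,alice\n2,b"),
           as.raw(0x1A),
           charToRaw("ob\n3,carol\n"))
writeBin(bytes, path)

# Read the whole file as raw bytes so no byte is treated as end-of-file.
raw_bytes <- readBin(path, what = "raw", n = file.size(path))

# Drop every 0x1A byte, then parse the cleaned text as CSV.
clean_bytes <- raw_bytes[raw_bytes != as.raw(0x1A)]
df <- read.csv(text = rawToChar(clean_bytes), stringsAsFactors = FALSE)
print(nrow(df))
```

The same filter can be widened to other control bytes if needed, but dropping bytes blindly from a multi-byte encoding could corrupt characters, so it is safer to target only the bytes you have actually found.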

ElinaJ

A solution for DataTables CSV export with special characters: find `charset` in https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.js or https://cdn.datatables.net/buttons/1.1.2/js/buttons.html5.min.js

and change it from 'UTF-8' to 'UTF-8-BOM'.

  • Links to potential solutions are always welcome, but please add some details for future visitors in case the link is no longer available. – Nikolay Mihaylov Aug 30 '16 at 10:43