
I'm getting `Illegal quoting in line 1. (CSV::MalformedCSVError)` when I try to read a CSV file that I downloaded using Selenium WebDriver:

require "csv"

CSV.foreach("foo.csv") do |row|
  # raises CSV::MalformedCSVError: Illegal quoting in line 1.
end

But when I copy the contents, paste them into a new file and save it, the same code works just fine:

CSV.foreach("bar.csv") do |row|
  # works fine
end

Here are the first five lines of the CSV in question, in case it helps:

"Name","W","L","ERA","GS","G","SV","IP","H","ER","HR","SO","BB","WHIP","K/9","BB/9","FIP","WAR","playerid"
"Craig Kimbrel","5","1","1.79","0","65","35","65.0","42","13","4","95","19","0.95","13.16","2.65","1.84","1.7","6655"
"Aroldis Chapman","2","1","1.93","0","30","27","30.0","18","6","2","47","12","0.99","14.24","3.56","2.22","0.6","10233"
"Greg Holland","5","2","2.39","0","65","34","65.0","47","17","5","83","21","1.05","11.53","2.95","2.48","1.3","7196"
"Kenley Jansen","5","2","2.16","0","65","32","65.0","46","16","6","86","19","1.00","11.97","2.64","2.51","0.9","3096"

I haven't been able to find or come up with a way to get the raw, Selenium-downloaded CSV to be read correctly. Has anyone run into this, or does anyone have an idea of what could be wrong with my data, or how I can fix it programmatically?

Thank you!

sway
  • Are you sure you don't just need to [escape the double quotes](http://stackoverflow.com/questions/17808511/properly-escape-a-double-quote-in-csv)? – mralexlau Mar 22 '14 at 00:42
  • Any stray non-visible bytes in the CSV? You can `cat -vet foo.csv | head` to have a quick look. – mu is too short Mar 22 '14 at 01:07
  • It's just a wild guess, but I would say that the file you're trying to download via Selenium WebDriver is malformed due to a quoting issue on the first line. – Mark Thomas Mar 22 '14 at 01:21
  • I don't *see* any illegal quoting in the file you posted, but that doesn't mean that it's not there: it might be a non-printable character or a Unicode Byte-Order-Mark that's causing the problem. That would also explain why copy&paste "fixes" the problem. – Jörg W Mittag Mar 22 '14 at 13:28



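It's very likely that your file starts with a byte-order mark (BOM), U+FEFF. The BOM is invisible, so you are probably losing it when you copy and paste the contents into a new file. When it is present, the first field starts with that extra character instead of with its opening quote, which is why CSV reports illegal quoting in line 1.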
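To confirm that a downloaded file really starts with a BOM, you can compare its first three bytes with EF BB BF, the UTF-8 encoding of U+FEFF. (With `cat -vet`, these bytes show up as `M-oM-;M-?` at the start of the first line.) Here is a minimal Ruby sketch; the helper name `utf8_bom?` and the constant `UTF8_BOM` are hypothetical, not part of any library:

```ruby
# Hypothetical constant: the UTF-8 encoding of the byte-order mark U+FEFF,
# i.e. the bytes EF BB BF, as a binary string.
UTF8_BOM = "\uFEFF".b

# Hypothetical helper: true when the file at +path+ begins with a UTF-8 BOM.
def utf8_bom?(path)
  # Read only the first three bytes in binary mode, so nothing is decoded
  # or stripped before the comparison.
  File.binread(path, 3) == UTF8_BOM
end
```

Calling `utf8_bom?("foo.csv")` on the Selenium download and `utf8_bom?("bar.csv")` on the copy-pasted file should show whether the BOM is the only difference between them.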

The proper solution is to open the file in a mode that strips the BOM before the CSV parser sees it:

require "csv"

CSV.foreach("foo.csv", "r:bom|utf-8") { |row| p row }
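A self-contained sketch of the same fix: it writes a small sample file that starts with a BOM and reads it back with the `"r:bom|utf-8"` mode. The file name `bom_sample.csv` and its two rows are made up for illustration, based on the data in the question.

```ruby
require "csv"

# Hypothetical sample file: a BOM (U+FEFF) followed by two quoted rows
# modeled on the data in the question.
path = "bom_sample.csv"
File.binwrite(path, "\uFEFF\"Name\",\"W\"\n\"Craig Kimbrel\",\"5\"\n")

# "r:bom|utf-8" opens the file read-only, drops a leading UTF-8 BOM if
# there is one, and decodes the rest as UTF-8.
rows = []
CSV.foreach(path, "r:bom|utf-8") { |row| rows << row }

p rows.first  # header row, with no invisible U+FEFF in front of "Name"
p rows.last

File.delete(path)  # clean up the sample file
```

The mode string is handed through to `File.open`, so the same `"r:bom|utf-8"` mode also works when reading the file without CSV.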
djanowski