I'm importing a CSV file into Ruby (1.8.7). File.open('path/to/file.csv').read returns this in the console:
Stefan,Engstr\232m
UniversalDetector (chardet gem) identifies the encoding as ISO-8859-2, though with a confidence of only about 0.63:
UniversalDetector::chardet("Stefan,Engstr\232m")
=> {"confidence"=>0.626936305574385, "encoding"=>"ISO-8859-2"}
Converting the string to UTF-8 with Iconv, using that encoding, yields the following:
Iconv.conv("UTF-8", "ISO-8859-2", "Stefan,Engstr\232m")
=> "Stefan,Engstrm"
whereas I would expect:
=> "Stefan,Engström"
- Could the string really be in some other encoding?
- I haven't seen the \232 syntax before. Usually, when a string is wrongly encoded, some odd character shows up instead, e.g. � or Chinese characters.
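
One way to investigate both points, sketched for Ruby 2.0 or later rather than the 1.8.7 used above (String#b, force_encoding and encode do not exist in 1.8.7): \232 is Ruby's octal escape for a single byte, 0x9A (decimal 154), so the raw bytes can be decoded under a few candidate encodings and the results compared. The candidate list below is an assumption, not something the file confirms.

```ruby
# "\232" is an octal escape for one byte: 0232 (octal) == 0x9A == 154.
# String#b returns a binary (ASCII-8BIT) copy so the raw bytes can be inspected.
raw = "Stefan,Engstr\232m".b

# Collect every byte outside the 7-bit ASCII range.
suspect = raw.bytes.select { |byte| byte > 127 }
puts suspect.map { |byte| format("0x%02X", byte) }.join(" ")  # prints 0x9A

# Candidate encodings: a guess at likely sources, not a confirmed list.
candidates = ["ISO-8859-1", "ISO-8859-2", "Windows-1252", "macRoman"]

# Reinterpret the same bytes under each candidate and transcode to UTF-8.
decoded = candidates.map do |name|
  text = begin
    raw.dup.force_encoding(name).encode("UTF-8")
  rescue EncodingError => e
    "(#{e.class})"  # byte has no mapping in this encoding
  end
  [name, text]
end

decoded.each { |name, text| puts "#{name.ljust(12)} #{text.inspect}" }
```

In MacRoman, byte 0x9A is ö, so that candidate is worth checking first. On 1.8.7 the equivalent attempt would go through Iconv, where the name accepted for that encoding depends on the local iconv build.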
Let me know if I should provide more information or elaborate on something.