How can I encode to UTF-8 in Ruby?

Question

I am trying to extract a word from the first line of a file:

LOCATION,Feij�,AC,a,b,c

this way:

2.0.0-p247 :005 > File.foreach(file).first

=> "LOCATION,Feij\xF3,AC,a,b,c\r\n"

But when I try to split it:

2.0.0-p247 :008 > File.foreach(file).first.split(",")

ArgumentError: invalid byte sequence in UTF-8
    from (irb):8:in `split'
    from (irb):8
    from /home/bleh/.rvm/rubies/ruby-2.0.0-p247/bin/irb:13:in `<main>'
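To see why `split` fails, here is a minimal reproduction. It assumes the line holds the single byte 0xF3 where the file shows ó; the string literal is hypothetical sample data standing in for what `File.foreach(file).first` returned. Because the literal sits in a UTF-8 source file, Ruby labels it UTF-8 even though the byte is not valid UTF-8, which is the same situation as reading the file with a UTF-8 default external encoding:

```ruby
# Hypothetical stand-in for File.foreach(file).first: Ruby labels this literal
# UTF-8 (the default source encoding), but the lone 0xF3 byte is not valid UTF-8.
line = "LOCATION,Feij\xF3,AC,a,b,c\r\n"

puts line.encoding          # UTF-8
puts line.valid_encoding?   # false

# split refuses to scan a string whose bytes are invalid for its encoding.
split_error =
  begin
    line.split(",")
    nil
  rescue ArgumentError => e
    e.message
  end
puts split_error            # invalid byte sequence in UTF-8
```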

I expected to get: Feijó

I have already tried a lot of combinations of `.encode` and `.force_encoding`.

Any ideas?
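The key difference between the two methods: `force_encoding` only relabels the bytes, while `encode` transcodes and needs to know the real source encoding. A minimal sketch, assuming the file was written in ISO-8859-1 and using a hypothetical in-memory line instead of the real file:

```ruby
# Hypothetical sample data: the raw bytes of the first line, assumed to be ISO-8859-1.
raw = "LOCATION,Feij\xF3,AC,a,b,c\r\n".b   # .b returns a binary (ASCII-8BIT) copy

# encode alone cannot help: Ruby has no idea which character 0xF3 stands for
# in a binary string, so the conversion to UTF-8 is undefined and raises.
encode_failed =
  begin
    raw.encode("UTF-8")
    false
  rescue Encoding::UndefinedConversionError
    true
  end
puts encode_failed               # true

# force_encoding only changes the label; the lone 0xF3 byte is still invalid UTF-8.
relabeled = raw.dup.force_encoding("UTF-8")
puts relabeled.valid_encoding?   # false

# Label the bytes with the encoding they were really written in, then transcode:
# 0xF3 (Latin-1 "ó") becomes the two UTF-8 bytes C3 B3.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed.valid_encoding?       # true
puts fixed.split(",")[1]         # Feijó
```

Relabeling with the correct encoding first and transcoding second is essentially what an `'external:internal'` encoding option does while the file is being read.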

Can you try `File.foreach(file, :encoding => 'utf-8').first` ? — Arup Rakshit, Feb 15 '14 at 20:24
Hi Arup. Yeah, I still get Feij\xF3, and when I try to split I get the same error — coffee, Feb 15 '14 at 20:28
Okay, try now `File.foreach(file, :encoding => 'ascii-8:utf-8').first` ? — Arup Rakshit, Feb 15 '14 at 20:30
I think I got it. `.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)` — coffee, Feb 15 '14 at 20:34
That will do something else, not what you are trying to do. :-) — Arup Rakshit, Feb 15 '14 at 20:35
Try this: `File.foreach(file, :encoding => 'windows-1252:utf-8').first`. It looks like the file is encoded as Latin-1. — rainkinz, Feb 15 '14 at 20:43
Read this [answer](http://stackoverflow.com/a/12586731/2767755). The author explains it well there. — Arup Rakshit, Feb 15 '14 at 20:43
@rainkinz Yes, if the source file encoding is known, these types of issue can be handled easily. — Arup Rakshit, Feb 15 '14 at 20:44
Yeah, I'm just guessing from his output above: `"LOCATION,Feij\xF3,AC,a,b,c\r\n"`. \xF3 is ó in Latin-1 (U+00F3, in the Latin-1 Supplement block). — rainkinz, Feb 15 '14 at 20:47
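The two suggestions in the comments can be compared side by side. This sketch uses a hypothetical in-memory byte string instead of the real file. It shows why the `.encode('utf-8', 'binary', ...)` call "does something else": converting from binary, every byte above 0x7F is undefined, so `:undef => :replace` swaps 0xF3 for the Unicode replacement character U+FFFD instead of producing ó. Decoding the bytes as Windows-1252 recovers the intended character:

```ruby
raw = "Feij\xF3".b   # hypothetical raw bytes, labelled ASCII-8BIT ("binary")

# The .encode call from the comments: treat the input as binary and replace
# anything that cannot be converted. 0xF3 is undefined in binary -> UTF-8,
# so it becomes U+FFFD and the ó is lost.
replaced = raw.encode("utf-8", "binary", :invalid => :replace, :undef => :replace)
puts replaced == "Feij\uFFFD"   # true

# The windows-1252 suggestion: interpret the bytes as Windows-1252, then transcode.
decoded = raw.dup.force_encoding("windows-1252").encode("utf-8")
puts decoded == "Feij\u00F3"    # true, i.e. "Feijó"
```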

matt · Answer 1 · 2014-09-15T20:39:37.473


The character ó is \xF3 in the ISO-8859-1 encoding, so this is probably the encoding of the file (it could also be CP1252, also known as Windows-1252).
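One way to check this claim is to encode ó (U+00F3) into each candidate encoding and look at the resulting bytes. This is only a sanity check on the character itself, not a way to detect the encoding of an arbitrary file:

```ruby
char = "\u00F3"  # ó

# Map each encoding name to the hex bytes that ó turns into in that encoding.
hex_by_encoding = ["ISO-8859-1", "Windows-1252", "UTF-8"].map do |enc|
  [enc, char.encode(enc).bytes.map { |b| format("%02X", b) }.join(" ")]
end.to_h

hex_by_encoding.each { |enc, hex| puts "#{enc}: #{hex}" }
# ISO-8859-1: F3
# Windows-1252: F3
# UTF-8: C3 B3
```

Because both single-byte encodings agree on 0xF3, this byte alone cannot tell them apart. They differ in the 0x80 to 0x9F range, where Windows-1252 has printable characters (for example € at 0x80) and ISO-8859-1 has C1 control codes.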

You can specify the encoding as an argument to File::foreach, and you can also ask Ruby to re-encode it to UTF-8 for you:

File.foreach(file, :encoding => 'iso-8859-1:utf-8').first.split(",")
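An end-to-end sketch of this approach. It writes a hypothetical temporary file containing the raw Latin-1 bytes, so it does not need the asker's original file. In `'iso-8859-1:utf-8'`, the part before the colon is the encoding of the data on disk and the part after is the encoding Ruby transcodes to while reading. The `chomp` is an addition of this sketch so the last field does not keep the trailing `\r\n`:

```ruby
require "tempfile"

# Tempfile.create with a block removes the file afterwards and returns the block's value.
fields, line_encoding = Tempfile.create(["locations", ".csv"]) do |tmp|
  tmp.binmode
  tmp.write("LOCATION,Feij\xF3,AC,a,b,c\r\n".b)  # hypothetical Latin-1 bytes
  tmp.flush

  # Read as ISO-8859-1 (external), transcode to UTF-8 (internal).
  first = File.foreach(tmp.path, :encoding => "iso-8859-1:utf-8").first
  [first.chomp.split(","), first.encoding]
end

puts fields[1]       # Feijó
puts line_encoding   # UTF-8
```

If the file turns out to be Windows-1252 instead, only the external part changes, for example `'windows-1252:utf-8'` as suggested in the comments.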

edited Sep 15 '14 at 20:39

answered Feb 15 '14 at 20:40

matt
