44

I keep getting an Encoding::UndefinedConversionError ("\xC2" from ASCII-8BIT to UTF-8) every time I try to convert a hash into a JSON string. I tried .encode and .force_encoding with both "UTF-8" and "ASCII-8BIT", chaining .encode with .force_encoding, reversing the order, and switching the parameters, but nothing worked, so I caught the error like this:

begin
  menu.to_json
rescue Encoding::UndefinedConversionError
  puts $!.error_char.dump
  p $!.error_char.encoding
end

Here menu is the result of a Sequel dataset.to_hash, with content from a MySQL database that uses the utf8_general_ci collation. The rescue block printed this:

"\xC2"

#<Encoding:ASCII-8BIT>

The encoding never changes, no matter which .encode/.force_encoding call I use. I've even tried to strip the byte with .gsub!(/\\\xC2/), without luck.

Any ideas?

martriay
  • 1. Did you try this? `menu.force_encoding("ISO-8859-1").encode("UTF-8")` 2. Add a `# encoding: utf-8` magic comment at the top of all your .rb files. 3. Check your environment settings: what does `echo $LC_CTYPE` print in your terminal? – Kashyap Oct 22 '12 at 09:15
  • Did step 1 fail with an error? Did step 2 work? For step 3, this link has the environment settings your program must run with in case you want to avoid the issue: http://thegreyblog.blogspot.in/2012/02/fixing-mac-os-x-lions-ssh-utf-8-issues.html – Kashyap Oct 29 '12 at 10:51

5 Answers

93
menu.to_s.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

This worked perfectly, I had to replace some extra characters but there are no more errors.
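To show what those options do on a single string, here is a minimal sketch. The sample text is made up for illustration; it is not the real menu data. Any byte that has no UTF-8 equivalent in the binary source is swapped for the replacement string instead of raising.

```ruby
# Hypothetical sample: ASCII text plus one stray high byte, tagged as binary.
raw = "Soup of the day\xC2".b

# undef: :replace covers bytes with no mapping in the target encoding,
# invalid: :replace covers malformed byte sequences in the source.
clean = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

puts clean            # Soup of the day?
puts clean.encoding   # UTF-8
```

Note that the stray byte is lost for good: the result only records that something was there.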

Dorian
martriay
  • Fantastic solution - solved my problem dealing with strange types in SQL Server. Thank you! – Marc Clifton Oct 27 '13 at 12:48
  • Thanks! It works for me too, official ruby doc for future reference [here](http://www.ruby-doc.org/core-2.1.2/String.html#method-i-encode) – jmoreira Jun 16 '14 at 18:45
21

What do you expect for "\xC2"? Probably a Â

With ASCII-8BIT you have binary data, and Ruby can't decide which character each byte is supposed to be.

You must first tell Ruby the real encoding with force_encoding.
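To make the difference concrete: encode tries to transcode the bytes and fails on binary input, while force_encoding only changes the label and leaves the bytes alone. A small sketch, assuming purely for illustration that the stray "\xC2" is the first byte of a UTF-8 non-breaking space ("\xC2\xA0"); the real data may be something else.

```ruby
# Assumed sample: the UTF-8 bytes of a non-breaking space, tagged ASCII-8BIT.
raw = "\xC2\xA0".b

begin
  raw.encode('UTF-8')   # binary bytes >= 0x80 have no defined UTF-8 mapping
rescue Encoding::UndefinedConversionError => e
  puts "encode failed on #{e.error_char.dump}"
end

# force_encoding relabels a copy of the string without touching its bytes.
retagged = raw.dup.force_encoding('UTF-8')

puts retagged.bytes == raw.bytes   # true
puts retagged.valid_encoding?      # true
puts retagged == "\u00A0"          # true
```

If valid_encoding? comes back false for a candidate encoding, that encoding is not the right label for the data.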

You may try the following code:

Encoding.list.each{|enc|
  begin
    print "%-10s\t" % [enc]
    print "\t\xC2".force_encoding(enc)
    print "\t\xC2".force_encoding(enc).encode('utf-8')
  rescue => err
    print "\t#{err}"
  end
  print "\n"
}

The results are the possible values of your "\xC2" in the different encodings.

The output may depend on your terminal's encoding, but I think you can make a good guess as to which encoding you have.

Once you have identified the encoding you need (probably cp1252), you can do:

menu.force_encoding('cp1252').to_json

See also Kashyap's comment.
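Since menu is a hash rather than a single string, the relabelling has to reach every string inside it. A minimal sketch of one way to do that: `retag_strings` is a hypothetical helper name, the sample hash is made up, and Windows-1252 (Ruby's name for cp1252) is only the guess from this answer, not a known fact about the data.

```ruby
require 'json'

# Hypothetical helper: walks hashes and arrays, relabels every string as
# source_encoding and transcodes it to UTF-8. Other values pass through.
def retag_strings(value, source_encoding)
  case value
  when String
    value.dup.force_encoding(source_encoding).encode('UTF-8')
  when Hash
    value.each_with_object({}) do |(key, val), out|
      out[retag_strings(key, source_encoding)] = retag_strings(val, source_encoding)
    end
  when Array
    value.map { |item| retag_strings(item, source_encoding) }
  else
    value
  end
end

# Made-up data: "\xE9" is "é" in Windows-1252, tagged here as binary.
menu = { 1 => { name: "Caf\xE9".b, price: 3 } }

puts retag_strings(menu, 'Windows-1252').to_json
```

The helper returns a new structure and leaves the original hash untouched, so it is safe to try several candidate encodings on the same data.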

knut
  • This is what I did: `Encoding.list.each{|enc| begin print "%-10s\t" % [enc] print menu.to_json.force_encoding(enc) print menu.to_json.force_encoding(enc).encode('utf-8') rescue => err print "\t#{err}" end print "\n" }` and this is what I've got for each result: `SJIS-KDDI "\xC2" from ASCII-8BIT to UTF-8` – martriay Oct 28 '12 at 22:55
  • This discovery loop for all encodings is brilliant. Helped me solve my own variant on this problem. Thanks! – pdobb Dec 14 '18 at 05:20
12

If you don't care about losing the strange characters, you can blow them away:

str.force_encoding("ASCII-8BIT").encode('UTF-8', undef: :replace, replace: '')
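A quick sketch of what "blow them away" means in practice. The sample string is made up; the point is that every byte above 0x7F is dropped, including bytes that belong to perfectly good accented characters.

```ruby
# Made-up sample: the UTF-8 bytes of "Crème brûlée", tagged as binary.
raw = "Cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e".b

# dup so the original string is not relabelled in place.
stripped = raw.dup.force_encoding('ASCII-8BIT')
              .encode('UTF-8', undef: :replace, replace: '')

puts stripped   # Crme brle
```

Only plain ASCII survives, so this is best kept for data where the non-ASCII bytes really are noise.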
Ponny
  • Didn't work :( Encoding::UndefinedConversionError at /menu "\xC2" from ASCII-8BIT to UTF-8 – martriay Jan 03 '13 at 05:15
  • menu.to_s.encode('UTF-8', {:invalid => :replace, :undef => :replace, :replace => '?'}) -> this worked! :D – martriay Jan 03 '13 at 06:16
10

Your self-accepted solution doesn't really work: there are no more errors, but the result is NOT JSON.

I solved the problem using the oj gem, and it now works fine. It is also faster than the standard JSON library.

Writing:

   menu_json = Oj.dump menu

Reading:

   menu2 = Oj.load menu_json

https://github.com/ohler55/oj for more details. I hope it will help.
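To illustrate the "it is NOT JSON" point without needing any gem, here is a sketch that uses only the standard json library; with Oj installed, the Oj.dump / Oj.load calls above would stand in for to_json / JSON.parse. The sample hash is made up.

```ruby
require 'json'

# Made-up sample data with string keys.
menu = { "name" => "Soup", "price" => 3 }

as_text = menu.to_s      # Ruby's inspect format, uses => between key and value
as_json = menu.to_json   # real JSON

puts as_json             # {"name":"Soup","price":3}

begin
  JSON.parse(as_text)
rescue JSON::ParserError
  puts "menu.to_s is not valid JSON"
end

p JSON.parse(as_json) == menu   # true
```

So encoding the output of to_s silences the error, but the string cannot be loaded back as JSON.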

gvo
  • The problem was the error, not the JSON part. Therefore, my self-accepted answer works. Anyway, I'll upvote you for giving an alternative solution. – martriay Sep 22 '13 at 18:35
  • Well, I agree with you that there are no longer errors, but it's not a JSON string. I don't know what your purpose was, but I needed to load my JSON back, and I wanted a valid JSON string. Or maybe I have missed something in your proposed solution? – gvo Sep 24 '13 at 10:47
  • This question was only about the error, I'm not saying my answer is the best choice, clearly it isn't for your purpose, but solves the problem presented: the encoding error. The JSON I mention in my question is for contextualization purposes. – martriay Sep 24 '13 at 22:59
  • Thanks @gvo ! Hopefully others googling for this error will find this solution... – Cyberwiz Dec 07 '17 at 11:49
1

The :fallback option can be useful if you know which characters you want to replace:

"Text 🙂".encode("ASCII", "UTF-8", fallback: {"🙂" => ":)"})
#=> "Text :)"

From docs:

Sets the replacement string by the given object for undefined character. The object should be a Hash, a Proc, a Method, or an object which has [] method. Its key is an undefined character encoded in the source encoding of current transcoder. Its value can be any encoding until it can be converted into the destination encoding of the transcoder.
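As a small sketch of the two most common forms described in that passage, here is :fallback with a Hash and with a Proc. The sample text and the replacement strings are made up for illustration.

```ruby
# Made-up sample: "Café" followed by a snowman, both outside US-ASCII.
text = "Caf\u00E9 \u2603"

# Hash form: each undefined character is looked up as a key.
with_hash = text.encode('US-ASCII', fallback: { "\u00E9" => "e", "\u2603" => "*" })
puts with_hash   # Cafe *

# Proc form: called with each undefined character, returns its replacement.
with_proc = text.encode('US-ASCII', fallback: ->(char) { format('<U+%04X>', char.ord) })
puts with_proc   # Caf<U+00E9> <U+2603>
```

With the Hash form, a character that has no entry still raises Encoding::UndefinedConversionError, so the Proc form is the safer choice when the set of problem characters is not known in advance.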