
I had a developer write this method, and it's causing an Encoding::UndefinedConversionError ("\xE2" from ASCII-8BIT to UTF-8).

The error only happens intermittently, so the data coming from the original DB field must be what's causing it. Since I don't have any control over that data, what can I put in the method below so that bad data doesn't cause problems?

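def scrub_string(input, line_break = ' ')
  begin
    # Keep only 7-bit ASCII characters, then replace newlines with line_break.
    # NOTE: `an_address` is not a core String method; if `input` is a plain
    # String, this line raises NoMethodError and the rescue below runs.
    input.an_address.delete("^\u{0000}-\u{007F}").gsub("\n", line_break)
  rescue
    # A bare rescue catches any StandardError (encoding errors included)
    # and returns the original, unscrubbed input.
    input || ''
  end
end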

Will this work?

 input = input.encode('utf-8', :invalid => :replace, :undef => :replace, :replace => '_')
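
For reference, the error can be reproduced without a database. This is a minimal sketch that assumes the bad value is UTF-8 text (here a curly apostrophe, U+2019, whose first byte is 0xE2) that reaches Ruby tagged as ASCII-8BIT (binary); both helper names are hypothetical.

```ruby
# Hypothetical helper: a UTF-8 curly apostrophe (U+2019 = bytes E2 80 99),
# relabelled as ASCII-8BIT the way some drivers and clients return data.
def binary_apostrophe_string
  "It\u2019s".b
end

# Hypothetical helper: attempt a strict conversion to UTF-8 and return
# the exception instead of raising it.
def strict_utf8(str)
  str.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  e
end

err = strict_utf8(binary_apostrophe_string)
puts err.class   # Encoding::UndefinedConversionError
puts err.message # "\xE2" from ASCII-8BIT to UTF-8
```

ASCII-8BIT has no character mapping for bytes above 0x7F, so a strict encode to UTF-8 stops at the first such byte.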
jdog

2 Answers


Yes, this should work: it replaces any characters that are invalid or can't be converted to UTF-8 with an underscore.
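
As a quick check, here is the question's encode call applied to a binary-tagged string (the sample text is made up). Because ASCII-8BIT treats every byte as a separate character, a multi-byte UTF-8 character such as a curly apostrophe becomes one underscore per byte.

```ruby
# A curly apostrophe (bytes E2 80 99) inside a string tagged as ASCII-8BIT.
raw = "It\u2019s fine".b

# The line from the question: invalid or undefined bytes become "_".
clean = raw.encode('utf-8', invalid: :replace, undef: :replace, replace: '_')

puts clean          # It___s fine
puts clean.encoding # UTF-8
```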

Read more about encoding strings in Ruby here:

http://ruby-doc.org/core-1.9.3/String.html#method-i-encode
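
Folded into the method from the question, that might look like the following. This is a hypothetical sketch, not the original developer's code: it assumes input is a String or nil, drops the an_address call (not a core String method), replaces the bare rescue with an explicit nil check, and adds String#scrub (Ruby 2.1+) as an extra safety net for strings already tagged UTF-8 that contain invalid bytes.

```ruby
# Hypothetical rewrite of scrub_string; assumes `input` is a String or nil.
def scrub_string(input, line_break = ' ')
  return '' if input.nil?

  # Transcode to UTF-8, replacing bytes that are invalid or have no UTF-8
  # mapping (e.g. high bytes in an ASCII-8BIT string) with '_'.
  utf8 = input.encode('UTF-8', invalid: :replace, undef: :replace, replace: '_')

  # Extra safety net (Ruby 2.1+): replace any invalid UTF-8 sequences left over.
  utf8 = utf8.scrub('_')

  # Keep only 7-bit ASCII, then swap newlines for line_break.
  utf8.delete("^\u{0000}-\u{007F}").gsub("\n", line_break)
end

scrub_string("It\u2019s".b)  # => "It___s"
scrub_string("It\u2019s")    # => "Its"
scrub_string("line1\nline2") # => "line1 line2"
```

Dropping the bare rescue means a genuinely unexpected argument (say, an Integer) raises instead of silently handing back unscrubbed data.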

SickLickWill

Using the force_encoding("UTF-8") method on the string worked for me.

Example

This example uses data from a whois lookup, so you can try it yourself.

This raises an error for me:

# gem install whois
require 'whois'

whois = Whois::Client.new
mystring = whois.lookup("google.com").to_s
puts mystring

# (irb):38:in `write': "\xE2" from ASCII-8BIT to UTF-8 
# (Encoding::UndefinedConversionError)

But this works!

whois = Whois::Client.new
mystring = whois.lookup("google.com").to_s
puts mystring.force_encoding("UTF-8")

The key difference is calling force_encoding("UTF-8") on the string before printing it.
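
A caveat worth knowing: force_encoding does not convert anything; it only changes the encoding label on the existing bytes. That presumably works here because the whois response was already valid UTF-8 that had been tagged ASCII-8BIT. If the bytes are not valid UTF-8, the relabelled string is invalid and can still fail later; valid_encoding? detects that, and String#scrub (Ruby 2.1+) repairs it. The helper name and sample strings below are made up for illustration.

```ruby
# Hypothetical helper: relabel a copy of the bytes as UTF-8 (no conversion).
# dup avoids mutating the caller's string, since force_encoding works in place.
def relabel_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# Valid UTF-8 bytes that arrived tagged as binary: relabelling is enough.
good = relabel_as_utf8("It\u2019s".b)
puts good.valid_encoding?  # true

# A lone Latin-1 "é" byte (0xE9) is not valid UTF-8: relabelling is not enough.
bad = relabel_as_utf8("caf\xE9".b)
puts bad.valid_encoding?   # false
puts bad.scrub('_')        # caf_
```

When the incoming bytes might not be valid UTF-8, following force_encoding with scrub (or transcoding with encode and replacement options) avoids carrying an invalid string forward.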

It's from here.

stevec