5

I just randomly got this strange error in Rails 3 on Heroku (Postgres):

PGError: ERROR: invalid byte sequence for encoding "UTF8": 0x85 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". : INSERT INTO "comments" ("content") VALUES ('BTW∑I re-listened to the video' ......

The hint, while nice, isn't making anything click for me. Can I set the encoding somewhere? Should I even mess with that? Has anyone seen this and/or have any ideas on how to deal with this type of issue?

Thank you

AnApprentice

2 Answers

6

From what I can gather, the string you're trying to insert into your PostgreSQL server isn't valid UTF-8. This is somewhat odd, because your Rails app should be configured to use UTF-8 by default.

There are a couple of ways you can try to fix this (in order of what I recommend):

  • Firstly, make sure that config.encoding is set to "utf-8" in config/application.rb.

  • If you're using Ruby 1.9, you can try to force the character encoding prior to insertion with toutf8.

  • You can figure out what your string is encoded with, and manually run SET CLIENT_ENCODING TO 'ISO-8859-1'; (or whatever the encoding is) on your PostgreSQL connection before inserting the string. Don't forget to run RESET CLIENT_ENCODING; after the statement to reset the encoding.

  • If you're using Ruby 1.8 (which is more likely), you can use the iconv library to convert the string to UTF-8. See documentation here.

  • A more hackish solution is to override your getters and setters in the model (i.e. content and content=) to encode and decode your string with Base64. It'd look something like this:

 

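require 'base64'

# Stores content Base64-encoded, so the database only ever sees ASCII.
class Comment < ActiveRecord::Base
  # Decode on read; nil stays nil.
  def content
    raw = self[:content]
    raw && Base64.decode64(raw)
  end

  # Encode on write; nil stays nil.
  def content=(value)
    self[:content] = value && Base64.encode64(value)
  end
end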
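For the bullets about forcing the encoding before insertion, here is a minimal sketch of checking a string and dropping the offending bytes. It assumes Ruby 2.1 or newer (for String#scrub), which is newer than the versions discussed above; the helper name to_valid_utf8 and the sample bytes are hypothetical, reconstructed from the 0x85 in the error message.

```ruby
# Raw bytes as they might arrive from a form; a lone 0x85 is not valid UTF-8.
# String#b returns a binary (ASCII-8BIT) copy of the literal.
raw = "BTW\x85I re-listened to the video".b

# Hypothetical helper: returns a copy that is valid UTF-8.
def to_valid_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8) # relabel the bytes, no conversion
  return utf8 if utf8.valid_encoding?

  utf8.scrub("") # drop every byte sequence that is not valid UTF-8
end

clean = to_valid_utf8(raw)
puts clean.valid_encoding?
```

Note that this throws the bad byte away rather than recovering the character it was meant to be, so it is only appropriate if losing that character is acceptable.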
vonconrad
  • Thank you vonconrad, you're a life saver. I do have config.encoding set correctly, though maybe Heroku is changing something on deployment. If I go with the iconv solution, which seems like the smartest choice from your comment, are there any known issues when moving up to Ruby 1.9? Also, the doc is pretty empty for iconv and I'm a newbie. Is there any way to see an example? Thanks! – AnApprentice Jan 20 '11 at 03:02
  • 3
    I believe `iconv` should work for both 1.8 and 1.9. As for the code, something like this should work: `content = ::Iconv.conv('UTF-8//IGNORE', 'UTF-8', content + ' ')[0..-2]`. Basically, this discards any bytes that aren't valid UTF-8, so the result is valid UTF-8 no matter what the input was (it doesn't convert from another encoding). I got the code from here: http://stackoverflow.com/questions/4583924/string-force-encoding-in-ruby-1-8-7-or-rails-2-x/4585362#4585362 – vonconrad Jan 20 '11 at 03:19
0

text.force_encoding(charset).encode("UTF-8")
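To make the one-liner above concrete, here is a small sketch using the byte from the question. The charset is an assumption: the question never says where the text came from, and Windows-1252 is picked only because 0x85 is its ellipsis character. Use whatever encoding your input really has.

```ruby
# ASSUMPTION: the incoming bytes are Windows-1252; the question does not say.
charset = "Windows-1252"

# Sample bytes modelled on the failing INSERT (String#b gives a binary copy).
text = "BTW\x85I re-listened to the video".b

# force_encoding only relabels the bytes; encode then transcodes them to UTF-8.
utf8 = text.force_encoding(charset).encode("UTF-8")

puts utf8.valid_encoding? # the 0x85 byte is now U+2026 (horizontal ellipsis)
```

If the input may contain bytes that have no mapping in the assumed charset, encode also accepts invalid: :replace and undef: :replace to substitute a placeholder instead of raising.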

http://blog.zenlike.me/2013/04/06/sendgrid-parse-incoming-email-encoding-errors-for-rails-apps-using-postgresql/

Renars Sirotins