I know this has been asked several times, but something strange is happening in my case:

I have an index view where rendering certain characters (accented letters) causes Rails to raise the exception

incompatible character encodings: ASCII-8BIT and UTF-8

So I checked the encoding of my strings, and it is actually ASCII-8BIT everywhere, even though I set the encoding to UTF-8 in my application.rb

config.encoding = "utf-8"

and in my environment.rb

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

and my database reports:

character_set_database = utf-8

as suggested in some guides.

Strings are entered through a textarea field and are not concatenated with any previously inserted string.

The strange things are:

  • this happens only in the index view, whereas this is not happening in the show (same resource)
  • this happens only for this model (which is an email, with subject and body, but this shouldn't affect anything)
  • in my development environment everything works once I call str.force_encoding('utf-8'), whereas in production this does not work (development runs Ruby 2.0.0 and production Ruby 2.1.0; both use Rails 4 and MySQL)
  • adding # encoding utf-8 at the top of the view file doesn't help either
  • str.force_encoding('ascii-8bit').encode('utf-8') raises Encoding::UndefinedConversionError "\xC3" from ASCII-8BIT to UTF-8 (the character in question is an à). body.force_encoding('ascii-8bit').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?') replaces every accented character with a ?. str.force_encoding('iso-8859-1').encode('utf-8') obviously produces the wrong character (a ?).
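
For reference, the attempts in the last bullet can be reproduced in isolation. This is only a minimal sketch: it assumes the stored bytes are valid UTF-8 that merely carry the ASCII-8BIT tag, and the helper names try_transcode and relabel_utf8 are made up for illustration. It shows that encode tries to transcode the bytes (and fails, because binary has no mapping for bytes above 0x7F), while force_encoding only changes the tag.

```ruby
# Hypothetical helper: returns the transcoded string, or the error class
# if the conversion is undefined.
def try_transcode(bytes)
  bytes.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  e.class
end

# Hypothetical helper: re-tags a copy as UTF-8 without touching the bytes.
def relabel_utf8(bytes)
  bytes.dup.force_encoding('UTF-8')
end

# The two UTF-8 bytes of "à", tagged as binary (ASCII-8BIT).
raw = "\xC3\xA0".b

p try_transcode(raw)                # the error class, not a string
p relabel_utf8(raw).valid_encoding? # true: the bytes were UTF-8 all along
p relabel_utf8(raw) == "\u00E0"     # true: same character as "à"

# Treating the same bytes as ISO-8859-1 transcodes each byte separately,
# which yields two wrong characters instead of one "à".
p raw.dup.force_encoding('ISO-8859-1').encode('UTF-8') == "\u00C3\u00A0"
```

If relabel_utf8 gives a string for which valid_encoding? is false, the bytes are probably not UTF-8 at all, and re-tagging them would only hide the problem.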

So I have two questions:

  • Why is Rails setting the string encoding to ASCII-8BIT?
  • How can I solve this issue?

I've already checked these questions (the newest ones with rails4):

Rails View Encoding Issues

"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8

How to convert a string to UTF8 in Ruby

Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8

and other resources also, but nothing worked.

sissy
  • Are you entering the accented characters into the view using a text editor? – Slicedpan Mar 07 '14 at 12:33
  • it's a textarea_field – sissy Mar 07 '14 at 15:03
  • Are the source code files all UTF-8, or is your text editor perhaps saving the files as ASCII-8BIT? – Roger Nordqvist Mar 07 '14 at 20:55
  • these are not strings generated by my text editor... anyway, the files are in UTF-8 – sissy Mar 08 '14 at 10:38
  • I'm struggling with a similar issue (getting `incompatible character encodings: ASCII-8BIT and UTF-8` errors from user entered data). Were you able to solve your problem? Did you find a way to replicate it in a test? – Nick Feb 07 '17 at 05:50
  • I was not able to solve the problem definitively, sorry. And it happens only from time to time, without an apparent reason. – sissy Feb 07 '17 at 09:55
  • I found a workaround that I posted [here](http://stackoverflow.com/questions/42078430/getting-incompatible-character-encodings-utf-8-and-ascii-8bit-error-when-disp?noredirect=1#comment71334714_42078430). It basically forces UTF-8 encoding on every read_attribute (if it's a string). Maybe it can help. Good luck – Nick Feb 07 '17 at 21:03
  • Well, as written in my initial post, your workaround is basically the same as calling str.force_encoding everywhere. That works for me in the dev environment but fails in production. No way :( – sissy Feb 08 '17 at 14:42

1 Answer


You probably have a string literal somewhere in your source code that you then concatenate another string to. For instance:

some_string = "this is a string"

or even

some_string = "" #empty string

Those strings, stored in some_string, will be marked ASCII-8BIT, and if you then later do something like:

some_string = some_string + unicode_string

Then you'll get the error. That is, those strings will be marked ASCII-8BIT unless you add, to the top of the file where the string literals are created:

#encoding: utf-8

That declaration determines the default encoding that string literals in source code will have.
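
One way to check whether this guess applies is to ask Ruby directly. As far as I know, Ruby 2.0 and later treat source files as UTF-8 by default, so literals there are usually UTF-8 already, and a binary-tagged string that holds only ASCII bytes can still be joined with UTF-8 text. The sketch below (concatenable? is a made-up helper name, and the sample strings are invented) suggests the error needs non-ASCII bytes on both sides:

```ruby
# Hypothetical helper: true if a + b would not raise an encoding error.
def concatenable?(a, b)
  !Encoding.compatible?(a, b).nil?
end

utf8_text    = "citt\u00E0"   # UTF-8 with a non-ASCII character
ascii_binary = "subject: ".b  # ASCII-8BIT tag, but only ASCII bytes
high_binary  = "\xC3\xA0".b   # ASCII-8BIT tag with bytes above 0x7F

puts __ENCODING__             # source encoding of this file
puts "".encoding              # encoding that plain literals get here

p concatenable?(ascii_binary, utf8_text)  # true
p concatenable?(high_binary, utf8_text)   # false

begin
  high_binary + utf8_text
rescue Encoding::CompatibilityError => e
  puts e.message              # the "incompatible character encodings" error
end
```

So an empty or ASCII-only literal is unlikely to trigger the exception by itself; the string to look for is one that is tagged ASCII-8BIT and also contains accented characters.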

I am just guessing, because this pattern is a common source of this problem. To know more for sure, it would take more information than is in your question -- it would take debugging the actual source code, to figure out exactly what string is tagged as ASCII-8BIT when you expect it to be tagged UTF-8 instead, and exactly where that String came from.
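
As a starting point for that debugging, something like the following could narrow down which values carry the unexpected tag. It is only a sketch: binary_tagged_keys is a hypothetical helper, and the sample hash stands in for whatever attribute hash your model exposes.

```ruby
# Hypothetical helper: list the keys whose String values are tagged ASCII-8BIT.
def binary_tagged_keys(attributes)
  attributes.select { |_key, value|
    value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT
  }.keys
end

# Invented sample data standing in for an email record's attributes.
sample = {
  'subject' => "Offerta".encode('UTF-8'),
  'body'    => "citt\xC3\xA0".b, # binary-tagged, as in the question
  'id'      => 42
}

p binary_tagged_keys(sample) # ["body"]
```

Running a check like this in both the index and the show code paths might reveal where the two differ.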

jrochkind