Rails encoding in ASCII-8BIT

Question

I know this has been asked several times, but something strange is happening to me:

I have an index view where rendering certain characters (accented letters) causes Rails to raise the exception

incompatible character encodings: ASCII-8BIT and UTF-8

so I checked the encoding of my strings, and it is actually ASCII-8BIT everywhere, even though I set the encoding to UTF-8 in my application.rb

config.encoding = "utf-8"

and in my environment.rb

Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

and my database shows:

character_set_database = utf-8

as suggested in some guides.
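A minimal standalone sketch of the situation described so far. It assumes the accented text reaches the view as valid UTF-8 bytes that carry an ASCII-8BIT tag. That tag is simulated here with String#b, because the real source of the tag is unknown. The sketch shows two things: the process-wide defaults do not relabel a String that already exists, and the exception appears once such a string meets UTF-8 text that also contains non-ASCII characters.

```ruby
# Process-wide defaults, as set in environment.rb above. They control how data
# read through IO is tagged and transcoded; they do not relabel existing strings.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# Stand-in for a stored email body: the UTF-8 bytes of "caffè", but tagged
# ASCII-8BIT (String#b returns a binary-tagged copy).
body = "caffè".b
still_binary = body.encoding == Encoding::ASCII_8BIT # true despite the defaults

# UTF-8 text that also contains non-ASCII characters, e.g. from a template.
suffix = ", déjà vu"

# Combining the two raises the error from the question.
error =
  begin
    body + suffix
    nil
  rescue Encoding::CompatibilityError => e
    e
  end
puts error.message # names the two incompatible encodings

# Relabelling (not transcoding) works here because the bytes are already UTF-8.
fixed = body.dup.force_encoding(Encoding::UTF_8)
puts fixed + suffix # caffè, déjà vu
```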

The strings are entered through a textarea field and are not concatenated with any other previously stored string.

The strange things are:

- This happens only in the index view, not in the show view (same resource).
- This happens only for this model (an email, with subject and body, but that shouldn't affect anything).
- In my development environment everything works if I call str.force_encoding('utf-8'), whereas in my production environment this does not work. (In development I'm on Ruby 2.0.0, in production on Ruby 2.1.0; both run Rails 4 and MySQL.)
- Adding # encoding: utf-8 to the view file doesn't work either.
- Trying str.force_encoding('ascii-8bit').encode('utf-8') raises Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8 (\xC3 is the first byte of à in UTF-8). Using body.force_encoding('ascii-8bit').encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?') replaces all accented characters with ?, while str.force_encoding('iso-8859-1').encode('utf-8') obviously generates the wrong character (a ?).
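The results of these experiments follow from the difference between the two methods. force_encoding only changes the label on the existing bytes. encode transcodes them, and Ruby has no mapping from ASCII-8BIT bytes above 0x7F to UTF-8 characters. Here is a standalone sketch that reruns those experiments. It assumes the stored bytes are the UTF-8 encoding of "à" (0xC3 0xA0).

```ruby
# Assumption: the stored text is "à" as UTF-8 bytes (0xC3 0xA0), tagged ASCII-8BIT.
raw = "\xC3\xA0".b

# 1. force_encoding only changes the label; the bytes are valid UTF-8, so this works.
relabelled = raw.dup.force_encoding("UTF-8")
puts relabelled                  # à
puts relabelled.valid_encoding?  # true

# 2. encode transcodes, and no UTF-8 character is defined for byte 0xC3 in ASCII-8BIT.
conversion_error =
  begin
    raw.encode("UTF-8")
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end
p conversion_error.error_char    # the offending byte, "\xC3"

# 3. The same transcode with replacement turns every high byte into "?".
replaced = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
puts replaced                    # ??

# 4. Reading the bytes as Latin-1 gives one character per byte: "Ã" plus a
#    no-break space (U+00A0), i.e. mojibake rather than "à".
latin1 = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")
p latin1.codepoints              # [195, 160]
```

Only the first variant keeps the text intact, and only when the bytes really are UTF-8. If the bytes were written in some other encoding, relabelling produces a string whose valid_encoding? is false.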

So I have two questions:
- Why is Rails setting the string encoding to ASCII-8BIT?
- How can I solve this issue?

I've already checked these questions (the newest ones, for Rails 4):

Rails View Encoding Issues

"\xC2" to UTF-8 in conversion from ASCII-8BIT to UTF-8

How to convert a string to UTF8 in Ruby

Encoding::UndefinedConversionError: "\xE4" from ASCII-8BIT to UTF-8

and other resources as well, but nothing worked.

Are you entering the accented characters into the view using a text editor? — Slicedpan, Mar 07 '14 at 12:33
Are the source code files all UTF-8, or is your text editor perhaps saving the files as ASCII-8BIT? — Roger Nordqvist, Mar 07 '14 at 20:55
These are not strings generated by my text editor... anyway, the files are in UTF-8. — sissy, Mar 08 '14 at 10:38
I'm struggling with a similar issue (getting `incompatible character encodings: ASCII-8BIT and UTF-8` errors from user entered data). Were you able to solve your problem? Did you find a way to replicate it in a test? — Nick, Feb 07 '17 at 05:50
I was not able to solve the problem definitively, sorry. And it happens only from time to time, for no apparent reason. — sissy, Feb 07 '17 at 09:55
I found a workaround that I posted [here](http://stackoverflow.com/questions/42078430/getting-incompatible-character-encodings-utf-8-and-ascii-8bit-error-when-disp?noredirect=1#comment71334714_42078430). It basically forces UTF-8 encoding on every read_attribute (if it's a string). Maybe it can help. Good luck — Nick, Feb 07 '17 at 21:03
Well, as written in my initial post, your workaround is basically the same as calling str.force_encoding everywhere, and that works for me in the dev environment, whereas it fails in production. No way :( — sissy, Feb 08 '17 at 14:42
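The workaround linked in the comments hooks into ActiveRecord's read_attribute. The plain-Ruby sketch below only illustrates the per-value logic, using a hypothetical helper named utf8_value. Unlike a bare force_encoding, it checks valid_encoding? after relabelling. That would at least show whether the stored bytes in a given environment really are UTF-8.

```ruby
# Hypothetical helper (not from the linked answer): relabel an ASCII-8BIT
# string as UTF-8, and fall back to scrubbing if the bytes are not valid UTF-8.
# Non-strings and strings already in another encoding pass through unchanged.
def utf8_value(value)
  return value unless value.is_a?(String) && value.encoding == Encoding::ASCII_8BIT

  candidate = value.dup.force_encoding(Encoding::UTF_8)
  return candidate if candidate.valid_encoding?

  # The bytes were not UTF-8 after all (e.g. a Latin-1 "à" is the single byte
  # 0xE0). String#scrub (Ruby 2.1+) replaces invalid sequences with "?" so that
  # rendering does not fail later on.
  candidate.scrub("?")
end

p utf8_value("\xC3\xA0".b) # "à"
p utf8_value("\xE0".b)     # "?"
p utf8_value(42)           # 42
```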

score 1 · Answer 1 · answered Sep 30 '14 at 20:46

You probably have a string literal somewhere in your source code that you then concatenate another string to. For instance:

some_string = "this is a string"

or even

some_string = "" #empty string

Those strings, stored in some_string, will be marked ASCII-8BIT, and if you later do something like:

some_string = some_string + unicode_string

Then you'll get the error. Those strings will be marked ASCII-8BIT unless you add the following line to the top of the file where the string literals are created:

#encoding: utf-8

That declaration determines the default encoding that string literals in source code will have.
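Two hedged refinements, for context. First, since Ruby 2.0 the default source encoding is UTF-8, so on the asker's Ruby versions (2.0.0 and 2.1.0) literals are already UTF-8 without the magic comment. The comment mainly mattered on Ruby 1.9, where the default was US-ASCII. Second, Ruby only rejects a combination when both strings contain non-ASCII bytes. An ASCII-only string tagged ASCII-8BIT, even an empty one, combines with UTF-8 text without error. Encoding.compatible? returns nil when two strings cannot be combined, so it can show these cases directly. The binary strings are simulated with String#b.

```ruby
utf8_text = "é" # a UTF-8 literal (the default source encoding since Ruby 2.0)

# ASCII-only binary strings, even empty ones, are compatible with UTF-8 text.
empty_ok = Encoding.compatible?("".b, utf8_text)
ascii_ok = Encoding.compatible?("subject: ".b, utf8_text)
p empty_ok == Encoding::UTF_8  # true
p ascii_ok == Encoding::UTF_8  # true

# Binary text with non-ASCII bytes is not: compatible? returns nil, and `+`
# would raise Encoding::CompatibilityError.
clash = Encoding.compatible?("à".b, utf8_text)
p clash                        # nil

# Binary non-ASCII text plus plain ASCII text succeeds, but the result keeps
# the ASCII-8BIT tag, so the problem surfaces at a later concatenation.
mixed = "à".b + " and more"
p mixed.encoding == Encoding::ASCII_8BIT # true
```

The last case is one plausible way for a binary tag to travel. The first combination succeeds, but its result is ASCII-8BIT, so the error is raised later, by a concatenation with non-ASCII UTF-8 text.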

I am just guessing, because this pattern is a common source of this problem. To know for sure would take more information than is in your question: it would take debugging the actual source code to figure out exactly which string is tagged ASCII-8BIT when you expect it to be tagged UTF-8, and exactly where that string came from.
