
I recently deployed my application. In development I used SQLite and everything worked fine. I have a controller that uses Nokogiri to parse data and populate my database.

The problem is that in production I'm using MySQL instead of SQLite, and now my script populates the data with the wrong encoding.

For instance, "Aragón" is stored with the accented "ó" corrupted. MySQL is using utf8 for both the database and every table.

the Tin Man
jävi
  • After some debugging, I am now sure the problem is not in the DB: Nokogiri is reading the wrong characters. However, this happens only on the production server. – jävi May 19 '11 at 19:20

2 Answers


Nokogiri is probably returning things correctly. I suspect you have a mismatch between the character set of the content you are parsing with Nokogiri and the character set of the database.

The data being parsed might be ISO-8859-1 or Windows-1252, which are the most common encodings on the internet. You'll need to look in the data to see what it is declared as. Also look at the source for the word "Aragón" and see whether it has embedded upper-bit characters or entity-encoded characters. By looking at the byte values of the accented characters you can also get an idea of what encoding the characters are in.
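As a rough sketch of what "looking at the value" can mean in Ruby: dump the raw bytes and check whether they form valid UTF-8. The helper name `inspect_bytes` and the two sample strings are made up for illustration; the only facts relied on are that "ó" is the single byte F3 in ISO-8859-1 and the two bytes C3 B3 in UTF-8.

```ruby
# Hypothetical helper: show the raw bytes as hex and report whether they
# form a valid UTF-8 sequence.
def inspect_bytes(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  hex = bytes.unpack("C*").map { |b| format("%02X", b) }.join(" ")
  valid_utf8 = raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  { hex: hex, valid_utf8: valid_utf8 }
end

# Illustrative samples (assumptions, not data from the question):
utf8_sample   = "Arag\xC3\xB3n".b  # "ó" as the UTF-8 byte pair C3 B3
latin1_sample = "Arag\xF3n".b      # "ó" as the single ISO-8859-1 byte F3

utf8_report   = inspect_bytes(utf8_sample)
latin1_report = inspect_bytes(latin1_sample)
```

A lone byte in the 80 to FF range where an accented letter should be, combined with a failed UTF-8 validity check, suggests a single-byte encoding such as ISO-8859-1 or Windows-1252. It does not tell you which one, so the declared charset of the source is still worth checking.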

Odds are good they're not UTF-8, so when Nokogiri passes them to your code that writes to the database, they will be wrong.

To fix the problem you'll need to either tell Nokogiri what the encoding is, or convert the text to UTF-8 before storing it.
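Here is a minimal sketch of the second option, converting the text to UTF-8 before storing it. It assumes the source really is ISO-8859-1, which you would need to confirm first; the helper name `to_utf8` and the sample bytes are hypothetical. For the first option, Nokogiri's HTML parser accepts the encoding as its third argument, as noted in the comment.

```ruby
# Hypothetical helper: label the bytes with their real source encoding,
# then transcode them to UTF-8.
def to_utf8(raw, source_encoding)
  raw.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

# Illustrative input (an assumption): "Aragón" as ISO-8859-1 bytes.
latin1_bytes = "Arag\xF3n".b

fixed = to_utf8(latin1_bytes, Encoding::ISO_8859_1)

# Alternative, not run here because it needs the nokogiri gem:
# tell the parser the encoding up front.
#   doc = Nokogiri::HTML(html, nil, 'ISO-8859-1')
```

If the source turns out to be Windows-1252 instead, passing `Encoding::Windows_1252` to the same helper should work the same way.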

the Tin Man

You've got the encoding wrong somewhere in your stack. I bet it's set wrong in MySQL.

Take a look at this: I need help fixing Broken UTF8 encoding

Community
Eli