-1

Possible Duplicate:
UTF-8 all the way through

Okay, it's silly that I can't figure this out.

The MySQL database is set to the utf8_general_ci collation. The field I'm having problems with is of type longtext.

Characters added to the database as &eacute; or as other accented characters come back as �.

I run the output through stripslashes, and I've tried both with and without html_entity_decode, but I see no change in the output. What am I doing wrong?

Cheers

TH1981

2 Answers

2

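What character encoding is the string in when you insert it? If it is ISO-8859-1, you can use the PHP function utf8_encode() to convert it to UTF-8 before inserting it into the database. (Note that utf8_encode() is deprecated as of PHP 8.2; mb_convert_encoding() with an explicit source encoding does the same job.)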

http://php.net/manual/en/function.utf8-encode.php
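To make the conversion concrete, here is a minimal sketch of the same idea at the byte level. It is written in Python only because the bytes are easy to inspect there; `to_utf8` and its `assumed_source` parameter are hypothetical names, not part of PHP or MySQL. It also shows that reading an ISO-8859-1 byte as UTF-8 without converting produces the � from the question.

```python
def to_utf8(raw: bytes, assumed_source: str = "iso-8859-1") -> bytes:
    """Return UTF-8 bytes, converting from assumed_source only when needed."""
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave it untouched
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as the assumed legacy encoding.
        return raw.decode(assumed_source).encode("utf-8")


latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'

# Reading those bytes as UTF-8 without converting reproduces the symptom:
symptom = latin1_bytes.decode("utf-8", errors="replace")  # ends in U+FFFD

# Converting first gives the two-byte UTF-8 form of é (c3 a9):
fixed = to_utf8(latin1_bytes)
```

The validity check is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so it is safer to know the source encoding than to guess it.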

Daniel Hedberg
1

Getting encoding right is really tricky - there are too many layers:

  • Browser
  • Page
  • PHP
  • MySQL

The SQL command "SET CHARSET utf8", issued from PHP, ensures that the client side (PHP) receives the data in utf8, no matter how it is stored in the database. Of course, it needs to be stored correctly first.

DDL definition vs. real data

The encoding defined for a table or column doesn't guarantee that the data is actually in that encoding. If you have a table defined as utf8 whose contents were stored in a different encoding, MySQL will still treat them as utf8, and you're in trouble. That means you have to fix the stored data first.
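A common way to end up in this state is double encoding: UTF-8 bytes are read as a single-byte encoding on the way in and then converted to UTF-8 again. The sketch below simulates that in Python, assuming the wrong label was cp1252 (which is what MySQL's latin1 is based on); `mislabel` and `repair` are hypothetical helper names.

```python
def mislabel(text: str, wrong: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being interpreted as `wrong` on the way in."""
    return text.encode("utf-8").decode(wrong)


def repair(garbled: str, wrong: str = "cp1252") -> str:
    """Undo the mislabelling by reversing the two steps."""
    return garbled.encode(wrong).decode("utf-8")


stored = mislabel("é")      # two characters: 'Ã©'
restored = repair(stored)   # back to 'é'
```

The column is perfectly valid utf8 in this case, which is why the table definition alone tells you nothing. The repair only works if you know which wrong encoding was applied, and a few byte values are undefined in cp1252, so treat this as an illustration rather than a general fix.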

What to check

You need to check which encoding the data is in at each layer.

  • Check the HTTP headers.
  • Check what's really sent in the body of the request.
  • Don't forget that MySQL has encoding almost everywhere:
    • Database
    • Tables
    • Columns
    • Server as a whole
    • Client
      Make sure the right one is set everywhere.
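For the HTTP layer, the thing to look for is the charset parameter of the Content-Type header. A minimal Python sketch of extracting it, using the standard library's header parser; `declared_charset` is a hypothetical helper name, and the header strings in the usage note are made-up examples.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

For example, `declared_charset("text/html; charset=UTF-8")` gives `"utf-8"`, while `declared_charset("text/html")` gives `None`, meaning the page leaves the encoding for the browser to guess.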

Conversion

If you receive data in, e.g., windows-1250 and want to store it as utf-8, run this SQL before storing:

SET NAMES 'cp1250';

If you have data in the DB as windows-1250 and want to retrieve it as utf8, use:

SET CHARSET 'utf8';
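What these statements do is declare an encoding for one side of the connection so that the server can transcode. The Python sketch below is a simplified model of that, not MySQL's actual implementation; `server_store` is a hypothetical name, and the sample word is only an illustration.

```python
def server_store(raw: bytes, declared: str, column: str = "utf-8") -> bytes:
    """Model the server: decode using the declared client encoding,
    then re-encode into the column's encoding."""
    return raw.decode(declared).encode(column)


raw = "Dvořák".encode("cp1250")  # the bytes the client actually sends

good = server_store(raw, declared="cp1250")  # declaration matches the bytes
bad = server_store(raw, declared="cp1252")   # declaration is wrong

good_text = good.decode("utf-8")  # 'Dvořák'
bad_text = bad.decode("utf-8")    # 'Dvoøák': valid utf-8, wrong letter
```

The wrong declaration produces no error at all. The stored value is well-formed utf-8 that simply contains a different character, which is why the declaration has to match what the client really sends.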

Last note:

Don't rely on overly "smart" tools to display the data. For example, phpMyAdmin handled encoding really badly (at least when I was using it), and because it goes through all the layers, it's hard to tell where things go wrong. Internet Explorer also had the really stupid habit of "guessing" the encoding based on weird rules. Use simple editors that let you switch the encoding. I also recommend MySQL Workbench.
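One way to take display tools out of the picture is to look at the raw bytes in hex. A small Python sketch follows; `hexdump` is a hypothetical helper. On the MySQL side, selecting HEX() of the column gives a comparable view of what is stored.

```python
def hexdump(raw: bytes) -> str:
    """Render bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in raw)


as_utf8 = hexdump("é".encode("utf-8"))         # 'c3 a9'
as_latin1 = hexdump("é".encode("iso-8859-1"))  # 'e9'
double = hexdump("Ã©".encode("utf-8"))         # 'c3 83 c2 a9'
```

Seeing `e9` where you expected `c3 a9` points to ISO-8859-1 data in a utf8 context, and `c3 83 c2 a9` points to text that was encoded twice.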

Ondra Žižka
  • +1, and I'd add that to get encodings right every time, you need to actually understand them. If you just follow memorized instructions from one particular situation without understanding them, there will be encoding problems sooner or later. Unfortunately, cargo cult programming isn't going anywhere. – Esailija Jan 09 '13 at 13:46