2

I have a webapp that stores French text -- which potentially includes accented characters -- in a MySQL database. When the data is retrieved directly through PHP, accented characters turn into gibberish. For instance: qui r�fl�te la liste.

Hence, I use htmlentities() (or htmlspecialchars()) to convert the string to HTML entities, and all is fine. However, when I output data that contains both accented characters and HTML elements, things get more complicated. For instance, <strong> is converted to &lt;strong&gt; and is therefore not interpreted by the browser.

How can I simultaneously get accented characters displayed correctly and my HTML parsed correctly?

Thank you!

David Chouinard

3 Answers

10

Maybe you could take a look at utf8_encode() and utf8_decode().

OcuS
  • We had to do this when we encountered Polish characters in our SQL database, hopefully there is something similar for MySQL. – Daniel Jan 12 '10 at 22:43
  • 2
    -1 This may work, but it may not work because of why you think it works and you're not fixing the root cause of the encoding problem. Please read [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) for what `utf8_*` actually does and [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) for how to actually fix the problem. – deceze Feb 21 '13 at 11:31
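To make the comment above concrete: `utf8_encode()` only converts ISO-8859-1 (Latin-1) bytes to UTF-8, so it "works" when the connection hands PHP Latin-1 bytes while the page is served as UTF-8. The sketch below is illustrative only. It assumes the `mbstring` extension is available and uses `mb_convert_encoding()` as a stand-in for `utf8_encode()`, which is deprecated as of PHP 8.2. The sample string is a hypothetical value, not data from the asker's database.

```php
<?php
declare(strict_types=1);

// "réflète" as a Latin-1 connection would deliver it: one byte per accented letter.
$latin1 = "r\xE9fl\xE8te";

// Those bytes are not valid UTF-8, which is why a UTF-8 page shows the replacement character.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)

// Converting Latin-1 to UTF-8 (what utf8_encode() does) yields valid UTF-8.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
echo bin2hex($utf8), PHP_EOL;                  // 72c3a9666cc3a87465

// The hazard: running the same conversion on data that is already UTF-8
// double-encodes it ("é" turns into the two characters "Ã©").
$double = mb_convert_encoding("\xC3\xA9", 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), PHP_EOL;                // c383c2a9
```

The last two lines show why the conversion is a workaround rather than a fix: once the connection is switched to UTF-8, the same call starts corrupting text that was already correct.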
4

You should use UTF-8 encoding for storing the data in the database - then everything should work as expected and no htmlentities() will be required.

Make sure every layer uses UTF-8: the database, the table encoding and collation, and the connection, on both the client and the server side. Things might work even if not everything is UTF-8, but they can fail horribly when you do a backup and restore. That is why I recommend UTF-8 across the board.

yhager
  • +1 see this SO answer: http://stackoverflow.com/questions/1344692/i-need-help-fixing-broken-utf8-encoding/1348521#1348521 for a check list. – martin clayton Jan 12 '10 at 22:43
  • @martin clayton, thanks for the link. Everything on the checklist is being respected... Furthermore, the data is correctly stored in the database as UTF-8 (i.e. no weird characters when I query the database directly). Any thoughts on what could cause the problem? (Also, accented characters hard-coded in the HTML display properly without using HTML entities.) – David Chouinard Jan 13 '10 at 01:48
  • Per OcuS's suggestion, I used utf8_encode() and everything works OK. Anyways, still intriguing. Thanks for your help guys! – David Chouinard Jan 13 '10 at 01:52
  • Note that if you already wrote data to your DB with bad encoding settings, that data is already corrupted, and changing the encoding afterwards will not repair it. Fixing malformed data requires a rather involved 'mysqldump' and restore. – yhager Jan 17 '10 at 09:04
  • 1
    I double-checked my PHP code and MySQL settings; in the end, the only thing that helped was to specify the encoding when connecting to the database: `new PDO('mysql:host=yourserver;dbname=yourdb;charset=UTF8', 'dbuser', 'password');` – ManuelJE Jan 02 '19 at 15:21
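The "connection" item of the checklist can be sketched as follows. This is a minimal sketch rather than a definitive setup: `buildDsn()` and `connectUtf8()` are hypothetical helper names, and the host, database and credentials are placeholders. It uses `utf8mb4` on the assumption that the server supports it (MySQL 5.5.3 or later), since MySQL's plain `utf8` stores at most three bytes per character. The `charset` DSN option is honored from PHP 5.3.6 onward.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds a PDO DSN that pins the connection charset.
function buildDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens the connection; throws PDOException on failure.
function connectUtf8(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(buildDsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Declare the page as UTF-8 as well; this must run before any output is sent.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo buildDsn('yourserver', 'yourdb'), PHP_EOL;
// mysql:host=yourserver;dbname=yourdb;charset=utf8mb4
```

With the connection and the page both declared as UTF-8, text from the database can be echoed as-is, and `htmlspecialchars()` is only needed for untrusted text that must not be interpreted as markup.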
0

You could set the collation of the database fields that contain accented characters to utf8_general_ci so that they are supported.

You can also set the collation of the database itself, so that all fields use it by default.

Veger
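As a rough illustration of the collation suggestion above, the sketch below builds the MySQL statement that converts an existing table to a given character set and collation. The function name `convertTableSql()` and the table name `articles` are hypothetical, and the `utf8mb4` defaults are an assumption; substitute `utf8` and `utf8_general_ci` to match the answer literally. The statement is only printed here, not executed.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns an ALTER TABLE statement that converts every
// text column of $table to the given character set and collation.
function convertTableSql(
    string $table,
    string $charset = 'utf8mb4',
    string $collation = 'utf8mb4_general_ci'
): string {
    // Identifiers cannot be bound as query parameters, so restrict them to a safe pattern.
    foreach ([$table, $charset, $collation] as $name) {
        if (preg_match('/^[A-Za-z0-9_]+$/', $name) !== 1) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET $charset COLLATE $collation";
}

echo convertTableSql('articles'), PHP_EOL;
// ALTER TABLE `articles` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci
```

Note that the conversion assumes the column's declared character set matches the bytes actually stored. Data written earlier over a misconfigured connection will not be repaired by it, as yhager's comment points out.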