I am encountering an issue with em dashes that I haven't been able to resolve. For our site, we store text content in a 'content' column in a MySQL database that will occasionally contain special characters, all of which appear correctly except for the em dash. Any em dashes in the entered text turn into question mark diamond replacement characters when printed out on the site.

The bug only appears on our production site; on our development and staging sites the em dashes render correctly. That leads me to believe it may be a character set issue on the respective databases (our development and staging databases are hosted on one server, production on a different one). However, the column containing the text in question is set to utf-8 on all of the databases. Wrapping the output in htmlspecialchars() didn't help.

Of note: when I used print_r() to print the variable holding the fetched DB results (for debugging), the em dashes in the output below it then rendered correctly.

sh2206
  • Check that the webpage encoding (from the meta tag), the PHP script file encoding, and the MySQL driver encoding all match. – Cyrbil Jul 27 '15 at 15:30
  • Have you tried &mdash; instead of — ? – Kevin_Kinsey Jul 27 '15 at 15:35
  • We had the same issue with dashes. It turned out that users were copy/pasting from Word, which uses a special (slightly longer) dash, and the MySQL encoding on production was not accepting it. We solved the problem by creating an escape function with some exceptions. – Claudio Pinto Jul 27 '15 at 15:36
  • possible duplicate of [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) – chris85 Jul 27 '15 at 15:43
  • Hi @ClaudioPinto we might wind up doing something similar, since our problem is also likely created by users copy/pasting from Word. Could you offer more details on the function you made? Did it search and replace for the Word dash? Thanks to you and everyone for quick responses. – sh2206 Jul 27 '15 at 15:56
  • We created our own sanitiser, but there are loads of ready-made solutions. Have a look at this post: http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string. It's a similar problem. – Claudio Pinto Jul 27 '15 at 16:06
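
For readers curious what a sanitiser along those lines might look like, here is a minimal sketch. It is not the function described in the comments above (that one was never posted); the mapping and the name `sanitize_word_punctuation` are made up for illustration, and it is written in Python so it runs without a web server. It swaps common Word punctuation for plain ASCII before the text is stored.

```python
# Hypothetical mapping of Word-style punctuation to ASCII stand-ins.
# Extend or trim it to suit your content; these choices are assumptions.
WORD_PUNCTUATION = {
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts a dict whose keys are single characters.
_TRANSLATION_TABLE = str.maketrans(WORD_PUNCTUATION)


def sanitize_word_punctuation(text: str) -> str:
    """Return text with Word-style punctuation replaced by ASCII."""
    return text.translate(_TRANSLATION_TABLE)


sample = "Copy\u2014pasted \u201cfrom Word\u201d\u2026"
cleaned = sanitize_word_punctuation(sample)
print(cleaned)
```

Note that this changes what the user typed, so it is a workaround rather than a fix for the underlying encoding mismatch.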

1 Answer

Thanks to everyone who responded. It was fixed by the SET NAMES 'utf8' solution in [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through).
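
For anyone wondering why this helps, here is a rough sketch of the likely mechanism, written in Python rather than PHP so it runs without a database. It rests on two assumptions that the question does not confirm: that the production connection defaulted to latin1 (which MySQL treats as cp1252), and that the page was served as UTF-8. The helper names are hypothetical stand-ins, not MySQL or PHP APIs. Under those assumptions the server sends the em dash as the single byte 0x97, which is not valid UTF-8, so the browser shows the replacement character.

```python
EM_DASH = "\u2014"


def bytes_on_the_wire(text: str, connection_charset: str) -> bytes:
    """Stand-in for MySQL converting column text to the connection charset."""
    return text.encode(connection_charset)


def rendered_by_browser(raw: bytes) -> str:
    """The page is assumed to be UTF-8, so invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


text = "before" + EM_DASH + "after"

# Assumed production default: latin1, modelled here as cp1252.
broken = rendered_by_browser(bytes_on_the_wire(text, "cp1252"))

# After SET NAMES 'utf8': the connection delivers UTF-8 bytes.
fixed = rendered_by_browser(bytes_on_the_wire(text, "utf-8"))

print(repr(broken))
print(repr(fixed))
```

In PHP the same effect as SET NAMES is usually achieved by setting the charset on the connection itself, for example with mysqli's set_charset() or the charset option in a PDO DSN.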

sh2206