
Some rows in my database contain a kind of apostrophe that, when displayed with PHP, is rendered as a diamond with a question mark in the center. Example, if it copies correctly: Captain Jim O’Brien

These "apostrophes" were most likely inserted via TinyMCE by a user copying and pasting from Word, or perhaps from a Mac.

How can I display these "apostrophes"? When I view the row in PHPMyAdmin, the apostrophes are displayed (no diamond), so there is obviously a way.

My character encoding is set to UTF-8, and I've tried htmlspecialchars($string) and htmlentities($string), with no luck.

Luke Shaheen

1 Answer


Character encoding comes into play in several different places.

MySQL stores text in a particular character encoding. By default (in MySQL versions before 8.0), it is not UTF-8 but rather latin1.

The HTML document you generate using PHP also declares a particular character encoding. Finally, the actual bytes in the HTML document are written in some character encoding, which, if you're not careful, can differ from the one the document declares.
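To make the mismatch concrete, here is a minimal sketch in Python (chosen only for illustration; the same byte-level behaviour applies to PHP output). It assumes the stored apostrophe is U+2019 and that the bytes reaching the browser are Windows-1252, which is only a guess about this particular database.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the curly apostrophe Word tends to insert
name = "Captain Jim O\u2019Brien"

# The same text as bytes in two encodings
as_cp1252 = name.encode("cp1252")  # apostrophe is the single byte 0x92
as_utf8 = name.encode("utf-8")     # apostrophe is the three bytes 0xE2 0x80 0x99

# A reader that was told "this is UTF-8" but receives cp1252 bytes cannot
# decode 0x92 and substitutes U+FFFD, the replacement character (the diamond)
shown = as_cp1252.decode("utf-8", errors="replace")
print(shown)
```

If the page declared the encoding the bytes were really written in, the same bytes would decode cleanly.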

Verify that your MySQL encoding is set to UTF-8 as a first step. Note that MySQL can have the default character encoding for the database overridden on a per-table or even per-column basis.

You may be interested in this related post for a deeper understanding of character encoding:

Character Encoding and the ’ Issue
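The garbled characters in that post's title come from the opposite mismatch: UTF-8 bytes read as a single-byte encoding. A small Python sketch, assuming the single-byte encoding is Windows-1252:

```python
apostrophe = "\u2019"                    # the curly apostrophe
utf8_bytes = apostrophe.encode("utf-8")  # three bytes: 0xE2 0x80 0x99

# Reading those three bytes as Windows-1252 turns each byte into its own character
misread = utf8_bytes.decode("cp1252")
print(misread)

# As long as nothing was lost, reversing the two steps recovers the original
recovered = misread.encode("cp1252").decode("utf-8")
print(recovered == apostrophe)
```

So a diamond usually means single-byte data was declared as UTF-8, while a three-character sequence like the one in the title usually means UTF-8 data was declared as a single-byte encoding.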

Update

Something put the data into the MySQL database in the first place. Perhaps that "something" was not using UTF-8 encoding.

Eric J.
  • MySQL Collation for the table: `utf8_general_ci`. The HTML document has set `` – Luke Shaheen Jul 24 '12 at 16:11
  • Collation is different than character encoding. Check the encoding (though if collation is utf8 based, the encoding is *probably* utf8) – Eric J. Jul 24 '12 at 16:12
  • Also... what's the encoding that you see in PHPMyAdmin when you view source on a page that shows the apostrophe correctly? Is it UTF8 or something else? – Eric J. Jul 24 '12 at 16:15
  • Using `show variables like "character_set_database";`, I get latin1. So, I realize that I probably need to change the encoding of the database to UTF-8. That being said, somehow PHPMyAdmin is displaying it without changing the database. The source code shows that they are setting the `Content-Type` to UTF8 – Luke Shaheen Jul 24 '12 at 16:18
  • PHPMyAdmin appears to be displaying the apostrophe as `’`, but I'm not sure how they converted it to that? – Luke Shaheen Jul 24 '12 at 16:20
  • PHPMyAdmin may be using a latin1 compatible encoding that happens to match the encoding in MySQL. – Eric J. Jul 24 '12 at 16:24
  • Changing my `Content-Type` meta tag to `` fixes the issue for that page. I'll keep trying other solutions. If I change the encoding of my DB from latin1 to UTF8, what ramifications can that have to other data? And does that automatically fix the issue? – Luke Shaheen Jul 24 '12 at 16:26
  • Changed the encoding of the table only to UTF-8, that does not automatically fix the issue. Unless something is overriding the table encoding. – Luke Shaheen Jul 24 '12 at 16:32
  • You will probably need to re-load the data (and ensure that the original data is UTF-8 encoded). – Eric J. Jul 24 '12 at 18:22
  • The trick was to ensure that the page that housed the form (entry) was UTF-8, database was UTF-8, and the output page was UTF-8. This doesn't solve the problem characters already in the database, but it prevents new ones from being entered. Thanks! – Luke Shaheen Jul 24 '12 at 20:11
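For the problem characters already in the database, one possible approach is to re-decode the old bytes when re-loading the data. The sketch below is in Python for illustration only; `decode_legacy` is a hypothetical helper, and it assumes that any row which is not valid UTF-8 was written as Windows-1252, which should be checked against real data first.

```python
def decode_legacy(raw: bytes) -> str:
    """Decode bytes that are either valid UTF-8 or (by assumption) Windows-1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is legacy cp1252 data.
        # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")


old_row = b"Captain Jim O\x92Brien"                   # legacy single-byte apostrophe
new_row = "Captain Jim O\u2019Brien".encode("utf-8")  # row that is already UTF-8

# Both rows end up as the same text, which can then be written back as UTF-8
print(decode_legacy(old_row) == decode_legacy(new_row))
```

Trying UTF-8 first is a deliberate choice: Windows-1252 text containing non-ASCII characters is rarely valid UTF-8 by accident, so rows that are already correct pass through unchanged.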