
OK, this is strange, and I don't really know what's causing my problem:

Workflow:

  • I have a regular MySQL database with a UTF-8 column.
  • Via a simple input field, I'm inserting rows of text containing German umlauts into this column.
  • I'm reading and displaying the rows with a simple query.

My Problem:

Sometimes, and only sometimes, question marks are displayed instead of the umlauts. What's weird is that it only happens with certain words, not with every umlaut. For instance, in "Gummibären" the "ä" shows up as a question mark, but "Gumibären" (note the single "m") is displayed correctly. So I can't really figure out a pattern here.

  • The column uses the utf8_general_ci collation.
  • The HTML files use <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
  • The PHP file itself is encoded in UTF-8.

The query is:

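```php
mysqli_query($con, "SET NAMES 'utf8'"); // $con is the open mysqli connection
$var = "SELECT * FROM `table` ORDER BY id DESC"; // TABLE is a reserved word in MySQL, so quote it
$result = mysqli_query($con, $var);
```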

What's also strange is that the "ä" is replaced not by one question mark but by two, as if two characters had failed to encode instead of just one.
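
The doubled question mark is consistent with how UTF-8 stores "ä": as two bytes (0xC3 0xA4). If those bytes get separated, each stray byte can be rendered as its own replacement character. A minimal check, assuming the PHP file is saved as UTF-8 (as stated above) and the mbstring extension is available:

```php
<?php
// "ä" (U+00E4) written in a UTF-8 encoded source file
$umlaut = "ä";

echo strlen($umlaut), "\n";             // byte count: prints 2
echo mb_strlen($umlaut, 'UTF-8'), "\n"; // character count: prints 1
echo bin2hex($umlaut), "\n";            // the two raw bytes: prints c3a4
```

Any function that counts bytes instead of characters can therefore split "ä" into two invalid fragments.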

Is there something I'm missing?

Tobias
  • Warning: mysql_query: this extension is deprecated as of PHP 5.5.0 and will be removed in the future. Use the MySQLi or PDO_MySQL extension instead. – light Jul 03 '14 at 21:02
  • possible duplicate of [UTF-8 encoded html pages show � (questions marks) instead of characters](http://stackoverflow.com/questions/5445137/utf-8-encoded-html-pages-show-%ef%bf%bd-questions-marks-instead-of-characters) – moon prism power Jul 03 '14 at 21:25
  • From the Q&A linked above: "If the data is being fetched from a database, you could use mb_detect_encoding() to verify its encoding." – moon prism power Jul 03 '14 at 21:28
  • Thanks, I read that post and did some research. Unfortunately, it doesn't help much with my problem. Weirdly, every other occurrence of a given umlaut is displayed correctly rather than as a �. If I call mysqli_set_charset($con, "utf8"), it displays two ��; if not, it displays a single �. – Tobias Jul 03 '14 at 21:38
  • mb_detect_encoding() reports ASCII. – Tobias Jul 03 '14 at 21:44
  • After some testing, it seems even weirder: it only happens if one of the umlauts or other extended Latin characters is at position 7 of a string. For example, in 123456ä89 the ä is shown as a question mark, whereas 12345ä789 (ä at position 6) displays normally. Sorry for the language, but: WTF. – Tobias Jul 03 '14 at 23:21
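
The position-7 pattern is what a byte-based cut would produce: in "123456ä89" the "ä" occupies bytes 7 and 8, while in "12345ä789" it occupies bytes 6 and 7. The sketch below reproduces the symptom with wordwrap(). The width of 7 and cut = true are assumptions chosen to match the observation, since the actual call isn't shown:

```php
<?php
// Hypothetical reproduction: width 7 and cut=true are assumed, not taken from the question.
foreach (["123456ä89", "12345ä789"] as $input) {
    // wordwrap() measures the width in bytes, not characters
    $wrapped = wordwrap($input, 7, "\n", true);
    // Check whether the wrapped result is still valid UTF-8
    $valid = mb_check_encoding($wrapped, 'UTF-8');
    echo bin2hex($wrapped), " => ", ($valid ? "valid" : "broken"), "\n";
}
// Output:
// 313233343536c30aa43839 => broken   (newline inserted between 0xC3 and 0xA4)
// 3132333435c3a40a373839 => valid    (newline inserted after the whole "ä")
```

Under the same assumption, "Gummibären" is cut between the two bytes of its 7th character "ä", while "Gumibären" is not, which matches the example in the question.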

1 Answer


I figured it out. It didn't really have anything to do with reading or writing UTF-8 to or from the database; the culprit was PHP's wordwrap() function. wordwrap() measures the width in bytes, not characters, so with multibyte UTF-8 characters such as "ä" it can cut a string in the middle of a character (in particular when its cut parameter is true) and corrupt it. For future reference, these two questions helped me figure it out: "Multi-byte safe wordwrap() function for UTF-8" and "php wordwrap cut parameter when dealing with weird characters".
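
A minimal sketch of a character-aware alternative follows. utf8_wordwrap() is a hypothetical helper (not a PHP built-in, and not necessarily what the linked answers use). It measures width with mb_strlen() and cuts with mb_substr(), so a multibyte character is never split. As a simplification, it only treats spaces as break opportunities and does not give existing line breaks in the input any special treatment, unlike wordwrap():

```php
<?php
/**
 * Hypothetical multibyte-aware variant of wordwrap() (not a PHP built-in).
 * $width is counted in characters, so UTF-8 characters are never split.
 */
function utf8_wordwrap(string $text, int $width = 75, string $break = "\n", bool $cut = false): string
{
    $lines = [];
    $line = '';
    foreach (explode(' ', $text) as $word) {
        // With $cut, chop over-long words into $width-character pieces
        while ($cut && mb_strlen($word, 'UTF-8') > $width) {
            if ($line !== '') {
                $lines[] = $line;
                $line = '';
            }
            $lines[] = mb_substr($word, 0, $width, 'UTF-8');
            $word = mb_substr($word, $width, null, 'UTF-8');
        }
        if ($line === '') {
            $line = $word;
        } elseif (mb_strlen($line . ' ' . $word, 'UTF-8') <= $width) {
            // The word still fits on the current line
            $line .= ' ' . $word;
        } else {
            // Start a new line; the separating space is replaced by $break
            $lines[] = $line;
            $line = $word;
        }
    }
    $lines[] = $line;
    return implode($break, $lines);
}

echo utf8_wordwrap("Gummibären", 7, "\n", true), "\n"; // prints "Gummibä", then "ren" on the next line
```

For plain ASCII input it behaves like wordwrap(); for example, utf8_wordwrap("The quick brown fox sat over the lazy dog", 15, "<br />\n") yields the same three lines as the wordwrap() example in the PHP manual.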

Thanks for your input, though!

Tobias