
I am struggling with the special character encoding on my website. I have an input field with a connected datalist. The datalist is populated from my database when the page is loaded. But somehow I get those question mark inside diamond symbols for each special character in the datalist options and normal question marks for special language characters (e.g. Chinese). Here is what I tried:

  1. The initial dataset is a CSV file, which I imported into my database. I checked the encoding in Notepad++ and it is set to UTF-8 without BOM.
  2. I am using phpmyadmin for database management. The server connection collation is set to utf8mb4_bin. The table where I store the data has the collation utf8_bin. And when I imported the data I made sure to use utf8. The characters are also displayed correctly in phpmyadmin.
  3. In the <head> section of my index.php file I also have the following: <meta http-equiv="Content-Type" content="text/html;charset=utf-8">

But I still get those question marks. What else could I check?

2 Answers


You can easily check whether the string you retrieve from the DB is valid UTF-8 using this snippet. If it is valid UTF-8, then your problem is in the presentation layer.

if(mb_detect_encoding($str, 'UTF-8', true)){ die('I\'m a valid UTF-8 string, YAY :D '); }
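Because encoding detection can be unreliable, a stricter variant is to validate the bytes directly and look at them. This is only a minimal sketch, assuming the mbstring extension is loaded; the helper name describeBytes and the sample strings are hypothetical and not part of the original snippet:

```php
<?php
// Hypothetical helper: reports whether $str is well-formed UTF-8
// and shows its raw bytes as hex for inspection.
function describeBytes(string $str): string
{
    $valid = mb_check_encoding($str, 'UTF-8') ? 'valid' : 'INVALID';
    return sprintf('%s UTF-8, bytes: %s', $valid, bin2hex($str));
}

// "é" encoded as UTF-8 (C3 A9) versus latin1 (E9).
echo describeBytes("caf\xC3\xA9"), PHP_EOL;
echo describeBytes("caf\xE9"), PHP_EOL;
```

If a value that should contain a special character shows a plain 3f byte (an ASCII question mark) in the hex dump, it was probably replaced before PHP received it, which points at the connection character set rather than at the page.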

  • It's very hard to correctly detect character encodings, because of the massive overlap, and gaps that exist in the ranges of various encodings. While your answer is a good suggestion it is not a completely reliable result producer. – Martin May 23 '16 at 10:17
  • I completely agree, but since the question was quite vague this felt like a decent starting point. – Marco Albarelli May 23 '16 at 13:05

I am using phpmyadmin for database management. The server connection collation is set to utf8mb4_bin. The table where I store the data has the collation utf8_bin. And when I imported the data I made sure to use utf8. The characters are also displayed correctly in phpmyadmin.

This is a likely cause: the table still uses a utf8_ collation, so you want to change it to a utf8mb4_ collation.
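One way to convert an existing table is MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET statement. The following is a minimal sketch, assuming a hypothetical table name cities; the helper buildConvertSql is illustrative only, and the resulting statement is best tried on a backup first:

```php
<?php
// Hypothetical helper: builds the MySQL statement that converts a table
// and its text columns to utf8mb4 with the utf8mb4_bin collation.
function buildConvertSql(string $table): string
{
    // Quote the identifier with backticks, doubling any embedded backtick.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin";
}

// "cities" is an assumed table name, not taken from the question.
echo buildConvertSql('cities'), PHP_EOL;
```

The printed statement can be pasted into the SQL tab of phpMyAdmin or run through the database connection.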

Other points are to use the PHP mb_ functions, such as

mb_internal_encoding('UTF-8');
mb_http_output('UTF-8'); 

at the top of your PHP pages. Also be sure to set the connection character set on your database connection object:

$this->MySQLiObjectVariable->set_charset("utf8mb4"); 

See the mysqli set_charset manual page.

What often happens is that while the MySQL storage is set correctly and the PHP output is set correctly, the communication between the two falls back to a system default, which is often not the best character set for the given data (typically latin1_ or utf8_).
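To make the connection step concrete, here is a minimal sketch that sets the character set right after connecting. It assumes the mysqli extension; the helper name openUtf8Connection and all credentials are placeholders, not values from the question:

```php
<?php
// Hypothetical helper: opens a mysqli connection and forces utf8mb4
// for everything sent over the wire. All arguments are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $mysqli = new mysqli($host, $user, $pass, $db);
    $mysqli->set_charset('utf8mb4');

    return $mysqli;
}

// Example use (not executed here; credentials are assumptions):
// $db = openUtf8Connection('localhost', 'app_user', 'secret', 'app_db');
// echo $db->character_set_name(); // the character set the connection actually uses
```

Calling set_charset before the first query matters, because rows fetched earlier would already have been converted using the default connection character set.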

Also, it may be generally beneficial not to rely on the meta character set declaration and to set the character set with an actual HTTP header instead, such as:

<?php
header('Content-Type: text/html; charset=utf-8');

appearing on your output page before any HTML content, ensuring the browser receives the data as UTF-8.
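Put together for the datalist case, the page could look roughly like the sketch below. It assumes the values have already been fetched as UTF-8 strings; the helper renderDatalistOptions and the datalist id are hypothetical:

```php
<?php
// Send the charset as an HTTP header; this must happen before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Hypothetical helper: turns database values into <option> elements.
function renderDatalistOptions(array $values): string
{
    $options = array_map(
        static function (string $value): string {
            // Escape for an HTML attribute, declaring the input as UTF-8.
            return '<option value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';
        },
        $values
    );

    return implode("\n", $options);
}

// Example use inside the page body ($rows is an assumed array of strings):
// echo '<datalist id="cities">', renderDatalistOptions($rows), '</datalist>';
```

If the header is set in an included file, any whitespace or BOM emitted before the include will already have started the output, so the header call needs to run first.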

Also read UTF-8 all the way through.

Martin
  • Thanks for the long answer. I tried to change everything to _utf8mb4_ and I tried to include the PHP mb_ actions, but it still didn't work out. The header information of the site I store in an extra header.php file which I include in each site. – TotoSchillaci May 23 '16 at 10:52