7

I have a table that includes special characters such as ™.

This character can be entered and viewed using phpMyAdmin and other software, but when I use a SELECT statement in PHP to output to a browser, I get the diamond with question mark in it.

The table type is MyISAM. The encoding is UTF-8 Unicode. The collation is utf8_unicode_ci.

The first line of the html head is

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I tried using the htmlentities() function on the string before outputting it. No luck.

I also tried adding this to php before any output (no difference):

header('Content-type: text/html; charset=utf-8');

Lastly I tried adding this right below the initial mysql connection (this resulted in additional odd characters being displayed):

$db_charset = mysql_set_charset('utf8',$db);

What have I missed?

Jason Wood
  • Unrelated to the question itself, but please use `mysqli` or PDO rather than the `mysql` extension, which is deprecated. –  Apr 09 '13 at 03:10
  • Are you sure that whatever is in your database is actually utf8? – Ja͢ck Apr 09 '13 at 03:12
  • [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) – deceze Apr 09 '13 at 03:21
  • how would I be sure that "whatever is in your database is actually utf8"? I'm typing the ™ character directly into phpMyAdmin, and everywhere I look in phpMyAdmin I see utf8 for both the field and the table... – Jason Wood Apr 09 '13 at 03:39

3 Answers

7

The code below works for me.

$sql = "SELECT * FROM chartest";
mysql_set_charset("UTF8");
$rs = mysql_query($sql);
header('Content-type: text/html; charset=utf-8');
while ($row = mysql_fetch_array($rs)) {
    echo $row['name'];
}
Yong
  • arg! "mysql_set_charset("UTF8");" DID fix the problem. Just not while also using htmlentities(). I didn't realize that htmlentities() ALSO requires a charset to be specified, as discussed here: http://stackoverflow.com/questions/9103801/htmlentities-converts-trademark-into-acirccent – Jason Wood Apr 09 '13 at 05:50
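
The htmlentities() pitfall described in the comment above can be reproduced without a database. This is a minimal sketch, not the asker's actual code: the variable names are made up, and the trademark sign is written as escaped bytes so the result does not depend on how the file is saved. Before PHP 5.4 the default charset of htmlentities() was ISO-8859-1, so the third argument had to be passed explicitly.

```php
<?php
// The trademark sign as raw UTF-8 bytes (E2 84 A2).
$tm = "\xE2\x84\xA2";

// Treating the bytes as ISO-8859-1 converts each byte on its own,
// which turns one character into several unrelated entities.
$wrong = htmlentities($tm, ENT_QUOTES, 'ISO-8859-1');

// Naming the real encoding yields a single entity for the whole character.
$right = htmlentities($tm, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n";
echo $right, "\n"; // &trade;
```

If the string only needs to be made safe for HTML, htmlspecialchars() with 'UTF-8' as the third argument leaves the ™ bytes untouched, which is enough once the page itself is served as UTF-8.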
0

There are a couple of things that might help. First, even though you're setting the charset to UTF-8 in the header, that might not be enough; I've seen browsers ignore it before. Try forcing it by adding this to the head of your HTML:

<meta charset='utf-8'>

Next, as mentioned here, try doing this:

mysql_query ("set character_set_client='utf8'");
mysql_query ("set character_set_results='utf8'");
mysql_query ("set collation_connection='utf8_general_ci'");

EDIT

So I've just done some reading up and playing around a bit. First, despite what I mentioned in the comments, utf8_encode() and utf8_decode() will not help you here. It helps to actually understand UTF-8 encoding; I found the Wikipedia page on UTF-8 very helpful. Assuming the value you are getting back from the database is in fact already UTF-8 encoded and you simply dump it out right after getting it, it should be fine.

If you manipulate the database result in any way and don't use the Unicode-aware functions from PHP's mbstring extension, you will probably corrupt it, because the standard PHP string functions operate on bytes, not characters.
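
To illustrate the difference between the byte-based and the mbstring functions, here is a small sketch. The sample string is an assumption made up for illustration, and the code needs the mbstring extension to be loaded.

```php
<?php
// "Acme" followed by the trademark sign, written as UTF-8 bytes.
$name = "Acme\xE2\x84\xA2";

echo strlen($name), "\n";             // counts bytes: 7
echo mb_strlen($name, 'UTF-8'), "\n"; // counts characters: 5

// Byte-based substr() cuts the 3-byte character apart, leaving invalid UTF-8.
$broken = substr($name, 0, 5);

// mb_substr() cuts on character boundaries, so the string stays intact.
$intact = mb_substr($name, 0, 5, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($intact, 'UTF-8')); // bool(true)
```

A truncated multi-byte sequence like the one in $broken is exactly what a browser renders as the diamond with a question mark.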

Once you understand how UTF-8 encoding works you can do something cool like this:

$test = "™";
for ($i = 0; $i < strlen($test); $i++) {
    // Print each byte as 8 binary digits (the file must be saved as UTF-8).
    printf("%08b ", ord($test[$i]));
}

Which dumps out something like this:

11100010 10000100 10100010

That's a properly encoded UTF-8 '™' character. If you don't have a character like that in your data retrieved from the database then something is messed up.

To check, try searching for a special character that you know is in the result using mb_strpos():

var_dump(mb_strpos($db_result, '™'));

If that returns anything other than false then the data from the database is fine, otherwise we can at least establish that it's a problem between PHP and the database.
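
The mb_strpos() check depends on the script file itself being saved as UTF-8. A complementary check that avoids this is to look at the raw bytes directly. The helper below is hypothetical (the name inspectUtf8 and the sample values are assumptions, not part of the original answer), and it needs the mbstring extension.

```php
<?php
// Hypothetical helper: report whether a value is valid UTF-8 and show its bytes in hex.
function inspectUtf8(string $value): array
{
    return [
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'),
        'hex'        => bin2hex($value),
    ];
}

// Stand-ins for a database result; a real check would pass the fetched column value.
$asUtf8   = "\xE2\x84\xA2"; // trademark sign encoded as UTF-8
$asCp1252 = "\x99";         // trademark sign encoded as Windows-1252

print_r(inspectUtf8($asUtf8));   // valid_utf8 is true, hex is e284a2
print_r(inspectUtf8($asCp1252)); // valid_utf8 is false, hex is 99
```

If the value fetched from the database shows e284a2 for the ™ character, the connection is delivering UTF-8 and the problem lies later, in string handling or output.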

Justin Warkentin
  • There was no change after adding <meta charset='utf-8'>. After adding the other stuff, the problem seemed to get worse. Instead of "�" for ™, I got "â�¢". – Jason Wood Apr 09 '13 at 03:51
  • Just to make sure the character encoding on the page is set right: if you're using Firefox you can right-click on the page and go to 'View Page Info', where it shows the encoding. Does it show 'UTF-8' or something like 'ISO-8859-1'? – Justin Warkentin Apr 09 '13 at 03:56
  • I'm no expert with character encodings, but I've gotten it working before. I don't know if it'll help but you should probably check out some of the unicode related PHP functions like [utf8_decode](http://php.net/manual/en/function.utf8-decode.php) and the [mbstring](http://php.net/manual/en/book.mbstring.php) functions. – Justin Warkentin Apr 09 '13 at 04:00
  • Yes, Firefox confirms it's UTF-8. I'll have a look at those functions. – Jason Wood Apr 09 '13 at 04:13
  • I just added more to my answer after doing a little bit of research. Let me know if anything helps. – Justin Warkentin Apr 09 '13 at 06:14
  • Just realized you found your answer – Justin Warkentin Apr 09 '13 at 06:19
-2

You need to execute the following query first.

mysql_query("SET NAMES utf8");   
Saeed
  • Please don't use this, it can create SQL injection problems under certain circumstances. Use the "official" `mysql_set_charset` API, which the OP already does. – deceze Apr 09 '13 at 04:06
  • But I think [this question/answer](http://stackoverflow.com/a/7073506) says exactly the opposite? – Markus Köhler Feb 16 '16 at 09:29
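
For readers following the advice in the comments to move off the deprecated `mysql` extension: the client-API way of setting the charset has a `mysqli` counterpart, `mysqli::set_charset()`. Unlike a plain SET NAMES query, it also tells the client library which charset is in use, which matters for escaping. The sketch below is an assumption-laden illustration, not code from this thread: the function name, host, credentials, and database name are placeholders, and the `chartest` table is borrowed from the first answer.

```php
<?php
// Sketch only: openUtf8Connection is a hypothetical helper; all arguments are placeholders.
function openUtf8Connection(string $host, string $user, string $pass, string $dbName): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbName);
    if ($db->connect_error) {
        throw new RuntimeException('Connection failed: ' . $db->connect_error);
    }
    // Counterpart of mysql_set_charset(): sets the connection charset through the client API.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
    return $db;
}

// Usage (commented out because it needs a running MySQL server):
// $db = openUtf8Connection('localhost', 'user', 'secret', 'mydb');
// $result = $db->query('SELECT name FROM chartest');
// header('Content-Type: text/html; charset=utf-8');
// while ($row = $result->fetch_assoc()) {
//     echo htmlspecialchars($row['name'], ENT_QUOTES, 'UTF-8');
// }
```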