Content with special characters e.g. ' " - when stored are replace into characters like these . But not all instances of the characters are changed to characters like these . So it is a little odd why it only affects some characters and not all.

After reading some articles online and on SO, I found out about table collation and charset.

  • I am using PHP and MySQL
  • I use prepared statements when storing and getting values
  • Database table collations are set to utf8_unicode_ci
  • My pages are html5 with <meta charset="utf-8">

With the above settings I still get the black diamonds. Any help? I am a little desperate.

EDIT:

If you run into the same problem in the future, my answer below may be the solution you need. Cheers!

Jo E.
  • Is the `character set` and `names` set to utf8 in the database too? – Waygood May 21 '13 at 09:26
  • @Waygood Ahh I have not checked that. my hosting has its own ui when creating a database and it did not make me choose `char set` during db creation. I'll go change it now and see if it fixes the issue. – Jo E. May 21 '13 at 09:30
  • Does the data look correct in the database? – andyb May 21 '13 at 09:30
  • then try running the following queries after you connect `SET CHARACTER SET utf8;` `SET NAMES utf8;` and also try the meta tag `` – Waygood May 21 '13 at 09:31
  • @andyb no it is not correct. when I check it using phpmyadmin the data stored also has the black diamonds. – Jo E. May 21 '13 at 09:32
  • @Waygood I tried changing the db's collation using the operations tab on phpmyadmin but it didn't work. I'll try the `SET CHARACTER SET utf8` and `SET NAMES utf8` you mentioned. BTW about the meta tag, according to this link http://stackoverflow.com/questions/4696499/meta-charset-utf-8-vs-meta-http-equiv-content-type your sample and `<meta charset="utf-8">` are the same on html5. – Jo E. May 21 '13 at 09:37

2 Answers

My pages are html5 with <meta charset="utf-8">

Meta tags are of little use here.
A page's charset is determined first by the Content-Type HTTP header, which has to contain the proper character set.
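
As a minimal sketch of that point: the snippet below sends the header explicitly from PHP and includes a small helper for reading the charset back out of a header value you have copied from the browser's network tab or from `curl -I`. The helper name `charset_from_content_type` is hypothetical and not part of PHP or of the original post.

```php
<?php
// Hypothetical helper (an assumption for illustration): extract the charset
// parameter from a Content-Type header value, or return null if it is absent.
function charset_from_content_type($value)
{
    if (preg_match('/;\s*charset\s*=\s*"?([^\s";]+)"?/i', $value, $m)) {
        return strtolower($m[1]);
    }
    return null;
}

// Send the header explicitly, before any output is produced.
// Skipped on the command line, where there is no HTTP response.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```

Checking the value the server really sends, rather than the value you meant to set, shows whether .htaccess, php.ini, or a later `header()` call won.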

`$mysqli->set_charset('utf8')` (or `;charset=utf8` in PDO's DSN) also ought to be used, but that does not seem to be the issue in your case.
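
A rough sketch of both options follows. The function names and the host, database, user, and password arguments are placeholders chosen for illustration; none of them come from the original post. Note that PDO honours the `charset` DSN option only from PHP 5.3.6 onward.

```php
<?php
// Build a PDO MySQL DSN that declares the connection charset.
// All arguments are placeholders (assumptions for this sketch).
function utf8_dsn($host, $dbname, $charset = 'utf8')
{
    return "mysql:host={$host};dbname={$dbname};charset={$charset}";
}

// PDO variant: the charset travels in the DSN.
function connect_pdo($host, $dbname, $user, $pass)
{
    return new PDO(utf8_dsn($host, $dbname), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// mysqli variant: set the charset right after connecting.
function connect_mysqli($host, $dbname, $user, $pass)
{
    $mysqli = new mysqli($host, $user, $pass, $dbname);
    $mysqli->set_charset('utf8');
    return $mysqli;
}
```

Either way, the charset is set once per connection, before any query runs.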

Your Common Sense
  • I tried adding `set_charset("utf8")` on my code like the one on the manual. Still didn't work. When you say `Content-type` on HTTP header. Do you mean adding `AddDefaultCharset UTF-8` on .htaccess? – Jo E. May 21 '13 at 09:53
  • .htaccess is the way to go too. however it can be overwritten by PHP. Anyway, you have to ask not how to set but how to see which one is actually set. – Your Common Sense May 21 '13 at 09:56
  • ok so I found the problem. using `mb_detect_encoding($str)` it was set to `ASCII`. So thanks for the tip. 1 last question though. How do I permanently change that setting to `UTF-8`? Will `$str = mb_convert_encoding($str, "UTF-8");` permanently change the character encoding or is it just for the current page? – Jo E. May 21 '13 at 10:03
  • what is the source of $str? – Your Common Sense May 21 '13 at 10:06
  • of course not. it detects the charset of $str only. to see an HTTP header you have to see HTTP headers sent by server. or at least see what page encoding is set by browser – Your Common Sense May 21 '13 at 10:21
  • The earlier comment was obviously wrong. So I've been digging and digging all I got was `mb_internal_encoding("UTF-8");`. When I `echo mb_internal_encoding();` it results to UTF-8. But again the problem still persists. I checked webpage encoding by the browser (Google Chrome) it says UTF-8. Now I am trying to set charset on php.ini to `default_charset = "utf-8"` will post if it helped. – Jo E. May 21 '13 at 11:12
  • it won't. php.ini will do the same thing as .htaccess and which you already have. there is something wrong with source data... are you sure you do not format your data somehow before inserting it? – Your Common Sense May 21 '13 at 11:24
  • yes I am positive. I leave it as is. user inputs content. I save it. No altering of data format type etc. – Jo E. May 21 '13 at 11:26
OK this post is a couple of weeks old but I found out where I went wrong on this.

The values stored in and read from the database were correctly encoded as UTF-8. The problem appeared when I echoed them through htmlentities.

My old echo passed through `htmlentities($str, ENT_QUOTES);`, which didn't specify 'UTF-8'.

My new echo passes through `htmlentities($str, ENT_QUOTES, "UTF-8");` and it works.
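
For anyone curious why the third argument matters, here is a small sketch with a made-up sample string. Before PHP 5.4, `htmlentities` defaulted to ISO-8859-1, so the second call below imitates the old behaviour by naming that charset explicitly.

```php
<?php
// "It’s" encoded as UTF-8: the apostrophe is U+2019, bytes E2 80 99.
$str = "It\xE2\x80\x99s";

// Read as UTF-8, the three bytes are one character and become one entity.
$good = htmlentities($str, ENT_QUOTES, 'UTF-8');

// Read as ISO-8859-1, each byte is treated as a separate character:
// 0xE2 turns into an entity, and the other two bytes pass through as-is.
$bad = htmlentities($str, ENT_QUOTES, 'ISO-8859-1');

// preg_match() with the /u flag returns false when the subject is not
// valid UTF-8, which is what a browser renders as black diamonds.
$bad_is_valid_utf8 = preg_match('//u', $bad) === 1;
```

Only multibyte characters such as curly quotes and long dashes are affected, which is consistent with some characters breaking while plain ASCII ones do not.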

Ugh. The damage of a few characters.

Anyway thanks to all of those that helped.

Jo E.