You could have problems either on input or on output.
On Input
First of all, make sure that the string you receive actually is utf-8.
- If you received the data from a form, add accept-charset="utf-8" to the form element.
- Verify that your input string could be a valid utf-8 stream with TRUE === mb_check_encoding($string, 'UTF-8')
- Check that the byte sequence is as expected by counting characters: mb_strlen($string, 'UTF-8') should return the same number of characters as what you see. If the text contains any non-ASCII characters, that number is less than strlen($string), which counts bytes; for pure ASCII text the two are equal.
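The checks above can be combined into one small diagnostic. This is only a sketch: describe_utf8_input() is a hypothetical helper name, not part of PHP or CodeIgniter, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not a PHP or CodeIgniter function): reports whether a
// string is well-formed UTF-8 and how its character and byte counts compare.
function describe_utf8_input($string)
{
    // Is the byte sequence well-formed UTF-8?
    $valid = mb_check_encoding($string, 'UTF-8');

    return array(
        'valid' => $valid,
        // Character count; only meaningful when the bytes are valid UTF-8.
        'chars' => $valid ? mb_strlen($string, 'UTF-8') : null,
        // Byte count; equals 'chars' for pure ASCII, larger otherwise.
        'bytes' => strlen($string),
    );
}

// "café" encoded as UTF-8: 4 characters, 5 bytes.
$report = describe_utf8_input("caf\xC3\xA9");

// "café" encoded as ISO-8859-1: a lone 0xE9 byte is not valid UTF-8.
$broken = describe_utf8_input("caf\xE9");
```

If 'valid' comes back false, or 'chars' does not match what you see on screen, the data was probably not sent as UTF-8 in the first place.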
In your application/config/database.php, ensure you have these two settings for your database connection:
$db[$dbgroupname]['char_set'] = "utf8";
$db[$dbgroupname]['dbcollat'] = "utf8_unicode_ci";
Replace $dbgroupname with the group name of your connection (e.g., 'default').
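One caveat worth knowing: MySQL's utf8 character set stores at most three bytes per character, so four-byte characters such as emoji do not fit. If you need those, a variant of the settings above might look like the following sketch. It assumes the 'default' group, a MySQL server recent enough to support utf8mb4 (5.5.3 or later), and tables and columns that also use utf8mb4.

```php
<?php
// Assumption: connection group 'default' on MySQL 5.5.3+.
// utf8mb4 is MySQL's full four-byte UTF-8 character set.
$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';
```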
Don't use htmlspecialchars or htmlentities on data before you store it. Use those on output, in your views.
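As a rough illustration of escaping at output time, a view could pass stored values through a small wrapper like the one below. The name escape_for_view() is made up for this example; the point is to give htmlspecialchars the 'UTF-8' charset argument explicitly.

```php
<?php
// Hypothetical view helper (illustrative name, not a CodeIgniter function).
function escape_for_view($value)
{
    // Escape HTML special characters at output time and tell PHP the
    // string is UTF-8, so multibyte characters pass through untouched.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Raw value as it would come out of the database ("Zoë <b>" in UTF-8).
$stored = "Zo\xC3\xAB <b>";

// Only the markup characters are escaped; the "ë" bytes are unchanged.
$html = escape_for_view($stored);
```

The database keeps the raw text, and the escaping happens once, in the view.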
On Output
Ensure that whatever you are using to view the data that comes out of your database expects a utf-8 encoding.
For html, make sure your Content-Type header includes charset=utf-8 and your html document's head looks like this:
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
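If your server does not already send the charset in the Content-Type header, you can send it from PHP before any output. This is a minimal sketch in plain PHP, not a CodeIgniter-specific API; content_type_header() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds the header line so the charset is spelled
// consistently ("utf-8", with the hyphen) wherever it is sent.
function content_type_header($charset = 'utf-8')
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Headers can only be set before any output has been sent.
if (!headers_sent()) {
    header(content_type_header());
}
```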
Check your results in multiple browsers. Some browsers do charset sniffing and might choose a different charset than the one you declare. If so, something on your page is probably not valid UTF-8; find that thing and eliminate it.
If you are using some kind of database viewer (phpMyAdmin, Navicat, etc.), make sure its connection is configured to expect utf-8.