1

I am trying to convert a string from HTML-ENTITIES to UTF-8 and then save the converted string in my database. The HTML entities are Greek letters and look, for example, like this: νω

I have tried many different approaches, starting with utf8_encode and html_entity_decode, until I came across the function mb_convert_encoding(). The really weird thing is that when I convert my string and then output it, it is correctly encoded as UTF-8, but when I insert this string into my database I end up with something like: ξÏνω.

This is the code for the encoding:

header('Content-Type: text/html; charset=utf-8');
mb_internal_encoding('utf-8');
......
while ($arr = $select->fetch_array(MYSQLI_ASSOC))
{
    $text = $arr["greek"];
    $result = mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES');
    $mysqli->query("UPDATE some SET greek = '".$result."'");
}

When I output my query and then run it manually as an SQL query in phpMyAdmin, it works fine, so it doesn't seem to be a problem with my database. There must be some problem when transferring the encoded string to the database...

Chris
  • What charset is your table in your database? The default is latin1; you need to change that to utf-8. That's the first thing you should check. – Landon Oct 22 '12 at 19:03
  • The database is set to utf-8. – Chris Oct 22 '12 at 19:05
  • Can you echo your query right before the query() function is called? Does it look right there? – Landon Oct 22 '12 at 19:06
  • I echoed the `UPDATE some SET greek = '".$result."'`, and it is correctly encoded as utf-8 (also in the source code of my browser). This is what I get, and it's correct: `UPDATE some SET greek = 'ξύνω'` – Chris Oct 22 '12 at 19:11

2 Answers

5

As you can see in your script, you are instructing the browser to use UTF-8. That is the first step.

However, your database needs the same thing, and the encoding/collation on the tables needs to be UTF-8 too.

You can either recreate your tables using utf8_general_ci or utf8_unicode_ci as the collation, or convert the existing tables (see here)

You also need to make sure that your database connection, i.e. the PHP code talking to MySQL, is using UTF-8. If you are using PDO, there are plenty of articles that show how to do that. With mysqli, the simplest way is:

$mysqli->query('SET NAMES utf8');
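To illustrate why the connection charset matters, here is a minimal sketch that reproduces the symptom without a database. It decodes some entities to UTF-8 and then reinterprets each byte as a single-byte character, which is roughly what happens when UTF-8 bytes travel over a connection the server believes to be latin1. The input string is an assumed example (not copied from the question), and ISO-8859-1 is used as a stand-in: MySQL's latin1 is based on cp1252, so the exact garbage characters you see may differ.

```php
<?php
// Assumed example input: entities that decode to the Greek word "ξύνω".
$entities = '&xi;&#973;&nu;&omega;';

// Decode the entities into a UTF-8 string.
$utf8 = html_entity_decode($entities, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo mb_strlen($utf8, 'UTF-8'), "\n"; // 4 characters
echo strlen($utf8), "\n";             // 8 bytes (2 bytes per Greek letter)
echo bin2hex($utf8), "\n";            // cebecf8dcebdcf89

// Simulate a latin1 connection: every byte is treated as one character
// and then stored as UTF-8 again, which doubles the character count.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo mb_strlen($garbled, 'UTF-8'), "\n"; // 8 characters instead of 4
```

The string is fine in PHP and in the browser; it only turns into eight "characters" once something in the chain reads the bytes as latin1. html_entity_decode() is used here instead of mb_convert_encoding() with 'HTML-ENTITIES' because that mbstring encoding is deprecated as of PHP 8.2.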

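NOTE: Changing the connection encoding affects how all existing data is read and written. Rows that were previously stored through a latin1 connection may appear garbled once the connection uses utf8, so check your existing data before you switch.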

EDIT: You can do the following to set the connection character set:

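$mysqli = new mysqli($host, $user, $pass, $db);

// Switch the connection (PHP <-> MySQL) character set to utf8
if (!$mysqli->set_charset("utf8")) {
   // Abort with the MySQL error message if the charset could not be set
   die(sprintf("Error loading character set utf8: %s\n", $mysqli->error));
}

// ... run your queries here ...

$mysqli->close();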
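Applied to the loop from the question, this could look roughly like the sketch below. It is only an illustration under some assumptions: the helper names decodeEntities() and convertGreekColumn() are made up, the table is assumed to have a primary key column `id` (the question does not show one), and mysqli is assumed to throw exceptions on errors, which is the default since PHP 8.1.

```php
<?php
// Pure helper: decode HTML entities into a UTF-8 string.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Hypothetical helper: assumes table `some` has a primary key column `id`
// so that each row can be updated individually.
function convertGreekColumn(mysqli $mysqli): int
{
    // Set the connection charset before any other query runs.
    if (!$mysqli->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $mysqli->error);
    }

    $select = $mysqli->query('SELECT id, greek FROM `some`');
    $update = $mysqli->prepare('UPDATE `some` SET greek = ? WHERE id = ?');
    $count  = 0;

    while ($row = $select->fetch_assoc()) {
        $decoded = decodeEntities($row['greek']);
        $id      = (int) $row['id'];

        // Bound parameters avoid building the SQL string by hand.
        $update->bind_param('si', $decoded, $id);
        $update->execute();
        $count++;
    }

    return $count; // number of rows processed
}
```

Calling convertGreekColumn($mysqli) on an open connection would then rewrite each row once. The WHERE clause matters: the UPDATE in the question has none, so every iteration overwrites the column in all rows. If the tables use utf8mb4 (MySQL's full 4-byte UTF-8), pass 'utf8mb4' to set_charset() instead.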

Links of interest:

Whether to use "SET NAMES"

Nikolaos Dimopoulos
  • Do I have to set the whole database to utf8? Is it not enough to set the character set of the table I am using to utf8? Because when outputting `$mysqli->character_set_name()` it gives me latin1, so the whole db is not in utf8 but the table is. I am confused... – Chris Oct 22 '12 at 19:18
  • You have two issues here. One is your tables i.e. where the data is stored and one is the connection between PHP and MySQL. By setting the encoding on your table you solve problem 1. By setting the connection string to UTF8 you solve #2. The `$mysqli->character_set_name()` returns the default character set for your connection (PHP to MySQL). This is now your problem. It should return UTF8. – Nikolaos Dimopoulos Oct 22 '12 at 19:57
  • Have a look at my reply - I made an edit that might help you with the connection – Nikolaos Dimopoulos Oct 22 '12 at 19:59
1

Execute the SET NAMES 'utf8' query prior to any others.

Ondřej Bouda