7

I'm really struggling trying to get PHP to play ball with MySQL through PDO with regards to UTF-8 characters.

I get the following using php --version:

PHP 5.3.6 (cli) (built: Apr 19 2011 13:21:12)
Copyright (c) 1997-2011 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2011 Zend Technologies

And this is the welcome message from mysql cli:

Server version: 5.1.57 Source distribution

I have this as my database connection code:

try {
    $dbh = new PDO('mysql:host='.$server.';dbname='.$database.';charset=UTF8', $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'"));
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}

And my php code to insert into MySQL:

$dbh->query("SET NAMES 'UTF8'"); 
$values = array(':idCode' => $idCode, ':name' => $name, ':address' => $address, ':postcode' => $postcode, ':tel' => $tel, ':website' => $website, ':disabled' => $disabled, ':disabledWC' => $disabledWC, ':description' => $description, ':location' => $location, ':type' => $type, ':status' => $status, ':image1name' => $image1name, ':image2name' => $image2name, ':image3name' => $image3name, ':image4name' => $image4name, ':image5name' => $image5name);
$preparedStatement = $dbh->prepare('UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode');
$preparedStatement->execute($values);

The problem I'm getting is that even though I've verified that $description does indeed contain UTF8 encoded bytes (Using bin2hex and a hex2UTF8 tool on the web) the data does not get submitted through the connection as UTF-8 and I end up with "Générik" instead of "Générik" in the database.

I have tried putting in a call to "SET NAMES UTF8" prior to executing, I've put in:

[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8

in my my.cnf but can't seem to get any joy.

Any hints or tips on how to get this to work?

I should add - if I enter the text manually in the command line MySQL client it works fine, the column is set to take UTF8 encoding.

As requested - Create Table Script:

| venues | CREATE TABLE `venues` (
  `idCode` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(512) DEFAULT NULL,
  `address` varchar(1024) DEFAULT NULL,
  `postcode` varchar(32) DEFAULT NULL,
  `tel` varchar(32) DEFAULT NULL,
  `website` varchar(1024) DEFAULT NULL,
  `disabled` tinyint(4) DEFAULT NULL,
  `disabledWC` tinyint(4) DEFAULT NULL,
  `description` varchar(4096) CHARACTER SET utf8 DEFAULT NULL,
  `status` tinyint(4) DEFAULT NULL,
  `location` varchar(32) DEFAULT NULL,
  `type` varchar(32) DEFAULT NULL,
  `date_modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `image1name` varchar(255) DEFAULT NULL,
  `image2name` varchar(255) DEFAULT NULL,
  `image3name` varchar(255) DEFAULT NULL,
  `image4name` varchar(255) DEFAULT NULL,
  `image5name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`idCode`)
) ENGINE=MyISAM AUTO_INCREMENT=129 DEFAULT CHARSET=latin1

And hte result of SELECT @@character_set_database:

SELECT @@character_set_database
    -> ;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8                     |
+--------------------------+
1 row in set (0.00 sec)

And the results of dumpParams:

SQL: [410] UPDATE venues SET name = :name, address = :address, postcode = :postcode, tel = :tel, website = :website, disabled = :disabled, disabledWC = :disabledWC, description = :description, location = :location, type = :type, status = :status, date_modified = NOW(), image1name = :image1name, image2name = :image2name, image3name = :image3name, image4name = :image4name, image5name = :image5name WHERE idCode = :idCode
Params:  17
Key: Name: [7] :idCode
paramno=-1
name=[7] ":idCode"
is_param=1
param_type=2
Key: Name: [5] :name
paramno=-1
name=[5] ":name"
is_param=1
param_type=2
Key: Name: [8] :address
paramno=-1
name=[8] ":address"
is_param=1
param_type=2
Key: Name: [9] :postcode
paramno=-1
name=[9] ":postcode"
is_param=1
param_type=2
Key: Name: [4] :tel
paramno=-1
name=[4] ":tel"
is_param=1
param_type=2
Key: Name: [8] :website
paramno=-1
name=[8] ":website"
is_param=1
param_type=2
Key: Name: [9] :disabled
paramno=-1
name=[9] ":disabled"
is_param=1
param_type=2
Key: Name: [11] :disabledWC
paramno=-1
name=[11] ":disabledWC"
is_param=1
param_type=2
Key: Name: [12] :description
paramno=-1
name=[12] ":description"
is_param=1
param_type=2
Key: Name: [9] :location
paramno=-1
name=[9] ":location"
is_param=1
param_type=2
Key: Name: [5] :type
paramno=-1
name=[5] ":type"
is_param=1
param_type=2
Key: Name: [7] :status
paramno=-1
name=[7] ":status"
is_param=1
param_type=2
Key: Name: [11] :image1name
paramno=-1
name=[11] ":image1name"
is_param=1
param_type=2
Key: Name: [11] :image2name
paramno=-1
name=[11] ":image2name"
is_param=1
param_type=2
Key: Name: [11] :image3name
paramno=-1
name=[11] ":image3name"
is_param=1
param_type=2
Key: Name: [11] :image4name
paramno=-1
name=[11] ":image4name"
is_param=1
param_type=2
Key: Name: [11] :image5name
paramno=-1
name=[11] ":image5name"
is_param=1
param_type=2

Logged hex bytes (and expected string value) of $description:

 Desctription: 526f756e64696e67206f6666205361747572646179206e6967687420696e20756e666f726765747461626c65207374796c652c204672656e636820636f6d70616e792047c3a96ec3a972696b20566170657572206272696e67207468656972206e65772073686f772057617465726c69747a20746f207468652048617420466169722e204174204b696e672047656f726765205620506c6179696e67204669656c64732c2042617220456e642e: Rounding off Saturday night in unforgettable style, French company Générik Vapeur bring their new show Waterlitz to the Hat Fair. At King George V Playing Fields, Bar End.
Matt Fellows
  • 6,512
  • 4
  • 35
  • 57

3 Answers3

7
  try {
      $dbh = new PDO(
          'mysql:host='.$server.';dbname='.$database, 
          $user, 
          $password,
          array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'UTF8'")
      );
      $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
  } catch (PDOException $e) {
      print "Error!: " . $e->getMessage() . "<br/>";
      die();
  }

Don't encode data before puting them to the db, just pure raw utf-8 data send. Should be enough.

Regards.

cojack
  • 2,444
  • 1
  • 17
  • 18
  • No dice. I'm not doing anything to encode the data - just getting from the $_REQUEST object (Checking it is already UTF8 encoded as described by logging the bytes) and submitting to MySQL. :( – Matt Fellows May 28 '12 at 11:51
  • Genius, your table is latin1 and you write data to them as UTF-8? how should it works? Change set names to latin1, also your site should be in iso-1 or change db to utf-8 – cojack May 28 '12 at 13:56
  • First of all - that tone is completely inappropriate. Second of all, just because something is obvious to you, doesn't mean it's obvious to everyone else, that why this site exists. And by your comment I assume you are stating that it's impossible to have different character encoding in different columns? Not something that is obvious or indeed a necessary restriction. – Matt Fellows May 28 '12 at 13:59
  • http://dev.mysql.com/doc/refman/5.0/en/charset-column.html please read it and point at "If CHARACTER SET X is specified without COLLATE, character set X and its default collation are used." Regards. – cojack May 28 '12 at 14:03
  • So what you're saying is - there was nothing wrong with my database? I had a default character set of UTF8 on that column - so it would have use default UTF8 collation. What's more I've now tried changing the default character set for the entire table and it still doesn't work. – Matt Fellows May 28 '12 at 14:08
2

For the word Générik, your "logged hex bytes" are 0x47c3a96ec3a972696b. This is indeed UTF-8 encoded. The client with which you are attempting to verify your stored data is almost certainly setting the wrong character set prior to fetching the table contents.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • Nope - I'm checking in the MySQL command line client. If I use that client to set the value, then do the select again, it is all fine. What's more If I use the client to set the value, then look in my web interface, it is also fine, but using the PHP to set the value, yields broken text in both. – Matt Fellows May 28 '12 at 13:02
  • @MattFellows: Sorry, I thought you meant those logged hex bytes are what is stored in the table... what do you get for `SELECT HEX(description) FROM venues WHERE ...` ? – eggyal May 28 '12 at 13:03
  • Hmmm - I take it back - more investigation needed. HEX(description) yields: 526F756E64696E67206F6666205361747572646179206E6967687420696E20756E666F726765747461626C65207374796C652C204672656E636820636F6D70616E792047C3A96EC3A972696B20566170657572206272696E67207468656972206E65772073686F772057617465726C69747A20746F207468652048617420466169722E204174204B696E672047656F726765205620506C6179696E67204669656C64732C2042617220456E642E WHich looks correctly encoded to me. I'll compare HEX values for each input method - first ray of light in an hour... THanks. – Matt Fellows May 28 '12 at 13:06
  • @MattFellows: As I said, your `mysql` client is almost certainly setting an incorrect connection character set (so the data that *it* stores is incorrectly encoded, but when retrieved using the same character set -- such as when using the same client -- the error is undone and goes unnoticed). – eggyal May 28 '12 at 13:08
  • 2
    Thanks. The problem actually lay in the fact that I was using PuTTY to enter the values into the MySQL command line client. I was essentially being lied to about what was valid utf8. Switched to my Mac terminal and the problem was much simpler to solve. So whilst technically you didn't give me the answer you certainly led me to teh correct one. – Matt Fellows May 28 '12 at 14:35
2

I had the same problem, and I figured it out with help from this post. I was feeding perfectly valid umlauts into the DB, but it was storing junk.

I basically just added this: array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'") after my connect/init password, thusly:

$this->db = new PDO($this->dsn, $this->username, $this->password, 
                    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));
Community
  • 1
  • 1
Chris Marshall
  • 4,910
  • 8
  • 47
  • 72
  • this answer irrelevant to the question in particular and wrong in general. charset have to be set in DSN – Your Common Sense Apr 09 '13 at 19:30
  • What the heck are you talking about? It works! Wanna see (https://github.com/MAGSHARE/BMLT-Root-Server/blob/master/main_server/server/classes/c_comdef_dbsingleton.class.php#L163") the code? I'll accept "off topic," but I am genuinely interested in knowing EXACTLy why it is wrong "in general." I would appreciate it, because I don't like to write wrong code, and I'd like to do it correctly, which is exactly why I come here. – Chris Marshall Apr 10 '13 at 17:18
  • Umm...just checked it out. http://php.net/manual/en/ref.pdo-mysql.connection.php (Look at the third comment). http://stackoverflow.com/users/285587/your-common-sense back atcha. That was not helpful. – Chris Marshall Apr 10 '13 at 17:25
  • 1
    Oh, yeah. Thanks for removing the Terry Pratchett reference. Look, let's be nice, here. We're both uber-experienced geeks with a lot to offer, and I have absolutely no interest whatsoever in competing with you. You are Mister Big Shot with 37K and I'm Johnny-Come-Lately with a measly 145. All I want to do is be a decent citizen of this marvelous resource, and give back. My answer was completely correct. My code is meant for installations back to 5.1, and the code that I posted is exactly what php.net says to do. Your "common sense" crack was not nice. This ain't UseNet anymore. Okay? – Chris Marshall Apr 10 '13 at 17:48