3

I've spent hours to debug an algo and noticed it was coming from Doctrine (v2.3.3). I'm using libpuzzle to calculate a hash of an image and store this hash in the database. The hash returned has special character in it and apparently Doctrine doesn't like it.

This is the kind of string I have (~550 char):

ÿ�þÿÿþ�þþÿþÿþÿþþ�þÿþÿÿÿ�ÿþþÿþÿ�ÿÿþÿþÿþþÿþþÿÿÿþÿþþþþÿþ�þþþÿ�ÿÿ�ÿÿþ�þÿþÿþÿþÿþþÿþÿÿÿþþÿþþþþÿþþþþ���ÿÿ�ÿ�þþ�þÿÿþþþÿÿþÿþþÿþþþÿþ...

I've investigated and found some people saying to add the charset in the config but I already have it:

# Doctrine Configuration
doctrine:
    dbal:
        driver:   "%database_driver%"
        host:     "%database_host%"
        port:     "%database_port%"
        dbname:   "%database_name%"
        user:     "%database_user%"
        password: "%database_password%"
        charset:  UTF8

If I change the collation of the column from utf8_unicode_ci to utf8_general_ci still not working. Also I've made sure that the default schema collation is utf8_general_ci.

I've tried another string with special characters:

Test !§$%&/()=? äöü ÄÖÜ :D

This string is inserted correctly but the hash still not.

Does anyone know where I could look to fix this issue? Is it a Doctrine bug?

----------------------------------

UPDATE

Just before the execute I still have the correct data binded. So I guess it's a encoding or driver issue. I've modified the symfony configuration:

# Doctrine Configuration
doctrine:
    dbal:
        driver:   "%database_driver%"
        host:     "%database_host%"
        port:     "%database_port%"
        dbname:   "%database_name%"
        user:     "%database_user%"
        password: "%database_password%"
        charset:  UTF8
        options:
            1002: "SET NAMES 'UTF8'"

I've also tried to execute the query without using entities to be sure:

$this->db = $this->getContainer()->get('doctrine')->getConnection();

$img = '/var/www/acme/web/upload/tmp/cd1fa593cf6feb2cde83e68f461a2d947.jpg';
$hash = puzzle_fill_cvec_from_file($img);

$sql = "UPDATE image set hash=? WHERE id=?";
$stmt = $this->db->prepare($sql);

$stmt->execute(array($hash, 180));

Still empty data in the database. In another project I've used Zend_Db and didn't have any trouble saving this hash. I don't know if it's a bug in Doctrine or not :(

----------------------------------

UPDATE 2

I've logged all the queries in mysql and I can see in the log that the content is binded correctly. but not saved properly by mysql.

INSERT INTO image (guid, type, createTime, updateTime, images, imageSize, imageHash, status) 
VALUES (
'c30df23d6b0b08aff079287e00f21ec8a', 
'image', 
'2013-04-22 03:30:33', 
'2013-04-22 03:30:34',
'path/image.jpg', 
'165458', 
'?\0????\0?????????\0??????\0??????\0??????????????????????\0????\0??\0???\0???????????????????????????\0\0\0??\0?\0??\0??????????????????\0\0\0??\0?\0???????????????\0?????????????????\0?\0????????????\0?\0??????????????????????????\0?\0???????????\0???\0?????????\0??????\0\0?????????????????????\0\0??????\0???????????????\0', 
1)

This is my create table:

delimiter $$
CREATE TABLE `image` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `guid` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `createTime` datetime NOT NULL,
  `updateTime` datetime DEFAULT NULL,
  `images` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `imageSize` bigint(20) DEFAULT NULL,
  `imageHash` longtext COLLATE utf8_unicode_ci,
  `status` integer(10) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$

Cheers, Maxime

maxwell2022
  • 2,818
  • 5
  • 41
  • 60

2 Answers2

2

You should not use imageHash longtext COLLATE utf8_unicode_ci, because the RDBM will try to map the data to a charset, which obviously does not match if your hash data is libpuzzle binary output.

Try to ALTER your schema to make your imageHash column a BLOB.

As stated here: http://dev.mysql.com/doc/refman/5.0/en/blob.html

BLOB values are treated as binary strings (byte strings). They have no character set, and sorting and comparison are based on the numeric values of the bytes in column values.

TEXT values are treated as nonbinary strings (character strings). They have a character set, and values are sorted and compared based on the collation of the character set.

Florian Klein
  • 8,692
  • 1
  • 32
  • 42
  • What if I have to store a different string in this column? For example for now I'm using libpuzzle to get the image signature but if I change algo I might have a regular string or just an integer, etc... Would it be a problem or a performance issue (using blob)? – maxwell2022 Apr 29 '13 at 00:13
  • `TEXT`, or `LONGTEXT` in my case is considered as `BLOB` and is stored the same way: http://dev.mysql.com/doc/refman/5.0/en/blob.html – maxwell2022 Apr 29 '13 at 13:11
  • Have you read my answer ? I've just updated it to put in **bold** the important parts. – Florian Klein May 02 '13 at 09:14
  • BTW, I did read the link you pointed, I even put it in my answer. That's where I got the above statment. – Florian Klein May 02 '13 at 09:16
  • So, really, you should give BLOB a try. Cause it does **not** behave like TEXT. – Florian Klein May 02 '13 at 09:17
  • Yep, `BLOB` is working well. What I don't understand is that I have another application which is storing the exact same type of data in a `text` column and it's working. Anyway, using `BLOB`, `Doctrine` and the PHP function `stream_get_contents()` it's working fine. – maxwell2022 May 02 '13 at 13:05
  • good to know :) Maybe in the other application, you were so lucky that generated libpuzzle's data didn't contain any byte that looked like wrong in the caracter set! – Florian Klein May 08 '13 at 09:12
0

It could be your MySQL server isn't configured to accept UTF8 from clients by default.

In cases like this, you need to run the following query:

SET NAMES utf8

Run this in the start of your PHP script, for example through the Doctrine Connection object.

Jani Hartikainen
  • 42,745
  • 10
  • 68
  • 86
  • nop, I've updated the MySQL config following this: http://stackoverflow.com/questions/3513773/change-mysql-default-character-set-to-utf8-in-my-cnf – maxwell2022 Apr 21 '13 at 14:35