5

$book is a 7kb string. If this query is executed using PHP PDO exec, the monograph column (LONGTEXT) data gets truncated at 6765 character:

echo strlen($book); // output 7157

$db->exec("UPDATE `chemicals` SET `monograph` = {$db->quote($book)} WHERE `id` = {$db->quote($c['id'])};");

However, if I print the query and execute it using SQL client (bypassing PHP), it inserts all the data to the database. Which makes me think it is PHP setting that I am not yet familiar to.

Note that the same is happening if I use prepared statements (incl. with PDO::PARAM_LOB).

$book value dumped before exec https://gist.github.com/79d5fe1050bbb0e2fac8 (7157). The actual data that ends up at the database https://gist.github.com/df49d4a9707660b8b60b (6765). I don't understand how such data truncation is technically possible since the whole query is passed to MySQL (otherwise SQL syntax error would pop).

echo "UPDATE `chemicals` SET `monograph` = {$db->quote($book)} WHERE `id` = {$db->quote($c['id'])};";

If I execute the output (https://gist.github.com/a05fe4c033e74897b82b) using SQL client, this is the data that ends up in the database https://gist.github.com/88870fe26a3ae40e991e (7157, expected).

PDO is initiated using UTF8 connection.

new PDO('mysql:dbname=[..];host=localhost;charset=utf8', 'root', '[..]', array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8' COLLATE 'utf8_unicode_ci';"));

UPDATE 2012 07 25 04:11 EST

Now I know that's encoding issue.

$db->exec("UPDATE `chemicals` SET `monograph` = {$db->quote(utf8_decode($book))} WHERE `id` = {$db->quote($c['id'])};");

However, I am not quiet sure what to do about it. My connection to MySQL is unicode already.

/etc/my.cnf is configured to use the following settings server-wide.

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Gajus
  • 69,002
  • 70
  • 275
  • 438
  • The data is being truncated on a strange (non-ASCII) character. Have you set the connection character set appropriately? – eggyal Jul 25 '12 at 08:37
  • @eggyal The connection is utf8 with collation utf8_general_ci server-wide. – Gajus Jul 25 '12 at 08:56
  • To confirm, how have you set the connection character set? Is the string definitely encoded in UTF-8? What is the byte sequence at the point the data is truncated? – eggyal Jul 25 '12 at 08:58

2 Answers2

1

There are two points to be made here. One is that ideally all character encodings must be UTF8 - that's server, client, connection, and table. Two is that PHP's strlen function counts bytes, not characters.

Your table character set may not be set to UTF8. You can do

SHOW CREATE TABLE chemicals;

to check that. You should also add these to your my.cnf:

[mysqld]
character-set-client=utf8
character-set-results=utf8

Read more about MySQL character sets here:

MySQL character sets

Gary G
  • 5,692
  • 2
  • 27
  • 18
  • Everything related to MySQL is using unicode. https://gist.github.com/98f79ad868a1d5fb77b2 Input data, however, isn't unicode. – Gajus Jul 25 '12 at 11:38
  • The table and column encoding is `utf8` as well? If your input data isn't `utf8`, then MySQL is doing character conversion on it, and you shouldn't expect the lengths to match. Looking at your input data, there's at least one non-ASCII character in there (looks like `U+FFFD`, unicode replacement character.) – Gary G Jul 25 '12 at 11:48
0

It turned out to be that it is encoding issue. The are two solutions. The most obvious is to fix the encoding to match the database/connection settings. In my case, I was getting a iso-8859-1 string and interpreting it as a unicode.

However, it shouldn't be an issue anyway. I went further to discover that PDO::ATTR_EMULATE_PREPARES is set to TRUE by default.

Community
  • 1
  • 1
Gajus
  • 69,002
  • 70
  • 275
  • 438