I am trying to insert text strings into a MySQL database. However, I'm running into trouble where the data is silently truncated if I try to insert utf-8 smiley faces like .
I made this super simple script to highlight the problem.
<?php
require_once("sqlheader.php");
header("Content-Type: text/plain; charset=utf-8");
$q1 = "DROP TABLE IF EXISTS `test-error3`";
$sql->query($q1);
$q2 = "CREATE TABLE `test-error3` (`id` TINYINT(3) UNSIGNED PRIMARY KEY AUTO_INCREMENT, `test` MEDIUMTEXT, `test2` MEDIUMTEXT)";
$sql->query($q2);
$q3 = "INSERT INTO `test-error3` (`test`, `test2`) VALUES (\"Hello World! Will this show up?\", \"I like to smile! Will this show up?\")";
$sql->query($q3);
$q4 = "SELECT * FROM `test-error3`";
$r4 = $sql->query($q4);
$a4 = $r4->fetch_assoc();
print_r($a4);
$q5 = "SELECT `test`, `test2`, RIGHT(`test`, 15) AS `test3`, RIGHT(`test2`, 15) AS `test4` FROM `test-error3`";
$r5 = $sql->query($q5);
$a5 = $r5->fetch_assoc();
print_r($a5);
$q6 = "SHOW FULL COLUMNS FROM `test-error3`";
$r6 = $sql->query($q6);
while ($a6 = $r6->fetch_assoc()) {
print_r($a6);
}
?>
Here's the output I get from the script:
Array ( [id] => 1 [test] => Hello World! [test2] => I like to smile! ) Array ( [test] => Hello World! [test2] => I like to smile! [test3] => Hello World! [test4] => like to smile! ) Array ( [Field] => id [Type] => tinyint(3) unsigned [Collation] => [Null] => NO [Key] => PRI [Default] => [Extra] => auto_increment [Privileges] => select,insert,update,references [Comment] => ) Array ( [Field] => test [Type] => mediumtext [Collation] => utf8_unicode_ci [Null] => YES [Key] => [Default] => [Extra] => [Privileges] => select,insert,update,references [Comment] => ) Array ( [Field] => test2 [Type] => mediumtext [Collation] => utf8_unicode_ci [Null] => YES [Key] => [Default] => [Extra] => [Privileges] => select,insert,update,references [Comment] => )
As you can see, any text after the smiley face is gone. This originally happened with input which I had been sanitizing with $sql->real_escape_string. And you can also see that the Collation for the MEDIUMTEXT fields supports UTF-8 encoding.
I originally thought the problem must have been higher up in the script, or in the way I was displaying the output, however I'm to the most basic level at this point and as you can see by the RIGHT function in the MySQL query, the data in the database itself is being truncated during the insert or select operation. Merely adding the UTF-8 smiley face anywhere in a text field in my query appears to be resulting in the early termination of the text field. As a result, I lose any data after a smiley face, silently and without any sort of warning that I can seem to detect.
Does anyone have an idea of what is going on and how I should be cleaning the data in order to enable a reliable insert operation? Why does real_escape_string not automatically correct for this?