0

I am trying to insert text strings into a MySQL database. However, I'm running into trouble where the data is silently truncated if I try to insert utf-8 smiley faces like .

I made this super simple script to highlight the problem.

<?php

  require_once("sqlheader.php");
  header("Content-Type: text/plain; charset=utf-8");

  $q1 = "DROP TABLE IF EXISTS `test-error3`";
  $sql->query($q1);

  $q2 = "CREATE TABLE `test-error3` (`id` TINYINT(3) UNSIGNED PRIMARY KEY AUTO_INCREMENT, `test` MEDIUMTEXT, `test2` MEDIUMTEXT)";
  $sql->query($q2);

  $q3 = "INSERT INTO `test-error3` (`test`, `test2`) VALUES (\"Hello World!  Will this show up?\", \"I like to smile!  Will this show up?\")";
  $sql->query($q3);

  $q4 = "SELECT * FROM `test-error3`";
  $r4 = $sql->query($q4);
  $a4 = $r4->fetch_assoc();
  print_r($a4);

  $q5 = "SELECT `test`, `test2`, RIGHT(`test`, 15) AS `test3`, RIGHT(`test2`, 15) AS `test4` FROM `test-error3`";
  $r5 = $sql->query($q5);
  $a5 = $r5->fetch_assoc();
  print_r($a5);

  $q6 = "SHOW FULL COLUMNS FROM `test-error3`";
  $r6 = $sql->query($q6);
  while ($a6 = $r6->fetch_assoc()) {
    print_r($a6);
  }

?>

Here's the output I get from the script:

Array
(
    [id] => 1
    [test] => Hello World! 
    [test2] => I like to smile! 
)
Array
(
    [test] => Hello World! 
    [test2] => I like to smile! 
    [test3] => Hello World! 
    [test4] => like to smile! 
)
Array
(
    [Field] => id
    [Type] => tinyint(3) unsigned
    [Collation] => 
    [Null] => NO
    [Key] => PRI
    [Default] => 
    [Extra] => auto_increment
    [Privileges] => select,insert,update,references
    [Comment] => 
)
Array
(
    [Field] => test
    [Type] => mediumtext
    [Collation] => utf8_unicode_ci
    [Null] => YES
    [Key] => 
    [Default] => 
    [Extra] => 
    [Privileges] => select,insert,update,references
    [Comment] => 
)
Array
(
    [Field] => test2
    [Type] => mediumtext
    [Collation] => utf8_unicode_ci
    [Null] => YES
    [Key] => 
    [Default] => 
    [Extra] => 
    [Privileges] => select,insert,update,references
    [Comment] => 
)

As you can see, any text after the smiley face is gone. This originally happened with input which I had been sanitizing with $sql->real_escape_string. And you can also see that the Collation for the MEDIUMTEXT fields supports UTF-8 encoding.

I originally thought the problem must have been higher up in the script, or in the way I was displaying the output, however I'm to the most basic level at this point and as you can see by the RIGHT function in the MySQL query, the data in the database itself is being truncated during the insert or select operation. Merely adding the UTF-8 smiley face anywhere in a text field in my query appears to be resulting in the early termination of the text field. As a result, I lose any data after a smiley face, silently and without any sort of warning that I can seem to detect.

Does anyone have an idea of what is going on and how I should be cleaning the data in order to enable a reliable insert operation? Why does real_escape_string not automatically correct for this?

azoundria
  • 940
  • 1
  • 8
  • 24
  • Like https://stackoverflow.com/questions/29318784/how-are-smileys-encoded-in-mysql-utf-8-mb4-database . Collation is the order in which they are sorted, character set it is the range of values accepted. So utf8-mb4 is needed for :-) – danblack Feb 18 '20 at 07:36
  • With CHARACTER SET utf8mb4 it works to insert the data. However, I'm incredibly curious why the data is silently truncated as opposed to throwing a warning/error, rejecting those particular characters, or inserting some other default box character. It does not seem like the best way to handle this case. None of the links I could find mentioned data being truncated either. – azoundria Feb 18 '20 at 08:12
  • [pdo mode PDO::ERRMODE_WARNING](https://www.php.net/manual/en/pdo.error-handling.php) – danblack Feb 18 '20 at 08:18
  • I am using MYSQLI. So it seems like there is a report_mode which I can set to MYSQLI_REPORT_ALL. – azoundria Feb 18 '20 at 08:34
  • Please do not use `real_escape_string()` It doesn't help with such issues. You should be using prepared statements anyway. – Dharman Feb 18 '20 at 09:55
  • @azoundria - Truncation is one of the common problems; its cause is discussed here: https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored – Rick James Feb 19 '20 at 21:14
  • @azoundria - and `SHOW CREATE TABLE \`test-error3\`` is another way to get the info you displayed. – Rick James Feb 19 '20 at 21:17

0 Answers0