1

Table:

CREATE TABLE `test` (
  `f` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Just after connection I send following SQL:

SET NAMES utf8;

Then SQL:

INSERT INTO `test` (`f`) VALUES ("сс-е в терновнике");

As a result in the table I have 1 record with value "с" just up to sign. Is there a chance to correctly handle all such symbols? All this done from PHP if this matters.

Index
  • 676
  • 1
  • 10
  • 27
  • what does a `select length(f) from test` say? If it's `1`, then your text really is truncated. If it's bigger, then your string is really in there, and you're doing something wrong upon retrieving it. – Marc B Apr 17 '14 at 16:27
  • select f, length(f) from test gives length(f) = 2. 2 seems to be equal to 1 char 1 utf8 is multibyte – Index Apr 17 '14 at 16:44

1 Answers1

3

That character is Unicode Character 'SMILING FACE WITH SMILING EYES' (U+1F60A).

U+1F60A is in plane 1, the Supplementary Multilingual Plane, but MySQL's utf8 supports only plane 0, the Basic Multilingual Plane. So you'll have to use utf8mb4.

That symbol is also not widely supported in fonts, so you may have a hard time displaying it.


Re your comment, here's a demonstration of filtering out characters beyond the basic plane.

<?php

$str = "сс-е в терновнике";
$str = preg_replace('/[^\x{0000}-\x{ffff}]/u', '', $str);
echo $str . "\n";

But if it wasn't clear from my answer above, MySQL does support the supplemental plane in utf8mb4.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Is there a nice way to remove all chars that mysql does not support using PHP. Or perhaps you can give me a list of such chars? – Index Apr 17 '14 at 16:48
  • 1
    [this question](http://stackoverflow.com/questions/16496554/can-php-detect-4-byte-encoded-utf8-chars) might be relevant. – Vatev Apr 17 '14 at 17:33
  • @Vatev,thanks, both answers on that question are interesting. – Bill Karwin Apr 17 '14 at 17:47