1

I'm trying to insert a surrogate pair ('', \uD852\uDF62, the same as U+24B62 from this example) into MySQL.

An INSERT with an unescaped literal, suggested by this answer:

INSERT INTO unicode_test (value) VALUES ('');
-- or
INSERT INTO unicode_test (value) VALUES (_utf8'');

fails with

Error Code: 1366. Incorrect string value: '\xF0\xA4\xAD\xA2' for column 'value' at row 1

(note that \xF0\xA4\xAD\xA2 isn't even close to the original value of \uD852\uDF62).

On the other hand, both

INSERT INTO unicode_test (value) VALUES (_utf16'');

and

INSERT INTO unicode_test (value) VALUES (_utf8mb4'');

succeed, but the inserted values are different from the original one.

My database uses the utf8mb4 character set, so I assume it should handle surrogates transparently.

What is the recommended way of inserting non-BMP characters into MySQL?

Bass
  • 4,977
  • 2
  • 36
  • 82

1 Answers1

1

Use CHARACTER SET utf8mb4, not utf8 or utf16.

See "Best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored It sounds like you don't have the connection parameter (or SET NAMES).

What you tried should have worked:

INSERT INTO unicode_test (value) VALUES ('');

Another approach is

INSERT INTO unicode_test (value) VALUES (UNHEX('F0A4ADA2'));
Rick James
  • 135,179
  • 13
  • 127
  • 222