With the following table, using mariadb-server-10.1 10.1.32+maria-1~trusty:
CREATE TABLE `tags` (
`tag_name` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL,
`thing_id` int(11) NOT NULL,
PRIMARY KEY (`thing_id`,`tag_name`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Manifests as (in SQLAlchemy app):
IntegrityError: (IntegrityError)
(1062, "Duplicate entry '1532-?' for key 'PRIMARY'")
at the second emoji insertion attempt.
The first one seems to be ok in the db (believe me, it shows right in my console and my browser as "alien" emoji):
> select tag_name, HEX(tag_name) from tags;
+----------+----------------+
| tag_name | HEX(tag_name) |
+----------+----------------+
| GOODGUY | 474F4F44475559 |
| | F09F91BD |
+----------+----------------+
2 rows in set (0.00 sec)
I am aware of Emoji's in mysql turns to question marks , but my.cnf has:
default-character-set = utf8mb4
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
default-character-set = utf8mb4
and utf8mb4 is being used in the client (charset added to connection string as ?charset=utf8mb4
). And I believe primary key is checked on the server side.
Is there anything else I am missing (can check) or is it some MariaDB bug or some additional configuration required?
not sure whether relevant or not, but the same problem when inserting via mysql
shell. Also tried this to see what is going on (found this in mysql utf8mb4_unicode_ci cause unique key collision ):
> SELECT ''='' COLLATE utf8mb4_unicode_ci;
+------------------------------------+
| '?'='?' COLLATE utf8mb4_unicode_ci |
+------------------------------------+
| 1 |
+------------------------------------+
but not sure whether this relevant.
I do not understand: database sometimes shows it alright (via both clients - shell and SQLAlchemy app), but then fails to show in the header? From the evidence I got, I do not understand where that bad conversion happens. The data seems to be ok in the database (see hex above), but two emojis equivalent for the primary key?
One more to contrast:
> SELECT ''='' COLLATE utf8mb4_bin;
+-----------------------------+
| '?'='?' COLLATE utf8mb4_bin |
+-----------------------------+
| 0 |
+-----------------------------+
This kind of points finger primary key does not use binary? All emojis converted to something else before being used in the index? Quite weird.