Why do they cause unique key collision? Aren't they different character in utf8mb4?
You are missing the point about CHARACTER SET
and COLLATION
. A CHARACTER SET
is a collection of different characters. A COLLATION
says whether to treat the characters as equal -- think A
and a
-- different characters, but treated for ORDER BY
and WHERE =
, etc as being the same.
mysql> SELECT 'K'='K' COLLATE utf8_unicode_ci;
+-----------------------------------+
| 'K'='K' COLLATE utf8_unicode_ci |
+-----------------------------------+
| 1 |
+-----------------------------------+
So in utf8_unicode_ci (or utf8mb4_unicode_ci), those two characters are considered to be "equal".
"Equal" is the test for UNIQUE
keys.
Set the COLLATION
for the column to whatever makes sense for you.
- utf8mb4_unicode_ci for good 'real life' comparisons, apparently including this one. K=k=Ķ=ķ
- utf8mb4_unicode_ci for more simple-minded comparisons. In particular no 2-character combinations match 1-character encodings. Case folding and accent stripping does occur. K=k=Ķ=ķ
- utf8mb4_bin blindly checks the bits. No case folding, etc. K k Ķ ķ are all unequal.
utf8mb4_latvian_ci is a little different: K=k but not equal to Ķ=ķ . There are other specialized collations for other languages (mostly Western European).
Your K
is called "FULLWIDTH LATIN CAPITAL LETTER K", so it is quite reasonable that it compare equal to Latin K
.