
I want to store Polish and German characters in a column with a unique key. After altering the database like this:

alter database osa character set utf8 collate utf8_general_ci;

I have a problem with German characters:

sql> insert into company(uuid, name) VALUE ("1","IDE")
[2016-11-27 10:37:35] 1 row affected in 13ms

sql> insert into company(uuid, name) VALUE ("2","IDĘ")
[2016-11-27 10:37:37] 1 row affected in 9ms

sql> insert into company(uuid, name) VALUE ("3","Schuring")
[2016-11-27 10:37:38] 1 row affected in 13ms

sql> insert into company(uuid, name) VALUE ("4","Schüring")
[2016-11-27 10:37:39] [23000][1062] Duplicate entry 'Schüring' for key 'UK_niu8sfil2gxywcru9ah3r4ec5'

Which collation do I have to use?

Edit:

It also does not work with utf8_unicode_ci.

Rick James
Piotr Sobolewski

4 Answers

2

The _ci in the collation name stands for "case insensitive". Unfortunately, for these collations it also means "accent insensitive". So to get E and Ę to be treated differently, you need a _bin collation: either utf8_bin or utf8mb4_bin. Keep in mind that a _bin collation is also case sensitive.

utf8mb4 is needed for emoji and some Chinese characters (those that take four bytes in UTF-8), plus some obscure things.
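The difference can be checked directly with a comparison. This is only a sketch: it assumes a MySQL session on a UTF-8 terminal, and the SET NAMES line is there so the COLLATE clauses are valid for the connection character set. The literals are the ones from the question.

```sql
SET NAMES utf8mb4;

-- Accent-insensitive collation: I would expect both comparisons to return 1,
-- which matches the duplicate-key errors reported in this thread.
SELECT 'IDE' = 'IDĘ' COLLATE utf8mb4_unicode_ci AS polish_ci,
       'Schuring' = 'Schüring' COLLATE utf8mb4_unicode_ci AS german_ci;

-- Binary collation: the strings are compared by code point, so both should return 0.
SELECT 'IDE' = 'IDĘ' COLLATE utf8mb4_bin AS polish_bin,
       'Schuring' = 'Schüring' COLLATE utf8mb4_bin AS german_bin;
```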

Rick James
1

Replace all occurrences of utf8_general_ci with utf8_unicode_ci. utf8_general_ci is apparently broken; see "What are the diffrences between utf8_general_ci and utf8_unicode_ci?":

utf8_general_ci is a very simple — and on Unicode, very broken — collation, one that gives incorrect results on general Unicode text.

Dai
  • Technically speaking, utf8_unicode_ci is broken, fixed by utf8_unicode_520_ci. With 8.0, there is an even newer standard: utf8mb4_0900_ai_ci. – Rick James Nov 28 '16 at 02:11
0

Maybe you should try utf8mb4_unicode_ci?

MySQL's utf8 charset cannot store all UTF-8 characters; it is limited to at most three bytes per character.

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
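To see which characters actually need utf8mb4, you can look at their byte lengths. A minimal sketch, assuming a MySQL session on a UTF-8 terminal; the emoji is just an illustrative four-byte character and does not come from the question.

```sql
SET NAMES utf8mb4;

-- LENGTH() counts bytes, not characters.
-- The Polish and German letters should each take 2 bytes, the emoji 4.
SELECT LENGTH('Ę')  AS polish_bytes,
       LENGTH('ü')  AS german_bytes,
       LENGTH('😀') AS emoji_bytes;
```

Anything that needs three bytes or fewer fits into the older utf8 charset as well, so the charset alone is probably not what causes the duplicate-key error here.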

Maciek Bryński
  • sql> insert into company(name) VALUE ("IDE") [2016-11-27 22:45:22] 1 row affected in 13ms sql> insert into company(name) VALUE ("IDĘ") [2016-11-27 22:45:24] [23000][1062] Duplicate entry 'IDĘ' for key 'UK_name' – Piotr Sobolewski Nov 27 '16 at 21:47
  • All European characters are covered in both character sets. – Rick James Nov 28 '16 at 02:10
0
alter database osa character set utf8mb4 COLLATE utf8mb4_bin;

This works for me. @Maciek Bryński, thank you for your hint.
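One caveat, as far as I know: ALTER DATABASE only changes the default for tables created afterwards, and existing tables and columns keep the collation they already have. If the table is not recreated, something like the following sketch may be needed. It uses the osa schema and the company table from the question; check the first query's result before running the conversion.

```sql
-- Show which character set and collation the column really uses right now.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'osa'
  AND TABLE_NAME = 'company'
  AND COLUMN_NAME = 'name';

-- Convert all character columns of the existing table, including their indexes.
ALTER TABLE company CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
```

Note that CONVERT TO affects every character column in the table, so uuid becomes case sensitive as well.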

Piotr Sobolewski