mysql utf8 latin, fails on duplicate key entry

Question

I have just received an SQL insert script, but it fails on a duplicate key entry:

I am trying to insert :

1)Montaña

2)Montana

My tables are all utf8_spanish2_ci,

can anybody explain why this is happening ?

Related: [MySQL Case Insensitive but Accent Sensitive UTF8 Unique Key](http://stackoverflow.com/q/11005647) — Pekka, Jul 28 '12 at 15:47
Related (explaining why this happens): [MySQL diacritic insensitive search (spanish accents)](http://stackoverflow.com/q/3304464) — Pekka, Jul 28 '12 at 15:47

score 1 · Accepted Answer · edited May 23 '17 at 12:03

1

The utf8_spanish2_ci collation is indeed not only case insensitive, but also partly accent insensitive, so ~~ñ = n.~~ (as Joni Salonen points out, this is incorrect!) but á = a.

~~There is, as far as I know, no collation that does not come with this "feature" except utf8_bin.~~

What you can do:

Use a binary collation for the table (this comes with other problems, though)
Use a binary collation for the specific column (possibly your best bet)
Use a different column as the key column - are you sure that column needs to be a unique key in the first place? If this is for a primary key, wouldn't a numeric auto-increment be much better suited?

edited May 23 '17 at 12:03

Community

1
1

answered Jul 28 '12 at 15:51

Pekka

442,112
142
972
1,088

If the column containing "Montana" and "Montaña" is supposed to have unique rows, the dbms needs to know that. Doesn't matter whether you add a numeric column as a surrogate key. A numeric surrogate without a unique constraint on *this* column would allow rows like `{1, Montana}`, `{2, Montana}`, `{365, Montana}`, and so on. – Mike Sherrill 'Cat Recall' Jul 28 '12 at 15:57
@Catcall yeah, but I'm not sure whether that really is the true requirement. This could be a case of the OP wrongly picking the name as the primary key instead of an auto increment. We will see – Pekka Jul 28 '12 at 15:59
Hi thanks for the answer, I ended up using varbinary as my column data type and utf8_spanish_ci as my character set ( since this is faster then ut8_bin ? ? ) I have an unique key on this column, my primary key is auto_increment – Blottt Jul 28 '12 at 16:24
1. utf8_spanish2_ci does consider ñ distinct from n. 2. utf8_bin is a lot simpler than utf8_spanish_ci; I doubt it can be slower. – Joni Jul 29 '12 at 10:32
@Joni oh, I stand corrected! Thanks. Then it's really weird why this happens. Although seeing as `à` = `a` even in the spanish collations, this might become a problem later anyway – Pekka Jul 29 '12 at 10:42

score 0 · Answer 2 · answered Jul 28 '12 at 15:47

0

This is most likely due to the collation considering Montaña and Montana to be identical.

(The collation determines the result of String comparisions.)

answered Jul 28 '12 at 15:47

Andreas

1,751
2
14
25

score 0 · Answer 3 · answered Jul 29 '12 at 10:29

Are you sure that this particular column has the collation utf8_spanish2_ci ?

The words Montana and Montaña are NOT equal according to this collation, as you can verify with this SQL:

mysql> select 'Montana' = 'Montaña' collate utf8_spanish2_ci as eq;
+----+
| eq |
+----+
|  0 |
+----+

Is it possible that somehow the collation is changed to utf8_general_ci? In this collation n and ñ are equal.

mysql utf8 latin, fails on duplicate key entry

3 Answers3