1

I have just received an SQL insert script, but it fails on a duplicate key entry:

I am trying to insert :

1)Montaña

2)Montana

My tables are all utf8_spanish2_ci,

can anybody explain why this is happening ?

Blottt
  • 119
  • 1
  • 13
  • Related: [MySQL Case Insensitive but Accent Sensitive UTF8 Unique Key](http://stackoverflow.com/q/11005647) – Pekka Jul 28 '12 at 15:47
  • Related (explaining why this happens): [MySQL diacritic insensitive search (spanish accents)](http://stackoverflow.com/q/3304464) – Pekka Jul 28 '12 at 15:47

3 Answers3

1

The utf8_spanish2_ci collation is indeed not only case insensitive, but also partly accent insensitive, so ñ = n. (as Joni Salonen points out, this is incorrect!) but á = a.

There is, as far as I know, no collation that does not come with this "feature" except utf8_bin.

What you can do:

  • Use a binary collation for the table (this comes with other problems, though)
  • Use a binary collation for the specific column (possibly your best bet)
  • Use a different column as the key column - are you sure that column needs to be a unique key in the first place? If this is for a primary key, wouldn't a numeric auto-increment be much better suited?
Community
  • 1
  • 1
Pekka
  • 442,112
  • 142
  • 972
  • 1,088
  • If the column containing "Montana" and "Montaña" is supposed to have unique rows, the dbms needs to know that. Doesn't matter whether you add a numeric column as a surrogate key. A numeric surrogate without a unique constraint on *this* column would allow rows like `{1, Montana}`, `{2, Montana}`, `{365, Montana}`, and so on. – Mike Sherrill 'Cat Recall' Jul 28 '12 at 15:57
  • @Catcall yeah, but I'm not sure whether that really is the true requirement. This could be a case of the OP wrongly picking the name as the primary key instead of an auto increment. We will see – Pekka Jul 28 '12 at 15:59
  • Hi thanks for the answer, I ended up using varbinary as my column data type and utf8_spanish_ci as my character set ( since this is faster then ut8_bin ? ? ) I have an unique key on this column, my primary key is auto_increment – Blottt Jul 28 '12 at 16:24
  • 1. utf8_spanish2_ci does consider ñ distinct from n. 2. utf8_bin is a lot simpler than utf8_spanish_ci; I doubt it can be slower. – Joni Jul 29 '12 at 10:32
  • @Joni oh, I stand corrected! Thanks. Then it's really weird why this happens. Although seeing as `à` = `a` even in the spanish collations, this might become a problem later anyway – Pekka Jul 29 '12 at 10:42
0

This is most likely due to the collation considering Montaña and Montana to be identical.

(The collation determines the result of String comparisions.)

Andreas
  • 1,751
  • 2
  • 14
  • 25
0

Are you sure that this particular column has the collation utf8_spanish2_ci ?

The words Montana and Montaña are NOT equal according to this collation, as you can verify with this SQL:

mysql> select 'Montana' = 'Montaña' collate utf8_spanish2_ci as eq;
+----+
| eq |
+----+
|  0 |
+----+

Is it possible that somehow the collation is changed to utf8_general_ci? In this collation n and ñ are equal.

Joni
  • 108,737
  • 14
  • 143
  • 193