11

My issue: inserting a set of data works on my local machine/MySQL database, but on production it causes a Duplicate entry for key 'PRIMARY' error. As far as I can tell both setups are equivalent.

My first thought was that it's a collation issue, but I've checked that the tables in both databases are using utf8_bin.

The table starts out empty and I am doing .Distinct() in the code, so there shouldn't be any duplicate entries.

The table in question:

CREATE TABLE `mytable` (
  `name` varchar(100) CHARACTER SET utf8 NOT NULL,
  `appid` int(11) NOT NULL,
  -- A few other irrelevant fields
  PRIMARY KEY (`name`,`appid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Database.cs:

[DbConfigurationType(typeof(MySql.Data.Entity.MySqlEFConfiguration))]
public class Database : DbContext
{
    public DbSet<MyTable> MyTable { get; set; }
    public static Database Get()
    {
        /* Not important */
    }
    //etc.
}

MyTable.cs:

[Table("mytable")]
public class MyTable : IEquatable<MyTable>, IComparable, IComparable<MyTable>
{
    [Column("name", Order = 0), Key, Required, DatabaseGenerated(DatabaseGeneratedOption.None)]
    public string Name
    {
        get { return _name; }
        set { _name = value.Trim().ToLower(); }
    }

    private string _name;

    [Column("appid", Order = 1), Key, Required, DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int ApplicationId { get; set; }

    //Equals(), GetHashCode(), CompareTo(), ==() etc. all auto-generated by Resharper to use both Name and ApplicationId.
    //Have unit-tests to verify they work correctly.
}

Then using it:

using(Database db = Database.Get())
using(DbContextTransaction transaction = db.Database.BeginTransaction(IsolationLevel.ReadUncommitted))
{
    IEnumerable<MyTable> newEntries = GetNewEntries();
    //Verify no existing entries already in the table; not necessary to show since table is empty anyways
    db.MyTable.AddRange(newEntries.Distinct());
}

I'm at a loss how there could be duplicate entries in the database after doing a .Distinct() in the code, when using utf8_bin, especially since it works on one machine but not another. Does anyone have any ideas?

jhoepken
  • 1,842
  • 3
  • 17
  • 24
BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
  • A related question about `Distinct`: [Distinct not working with LINQ to Objects](http://stackoverflow.com/q/1365748/4519059) ;). – shA.t Oct 08 '16 at 08:00
  • 4
    Maybe related to the current culture of machines this code runs on. String equality comparisions depend on current culture, so Distinct can produce different results on different servers. Then, names that are considered different by .NET with current culture are considered the same by mysql and you get violation. So, try to check language settings of both machines, and if that is indeed the problem - use same culture in string comparisions. – Evk Oct 08 '16 at 09:38
  • Can you give an example of the uncommitted values? from both the machines. – Som Shekhar Oct 08 '16 at 10:33
  • @Evk Yes I verified both machines are using the same `CultureInfo.CurrentCulture`. @Som an example string that is considered a duplicate only on the server: `autograph: 啸天` – BlueRaja - Danny Pflughoeft Oct 08 '16 at 19:16
  • @evk the culture is `en-us`. What you said _(edit: and now deleted)_ shouldn't be possible because A. The MySQL table/column are using `utf8_bin` collation, so two strings equal only when they are _exactly_ the same, and B. I don't see the issue when testing locally, despite having the same culture. – BlueRaja - Danny Pflughoeft Oct 08 '16 at 19:26
  • 1
    Does the number of items returned by newEntries.Distinct() before you are calling AddRange the same on both machines? – Evk Oct 08 '16 at 19:47
  • Your `distinct` is running on .NET environment and not in database. May be both machines are working in different character set. Can you execute `Distinct().Count()` in both machines and post results as @Evk is suggesting. It will help to isolate the issue? – dani herrera Oct 10 '16 at 07:21
  • @BlueRaja-DannyPflughoeft Please indicate the MySQL server version number on both machines. – Marc Alff Oct 11 '16 at 07:45
  • @BlueRaja-DannyPflughoeft Instead of adding rows with `.AddRange()` could you loop through every record before Inserting to check if that record already exists? That might give us a better insight if we can understand which row actually causes the error. Also, is there any chance of zero-width spaces being used in data? `.Trim()` prolly wouldn't work on those and thus `.Distinct()` won't help either. – uTeisT Oct 11 '16 at 08:26

4 Answers4

1

I would investigate the following points:

  • check the exact MySQL version on both machines. You can do this in a MySQL client with SHOW VARIABLES LIKE "%version%";
  • use SELECT HEX(name) to see how the data is encoded, for the duplicate row.
  • investigate if the "utf8" data is stored in utf8mb3 or utf8mb4

https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb3.html

https://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8mb4.html

Assuming that on the dev machine, data is inserted from a fresh install, and that on the production machine, data could have been inserted with an older version and then the server upgraded, I would check in particular if actions are needed (and were done) to complete correctly the upgrade process.

In particular, see these sections (for the proper version) in the MySQL reference manual:

https://dev.mysql.com/doc/refman/5.7/en/checking-table-incompatibilities.html

If necessary, rebuild the indexes.

https://dev.mysql.com/doc/refman/5.7/en/rebuilding-tables.html

Edit (2016-10-12)

All the above focuses on the table, and on storage.

Another part to check is the connection between client and server, with variables such as character_set_connection.

Please check all character set related system variables, to compare them.

mysql> show variables like "%character%";
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | latin1                                    |
| character_set_filesystem | binary                                    |
| character_set_results    | utf8                                      |
| character_set_server     | latin1                                    |
| character_set_system     | utf8                                      |
| character_sets_dir       | /home/malff/GIT_TRUNK/sql/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.02 sec)
Marc Alff
  • 8,227
  • 33
  • 59
0

I ended up solving it by unicode-escaping non-ascii characters, similar to this solution.

However, I still have no idea why this could have possibly happened...

Community
  • 1
  • 1
BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
  • May be because in ASCII "qwerty????" is same as "qwerty????", when in UTF it's a different strings like "qwertyАБВГ" and "qwertyЙЦУК"? – gaRex Oct 14 '16 at 20:46
0

Are both machines using the same database drivers? I have had similar issues with EF and Oracle when different drivers are installed.

EDIT:

This document

https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

says that before MySQL 5.5.3 Unicode characters were stored using a max of 3 bytes per character, and after were stored using 4 bytes. This could explain the difference in whether a key is determined to be unique. The difference would come into play when 'supplementary characters' are used as older versions could not store those characters at all.

Are your two databases on different sides of the divide (MySQL 5.5.3)?

John Meyer
  • 2,296
  • 1
  • 31
  • 39
  • Same driver, but different database versions. Maybe it is a MySQL bug that was fixed? – BlueRaja - Danny Pflughoeft Oct 10 '16 at 17:26
  • In my case running the Oracle 12 driver on one machine and the Oracle 11 driver on another against Oracle 11g caused differences in the way numbers were processed. One setup returned an int, where the other returned a long. If possible, you should run the same version on your local and in the test/prod environments. – John Meyer Oct 10 '16 at 21:01
  • Re edit: According to that link the 4-byte UTF8 characters are stored using a new character set, `utf8mb4`. They did not, as you say, change the `utf8` character-set that both of my databases are using. – BlueRaja - Danny Pflughoeft Oct 15 '16 at 01:47
  • It says: "... **As of MySQL 5.5.3** , the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters: ... **For a supplementary character, utf8 cannot store the character at all**, whereas utf8mb4 requires four bytes to store it. Because utf8 cannot store the character at all, you have no supplementary characters in utf8 columns and need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL." – John Meyer Oct 17 '16 at 13:45
  • Yes but I am not using `utf8mb4` at all. Both databases use `utf8_bin`, and the data gets inserted correctly into one of them. – BlueRaja - Danny Pflughoeft Oct 17 '16 at 21:11
-1

I think it may be because certain characters have different meanings based on the environment so it's generally recommended that you escape these special characters before a string is used as a data value.

Check this out: http://dev.mysql.com/doc/refman/5.7/en/string-literals.html#character-escape-sequences

JuanR
  • 7,405
  • 1
  • 19
  • 30
  • ? There is no situation where this could be a problem. The EF driver should transparently escape any characters that would otherwise cause an issue. – BlueRaja - Danny Pflughoeft Oct 14 '16 at 18:44
  • It should. Now, does it? I would suggest you try a couple of things, just to rule it out. 1) Do a manual diff on your key and 2) Look for special characters in your key. See if you can find some then run a quick test using those values and see if it complains. – JuanR Oct 14 '16 at 18:47