
I am performing MANY concurrent SQL INSERT statements which are colliding on a UNIQUE KEY constraint, even though I am also checking for existing records for the given key inside a single transaction. I am looking for a way to eliminate, or minimize, the number of collisions I am getting without hurting performance (too much).

Background:

I am working on an ASP.NET MVC4 WebApi project which receives A LOT of HTTP POST requests to INSERT records. It gets about 5K - 10K requests a second. The project's sole responsibility is de-duplicating and aggregating records. It is very write heavy; it serves a relatively small number of read requests, all of which use a transaction with IsolationLevel.ReadUncommitted.

Database schema

Here is the DB table:

CREATE TABLE [MySchema].[Records] ( 
    Id BIGINT IDENTITY NOT NULL, 
    RecordType TINYINT NOT NULL, 
    UserID BIGINT NOT NULL, 
    OtherID SMALLINT NULL, 
    TimestampUtc DATETIMEOFFSET NOT NULL, 
    CONSTRAINT [UQ_MySchemaRecords_UserIdRecordTypeOtherId] UNIQUE CLUSTERED ( 
        [UserID], [RecordType], [OtherID] 
    ), 
    CONSTRAINT [PK_MySchemaRecords_Id] PRIMARY KEY NONCLUSTERED ( 
        [Id] ASC 
    ) 
) 

Repository Code

Here is the code for the Upsert method which is causing the Exception:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using Dapper;

namespace MyProject.DataAccess
{
    public class MyRepo
    {
        public void Upsert(MyRecord record)
        {
            var dbConnectionString = "MyDbConnectionString";
            using (var connection = new SqlConnection(dbConnectionString))
            {
                connection.Open();
                using (var transaction = connection.BeginTransaction(IsolationLevel.ReadCommitted))
                {
                    try
                    {
                        var existingRecord = FindByUniqueKey(transaction, record.RecordType, record.UserID, record.OtherID);

                        if (existingRecord == null)
                        {
                            const string sql = @"INSERT INTO [MySchema].[Records] 
                                                 ([UserID], [RecordType], [OtherID], [TimestampUtc]) 
                                                 VALUES (@UserID, @RecordType, @OtherID, @TimestampUtc) 
                                                 SELECT CAST(SCOPE_IDENTITY() AS BIGINT";
                            var results = transaction.Connection.Query<long>(sql, record, transaction);
                            record.Id = results.Single();
                        }
                        else if (existingRecord.TimestampUtc <= record.TimestampUtc)
                        {
                            // UPDATE
                        }

                        transaction.Commit();
                    }
                    catch (Exception e)
                    {
                        transaction.Rollback();
                        throw; // rethrow without resetting the stack trace
                    }
                }
            }
        }

        // all read-only methods use explicit transactions with IsolationLevel.ReadUncommitted

        private static MyRecord FindByUniqueKey(SqlTransaction transaction, RecordType recordType, long userID, short? otherID)
        {
            const string sql = @"SELECT * from [MySchema].[Records] 
                                 WHERE [UserID] = @UserID
                                 AND [RecordType] = @RecordType
                                 AND [OtherID] = @OtherID";
            var paramz = new {
                UserID = userID,
                RecordType = recordType,
                OtherID = otherID
            };
            var results = transaction.Connection.Query<MyRecord>(sql, paramz, transaction);
            return results.SingleOrDefault();
        }
    }

    public class MyRecord
    {
        public long Id { get; set; }
        public RecordType RecordType { get; set; }
        public long UserID { get; set; }
        public short? OtherID { get; set; }
        public DateTimeOffset TimestampUtc { get; set; }
    }

    public enum RecordType : byte
    {
        TypeOne = 1,
        TypeTwo = 2,
        TypeThree = 3
    }
}

The Problem

When the server is under heavy enough load, I am seeing many of these Exceptions occurring:

Violation of UNIQUE KEY constraint 'UQ_MySchemaRecords_UserIdRecordTypeOtherId'. Cannot insert duplicate key in object 'MySchema.Records'. The duplicate key value is (1234567890, 1, 123). The statement has been terminated.

This Exception occurs often, as many as 10 times in a minute.

What I have tried

  • I tried changing the IsolationLevel to Serializable. The Exception occurred much less often but still occurred. Also, the performance of the code suffered greatly; the system could only handle 2K requests a second. I suspect that this decrease in throughput was actually the cause of the reduced Exceptions, so I concluded that this didn't solve my problem.
  • I have considered using the UPDLOCK table hint, but I don't fully understand how it interacts with isolation levels or how to apply it to my code. From my current understanding, it does seem like it might be the best solution though.
  • I also tried combining the initial SELECT statement (for existing records) into the INSERT statement, as shown here, but this attempt still had the same problem.
  • I tried implementing my Upsert method by using the SQL MERGE statement, but this also suffered from the same problem (a reconstructed sketch of the MERGE follows this list).
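
For reference, the MERGE attempt was shaped roughly like this (a reconstructed sketch based on the schema above, not the exact code):

// Reconstructed sketch of the MERGE-based Upsert (illustrative, not the
// original code); the column list is trimmed to the ones shown above.
const string mergeSql = @"MERGE [MySchema].[Records] AS target
                          USING (SELECT @UserID AS UserID, @RecordType AS RecordType,
                                        @OtherID AS OtherID, @TimestampUtc AS TimestampUtc) AS source
                              ON  target.[UserID]     = source.UserID
                              AND target.[RecordType] = source.RecordType
                              AND target.[OtherID]    = source.OtherID
                          WHEN MATCHED AND target.[TimestampUtc] <= source.TimestampUtc THEN
                              UPDATE SET [TimestampUtc] = source.TimestampUtc
                          WHEN NOT MATCHED THEN
                              INSERT ([UserID], [RecordType], [OtherID], [TimestampUtc])
                              VALUES (source.UserID, source.RecordType, source.OtherID, source.TimestampUtc);";
transaction.Connection.Execute(mergeSql, record, transaction);

(As the top answer below notes, a plain MERGE like this still races unless the read side takes a key-range lock, e.g. WITH (HOLDLOCK).)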

My Question(s)

  • Is there anything I can do to prevent this type of UNIQUE key constraint collisions?
  • If I should be using the UPDLOCK table hint (or any other table hint for that matter), how would I add that to my code? Would I add it to the INSERT? The SELECT? Both?
  • If two people try to log "identical" rows, do you just want one to win? Does it matter if the rows don't get inserted immediately? I can envision using a queue table and a background job that eliminates any dupes before inserting en masse. The biggest problem you have is that you are trying to do 10K individual inserts per second with individual insert statements and all of the connection and .net transaction scaffolding and overhead that goes with it. Maybe the app should hold onto sets of values, e.g. using table-valued parameters, instead of performing singleton inserts (a sketch of this idea follows these comments)... – Aaron Bertrand Dec 06 '13 at 20:05
  • @AaronBertrand If two people try to `INSERT` "identical" rows, I expect only 1 record to exist in the end. If a newer record comes later, I expect an `UPDATE` to occur; there are actually more columns in the table, and I update those columns if the `TimestampUtc` value is newer than an existing record. The current behavior is actually correct right now but the amount of Exceptions I am getting seems less than ideal. I was hoping to find a way to have the same end result as I have now, without getting so many errors. – Jesse Webb Dec 06 '13 at 20:10
  • These links might be useful just to contribute to thought processes: http://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/ http://www.sqlperformance.com/2012/08/t-sql-queries/error-handling – Aaron Bertrand Dec 06 '13 at 20:17
  • Based on the comments here I would suggest that you likely have an additional issue which isn't throwing an exception: the case where an update is performed on a record when it shouldn't be, because another record with a higher timestamp has already been used to update it. You can manage that by adding the timestamp condition to the WHERE clause on your update, if you're not doing that already. – Carth Dec 06 '13 at 20:32
  • @Carth I never even thought of that. I do compare the `TimestampUtc` values in code before executing the `UPDATE` but my SQL doesn't include that check in its condition. It is very likely that I am overwriting newer data with older data. Good catch! I will fix that now... – Jesse Webb Dec 06 '13 at 20:36
  • You can also consider using a key value store in front of your db like redis which is designed for high concurrent read/writes. It generates IDs much faster than a traditional rdbms – Sam Dec 06 '13 at 20:37
  • As a side point, with this volume, isn't (int) value small for PK? – NoChance Dec 06 '13 at 21:34
  • @EmmadKareem Yes, it is indeed. This is a contrived example based on the real code. The real code uses a `BIGINT` ID column, I just got it wrong here. Fixed. Thanks! – Jesse Webb Dec 06 '13 at 21:37
  • How can two people insert identical rows? Isn't UserId part of the unique key constraint? Is the same person running the same insert twice? Or is "UserId" here not representative of the person doing the insert? – sam yi Dec 06 '13 at 22:05
  • Sounds like you need a key-range lock. Serializable isolation would add that for you. You might also get away with just an UPDLOCK (which is a good idea anyway here to prevent deadlocks) – Marc Gravell Dec 06 '13 at 23:15
  • @samyi The UserID is not representing who is doing the INSERT, it represents who the record is about. Hope that makes sense... – Jesse Webb Feb 04 '14 at 21:29
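
To make the table-valued-parameter idea from Aaron Bertrand's comment concrete, here is a hedged sketch; the TVP type [MySchema].[RecordTvp] and the procedure [MySchema].[UpsertRecords] are hypothetical names and would need to be created in the database first:

// Hypothetical sketch of batching with a table-valued parameter instead of
// singleton inserts. The proc can de-dupe the whole batch before inserting.
var table = new DataTable();
table.Columns.Add("UserID", typeof(long));
table.Columns.Add("RecordType", typeof(byte));
table.Columns.Add("OtherID", typeof(short));
table.Columns.Add("TimestampUtc", typeof(DateTimeOffset));
foreach (var r in batch) // batch: an IEnumerable<MyRecord> accumulated by the app
    table.Rows.Add(r.UserID, (byte)r.RecordType, (object)r.OtherID ?? DBNull.Value, r.TimestampUtc);

using (var command = new SqlCommand("[MySchema].[UpsertRecords]", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var p = command.Parameters.Add("@Records", SqlDbType.Structured);
    p.TypeName = "[MySchema].[RecordTvp]";
    p.Value = table;
    command.ExecuteNonQuery();
}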

3 Answers


Make the validating read take a lock:

FROM SomeTable WITH (UPDLOCK, ROWLOCK, HOLDLOCK)

This serializes accesses on a single key, allowing for concurrency on all others.


HOLDLOCK ( = SERIALIZABLE) protects a range of values. This ensures a row that doesn't exist continues to not exist so the INSERT succeeds.

UPDLOCK ensures any existing row is not changed or deleted by another concurrent transaction so the UPDATE succeeds.

ROWLOCK encourages the engine to take a row-level lock.

These changes may increase the chances of a deadlock.
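
Applied to the question's FindByUniqueKey, the validating read might look like this sketch (only the table hints are new):

const string sql = @"SELECT * FROM [MySchema].[Records] WITH (UPDLOCK, ROWLOCK, HOLDLOCK)
                     WHERE [UserID] = @UserID
                     AND [RecordType] = @RecordType
                     AND [OtherID] = @OtherID";
// The key-range lock taken here is held until the transaction commits, so a
// concurrent Upsert for the same key blocks on this SELECT instead of
// failing later on the INSERT.
var results = transaction.Connection.Query<MyRecord>(sql, paramz, transaction);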

– usr
  • I assume this should be added to the `SELECT` statement? Also, you say that this serializes accesses on a single key... does that have to be my PK or will my UNIQUE key work too? I will definitely try this out... – Jesse Webb Dec 06 '13 at 21:24
  • Yes the select. The lock will be taken on any row being read. In practice this will be the CI row and the index row being used to find it. – usr Dec 06 '13 at 21:25

It may be faster to permit and suppress the errors in your scenario than to attempt to eliminate them. If you're consolidating multiple sources synchronously with overlapping data, you will need to create a bottleneck somewhere to manage the race condition.
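
For example, suppressing and retrying once might look like this sketch (2627 and 2601 are SQL Server's duplicate-key error numbers; repo stands in for the question's MyRepo):

try
{
    repo.Upsert(record);
}
catch (SqlException ex)
{
    // Not a duplicate-key violation: let it propagate.
    if (ex.Number != 2627 && ex.Number != 2601)
        throw;

    // A concurrent writer won the race; on retry the existing-record
    // check will find the committed row and take the UPDATE path.
    repo.Upsert(record);
}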

You could create a singleton manager class that holds the unique keys of in-flight records in a hashset, so duplicates are automatically dropped when they're added to the set. Records get added prior to submitting to the DB and removed upon statement completion. That way either the hashset eats the duplicate or the existing-record check you do at the top of your try detects the committed duplicate record.
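
A minimal sketch of that manager, assuming a single process (as noted in the comments, a web farm would need an external store instead); all names here are illustrative:

// Process-wide set of in-flight unique keys; concurrent duplicates are
// dropped before they ever reach the database.
public static class InFlightRecords
{
    // Key mirrors the UNIQUE constraint: (UserID, RecordType, OtherID).
    private static readonly ConcurrentDictionary<Tuple<long, RecordType, short?>, byte> Keys =
        new ConcurrentDictionary<Tuple<long, RecordType, short?>, byte>();

    // Returns false if an identical record is already being written.
    public static bool TryEnter(MyRecord r)
    {
        return Keys.TryAdd(Tuple.Create(r.UserID, r.RecordType, r.OtherID), 0);
    }

    public static void Exit(MyRecord r)
    {
        byte ignored;
        Keys.TryRemove(Tuple.Create(r.UserID, r.RecordType, r.OtherID), out ignored);
    }
}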

– Carth
  • Thanks for your input! "It may be faster to permit and suppress the errors" This is my current impression; everything else I try seems to slow **everything** down, whereas my current code only fails on 10 records each minute while processing 5K records a second. Unfortunately, the "singleton manager class" (or any in-memory) solution is not an option, this code runs on a cluster of servers. If I wanted to do something like that, I would likely have to implement some kind of external queue like @AaronBertrand suggested. – Jesse Webb Dec 06 '13 at 20:31
  • @JesseWebb Definitely not the most elegant solution but for 10 times in a minute I might suggest just catching the exception like you're doing now and then resubmitting it to the Upsert...once – Carth Dec 06 '13 at 20:45
  • That is actually what my DBA suggested too: Eat the exception (because it is expected), retry once, then throw if it still fails. I was avoiding this route but I might have to give it a try... – Jesse Webb Dec 06 '13 at 21:00

AFAIK, the only solution is to check for duplication before the insert. It demands at least one round-trip to the DB, which results in poor performance.

You can do a SELECT on the table and hold the lock to prevent other parallel threads from SELECTing and getting the same value. Here is the detailed solution: Pessimistic locking in EF code first

PS: Based on Aaron's comment and his nice workaround, I should say my proposed solution is based on the assumption that you don't want to use a buffer or queue.

– Alireza
  • My code already does a `SELECT` to check for existing records before doing an `INSERT`. I am not using EntityFramework, I am using dapper. Is there a way to do 'Pessimistic locking' with Dapper or straight SQL? – Jesse Webb Dec 06 '13 at 20:18
  • Whatever you use, EF, Dapper, native SQL, or even a stored procedure, you can do a SELECT on the table and hold the lock. I guess your problem is that between the time you check for a duplication and the time you insert, another record(s) comes in and gets inserted faster, causing a race. The solution: lock! – Alireza Dec 06 '13 at 20:21