
Assume a system with multiple concurrent producers, each of which strives to persist a graph of objects containing the following common entities, uniquely identifiable by their names:

CREATE TABLE CommonEntityGroup(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL
);
GO

CREATE UNIQUE INDEX IX_CommonEntityGroup_Name 
    ON CommonEntityGroup(Name)
GO


CREATE TABLE CommonEntity(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    CommonEntityGroupId INT NOT NULL,
    CONSTRAINT FK_CommonEntity_CommonEntityGroup FOREIGN KEY(CommonEntityGroupId) 
        REFERENCES CommonEntityGroup(Id)
);
GO

CREATE UNIQUE INDEX IX_CommonEntity_CommonEntityGroupId_Name 
    ON CommonEntity(CommonEntityGroupId, Name)
GO

For example, producer A saves some CommonEntityMeetings, while producer B saves CommonEntitySets. Each of them has to persist the CommonEntities related to its particular items.

Basically, the key points are:

  • There are independent producers.
  • They operate concurrently.
  • In theory they will operate through the same Web Service (ASP.NET Web API), just with their respective endpoints/"resources" - though that is not exactly true yet and may change. So ideally the proposed solution should not rely on that.
  • They strive to persist different graphs of objects that may contain CommonEntity/CommonEntityGroup objects that do not exist yet.
  • CommonEntity/CommonEntityGroup are immutable once created and will never be modified or removed thereafter.
  • CommonEntity/CommonEntityGroup are unique according to some of their properties (Name, plus the related common entity if any - e.g. a CommonEntity is unique by CommonEntity.Name + CommonEntityGroup.Name).
  • Producers do not know/care about the IDs of those CommonEntities - they usually just pass DTOs with the (unique) Names of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.
  • There is a certain possibility that producers will try to create the same CommonEntity/CommonEntityGroup at the same time.
  • Though it is much more likely that such CommonEntity/CommonEntityGroup objects will already exist in db.

So, with Entity Framework (database first, though it probably doesn't matter) as the DAL and SQL Server as storage, what is an efficient and reliable way to ensure that all those producers successfully persist their intersecting object graphs at the same time?

Taking into account that the UNIQUE INDEX already ensures that there won't be duplicate CommonEntities (the Name + GroupName pair is unique), I can see the following solutions:

  1. Ensure that each CommonEntity/CommonEntityGroup is found/created and saved (SaveChanges()) before building the rest of the object's graph.

In such a case, when SaveChanges is called for the related entities, there won't be any index violations due to other producers having created the same entities a moment before.

To achieve this I will have something like:

public class CommonEntityGroupRepository // sort of
{
    public CommonEntityGroupRepository(EntitiesDbContext db) ...

    // CommonEntityRepository will use this class/method internally to create parent CommonEntityGroup.
    public CommonEntityGroup FindOrCreateAndSave(String groupName)
    {
        return
            this.TryFind(groupName) ?? // db.FirstOrDefault(...)
            this.CreateAndSave(groupName);
    }

    private CommonEntityGroup CreateAndSave(String groupName)
    {
        var group = this.Db.CommonEntityGroups.Create();
        group.Name = groupName;
        this.Db.CommonEntityGroups.Add(group);

        try
        {
            this.Db.SaveChanges();
            return group;
        }
        catch (DbUpdateException dbExc)
        {
            // Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)
            return this.Find(groupName); // TryFind that throws exception.
        }
    }
}

With this approach there will be multiple calls to SaveChanges, and each CommonEntity will have its own sort of repository, though it seems to be the most reliable solution. A rough usage sketch follows.
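To illustrate, a producer-side sketch (MeetingDto and the exact CommonEntityRepository signature are just assumptions for illustration):

public CommonEntityMeeting SaveMeeting(MeetingDto dto)
{
    using (var db = new EntitiesDbContext())
    {
        var entities = new CommonEntityRepository(db);

        // Found/created and committed before the rest of the graph, so the
        // SaveChanges below cannot hit the unique index on CommonEntity.
        var commonEntity = entities.FindOrCreateAndSave(
            dto.CommonEntityName, dto.CommonEntityGroupName);

        var meeting = db.CommonEntityMeetings.Create();
        meeting.CommonEntityId = commonEntity.Id;
        db.CommonEntityMeetings.Add(meeting);
        db.SaveChanges();
        return meeting;
    }
}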

  2. Just create the entire graph and rebuild it from scratch if index violations occur

A bit ugly and inefficient (with 10 CommonEntities we may have to retry it 10 times), but simple and more or less reliable. A sketch of the retry loop is shown below.
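A sketch of what that retry loop might look like (BuildGraph and GraphDto are placeholders for the graph-building code):

using System.Data.Entity.Infrastructure;
using System.Data.SqlClient;

public void SaveGraphWithRetries(GraphDto dto)
{
    const int MaxAttempts = 10;
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            using (var db = new EntitiesDbContext())
            {
                BuildGraph(db, dto); // re-reads existing entities, adds the missing ones
                db.SaveChanges();
                return;
            }
        }
        catch (DbUpdateException ex) when (IsUniqueIndexViolation(ex) && attempt < MaxAttempts)
        {
            // Another producer inserted one of our entities first; a fresh
            // context will now find it instead of inserting a duplicate.
        }
    }
}

private static bool IsUniqueIndexViolation(DbUpdateException ex)
{
    // 2601: duplicate key in a unique index; 2627: unique constraint violation.
    var sqlException = ex.GetBaseException() as SqlException;
    return sqlException != null && (sqlException.Number == 2601 || sqlException.Number == 2627);
}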

  3. Just create the entire graph and replace duplicate entries if index violations occur

Not sure that there is an easy and reliable way to replace duplicate entries in more or less complex object graphs, though both a case-specific and a more generic reflection-based solution could be implemented.

Still, like the previous solution, it may require multiple retries.

  4. Try to move this logic to the database (a stored procedure)

I doubt that it will be any easier to handle inside a stored procedure. It would be the same optimistic or pessimistic approach, just implemented on the database side.

Though it may provide better performance (not an issue in this case) and put the insertion logic in one common place.

  5. Using the SERIALIZABLE isolation level / a TABLOCKX + SERIALIZABLE table hint in a stored procedure - it should definitely work, but I'd prefer not to lock the tables exclusively more than is actually necessary, because an actual race is quite rare. And, as already mentioned in the title, I'd like to find an optimistic concurrency approach. (A rough client-side sketch follows for comparison.)
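For comparison, a rough client-side sketch of that last approach (not something I intend to use):

using System.Transactions;

using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.Serializable }))
using (var db = new EntitiesDbContext())
{
    // Reads under SERIALIZABLE take range locks, so a competing producer
    // blocks here until this transaction commits or rolls back.
    // ... find/create CommonEntityGroup/CommonEntity and build the graph ...
    db.SaveChanges();
    scope.Complete();
}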

I would probably try the first solution, but perhaps there are better alternatives or some potential pitfalls.

Eugene Podskal
  • You can use an insert-or-update statement (a so-called "upsert"). As I remember, in SQL Server the MERGE statement is used for this purpose. With it you can insert or update all your CommonEntities in one query (though obviously not with EF - you'd have to use raw SQL). – Evk Dec 29 '16 at 22:05
  • @Evk Thanks, I completely forgot about the MERGE statement. Perhaps you can add it as an answer? – Eugene Podskal Dec 30 '16 at 07:56
  • check out http://blog.brentmckendrick.com/introducing-graphdiff-for-entity-framework-code-first-allowing-automated-updates-of-a-graph-of-detached-entities/ - a really easy way to persist graphs. – solidau Dec 30 '16 at 23:16
  • @SeanReeves Interesting, will look at it after holidays. Just I am not sure that it will handle concurrency issues, though I may be wrong. If I am wrong on that account, then perhaps you can prove it and add that as an answer? – Eugene Podskal Dec 31 '16 at 08:02
  • I think Transactions are designed to solve your problems, you could use Serializable transaction to ensure 100% ACID compliance, any conflicts would be handled and transactions will be rolled back automatically. All you need to do is, wrap your entire logic in TransactionScope, however, running long transactions will cause deadlocks but what I would do is, insert items and do extra updates later on. – Akash Kava Dec 31 '16 at 11:57
  • You haven't shown in the question how exactly the common entities and groups are used within the rest of the graph of objects. In essence, do other tables in your database reference `CommonEntity.id` or `CommonEntity.Name`? When you try to save the graph of objects, do you know the `Name` or `ID` of `CommonEntity` and `CommonEntityGroup`? In other words, when you save the graph of objects do you need to look up the entity's `ID` by its `Name` first? – Vladimir Baranov Dec 31 '16 at 14:11
  • @VladimirBaranov I have added that info to the question - Producers do not know/care about IDs of those CommonEntities - they usually just pass DTOs with Names(unique) of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name. – Eugene Podskal Dec 31 '16 at 14:27
  • @AkashKava I've mentioned that SERIALIZABLE isolation level is a possible approach in solution #5. It is just that such table locking seems excessive when we won't have any contention between producers and there exists at least one optimistic concurrency approach. – Eugene Podskal Dec 31 '16 at 14:31
  • @EugenePodskal How many executions of this _per second_ are you expecting? How many entries will a producer have to save, on average? Will a procuder need to save multiple `CommonEntityGroup` records and/or multiple `CommonEntity` records? Or are these values singular per operation? – Solomon Rutzky Jan 02 '17 at 16:28

4 Answers


Table Valued Parameters

One option is to use table valued parameters instead of individual calls to the database.

Example procedure using a table valued parameter:

create type dbo.CommonEntity_udt as table (
    CommonEntityGroupId int not null
  , Name      nvarchar(100) not null
  , primary key (CommonEntityGroupId,Name)
    );
go

create procedure dbo.CommonEntity_set (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      insert dbo.CommonEntity (CommonEntityGroupId, Name)
        select s.CommonEntityGroupId, s.Name
          from @CommonEntity as s
          where not exists (
            select 1 
              from dbo.CommonEntity as t with (updlock, holdlock)
              where s.Name = t.Name
                and s.CommonEntityGroupId = t.CommonEntityGroupId
              );
    end;
end;
go
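A rough sketch of calling the procedure from the application side (connectionString and the sample row are assumptions; the DataTable columns must match dbo.CommonEntity_udt):

using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("CommonEntityGroupId", typeof(int));
table.Columns.Add("Name", typeof(string));
table.Rows.Add(42, "SomeEntityName"); // hypothetical values

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.CommonEntity_set", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    var parameter = command.Parameters.AddWithValue("@CommonEntity", table);
    parameter.SqlDbType = SqlDbType.Structured;
    parameter.TypeName = "dbo.CommonEntity_udt";

    connection.Open();
    command.ExecuteNonQuery(); // one round trip for the whole batch
}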

I don't recommend merge unless there is a compelling argument for it. This situation is only looking at inserting, so it seems like overkill.

Example merge version with table valued parameter:

create procedure dbo.CommonEntity_merge (
    @CommonEntity dbo.CommonEntity_udt readonly
) as
begin;
  set nocount on;
  set xact_abort on;
  if exists (
    select 1 
      from @CommonEntity as s
        where not exists (
          select 1 
            from dbo.CommonEntity as t
            where s.Name = t.Name
              and s.CommonEntityGroupId = t.CommonEntityGroupId
            ))
    begin;
      merge dbo.CommonEntity with (holdlock) as t
      using (select CommonEntityGroupId, Name from @CommonEntity) as s
      on (t.Name = s.Name
        and s.CommonEntityGroupId = t.CommonEntityGroupId)
      when not matched by target
        then insert (CommonEntityGroupId, Name) 
        values (s.CommonEntityGroupId, s.Name);
    end;
end;
go

ignore_dup_key code comment:

// Check that it was Name Index violation (perhaps make indices IGNORE_DUP_KEY)

ignore_dup_key is going to use serializable behind the scenes; it adds potentially costly overhead on non-clustered indexes, and even when the index is clustered it can have significant costs depending on the amount of duplicates.

This can be handled in the stored procedures using Sam Saffron's upsert (update/insert) pattern, or one of the patterns shown here: Performance impact of different error handling techniques - Aaron Bertrand.


SqlZim
  • Thanks, that's quite a lot of interesting information and links. – Eugene Podskal Dec 31 '16 at 14:51
  • Can you just structure it more like an answer and not like a proof of a lack of my question writing skill :)? I mean by moving ``s down to make it easier to read. – Eugene Podskal Dec 31 '16 at 14:59
  • Yes, and if you move "Table Valued Parameters" to the top it would be even better. I won't award the bounty for now just in case, though it is a bit doubtful that much better answer will come. – Eugene Podskal Dec 31 '16 at 15:11
  • Moved it around as requested. – SqlZim Dec 31 '16 at 15:15
  • Are you sure that this is an optimistic concurrency approach? You are locking the table `with (updlock, holdlock)`... And you need to have retry logic in any case, because two processes can run the `exists` check at the same time and then try to insert the same `Name`. – Vladimir Baranov Dec 31 '16 at 15:39
  • If all of the names in the table valued parameter exist in the table, no locks are taken. The `not exists () [...] with (updlock, holdlock)` uses [Key-Range Locking](https://technet.microsoft.com/en-us/library/ms191272(v=sql.105).aspx) to prevent two parallel processes from inserting the same name. The `merge` version uses `with (holdlock)` because it already includes `updlock` in its default behavior, and again they are using [Key-Range Locking](https://technet.microsoft.com/en-us/library/ms191272(v=sql.105).aspx), not exclusive table locks. – SqlZim Dec 31 '16 at 17:50
  • @SqlZim, I think I am confused by running `if exists` first and then running the same subquery again in the `INSERT`. I would expect **optimistic** approach to just do what it needs to do without checking first and then retrying if the action didn't succeed. Some retrying logic has to be there in any case. – Vladimir Baranov Dec 31 '16 at 23:34
  • @VladimirBaranov This answer suggests using **table valued parameters** (TVPs) to minimize calls to the database. From [Optimistic Concurrency (OC) - MSDN](https://msdn.microsoft.com/en-us/library/aa0416cz(v=vs.110).aspx): "[...] do not lock a row when reading it" is true for my examples; the first `not exists()` does not acquire locks. Existing rows are **immutable**, so the OC versioning/detecting violations doesn't apply. The second `not exists ()` acquires key-range locks to prevent other operations from inserting the same names, and to prevent inserting duplicates from the TVP. – SqlZim Jan 01 '17 at 14:22

Well, choosing the approach will certainly depend on the type of functionality and the amount of data that both procedures will be using.

If we go with the first approach, then for each SaveChanges() call Entity Framework will open a transaction. This could reduce performance a bit in the case of a large number of records.

If there is a considerable amount of records to insert/update, then I would surely go with the stored-procedure-based approach. With this approach you have full control over the database, and querying for a record to check whether it exists is very easy (though some fine tuning may be required). I don't see any challenges implementing the same with stored procedures. With a few implementation optimizations, such as loading the data into temporary tables (not SQL temp tables, but physical tables used to store data temporarily), this can be further enhanced to keep a full log of the information the stored procedure has processed.

Mads...
  • So you advocate not trying to implement some sort of an optimistic concurrency insert, but rather stick to approach #5 (SERIALIZABLE isolation level/TABLOCKX+SERIALIZABLE table hint in Stored Procedure)? And to do it inside Stored Procedure? – Eugene Podskal Dec 29 '16 at 17:46
  • What do you think about the first approach implemented in SP - like https://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/ taking into account that I have [added a note](http://stackoverflow.com/revisions/41336147/4) that most unique entities will usually already be in DB? – Eugene Podskal Dec 29 '16 at 17:46
  • I would still prefer to go with the stored procedure based approach, though the first approach with a stored procedure sounds good; however, I would advise keeping all of your logic in a single place. Though most unique entities will usually be in the database, I would still not recommend a round trip from EF just to check that, as the same can be done at the database level itself. The point I am trying to make here is that, with a large record set in place, let the database play around with it to get the maximum out of it. – Mads... Dec 30 '16 at 05:26

Based on your last key point, another solution is to move your "creation" logic to a central application server/service (see Update 2) that has a queue producers can use to "add" records.

Since most of your records already exist, if you use some sort of caching you should be able to make this quite efficient.

Now, about the number of records.
Keep in mind that EF was not designed to support "bulk" operations; creating thousands of records this way will be (really, really) slow.

I have used two solutions that let you insert a huge number of records very fast:
1) EntityFramework.BulkInsert
2) SqlBulkCopy

Both are extremely easy to use.

Also, I hope you've already seen Fastest Way of Inserting in Entity Framework.
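For example, a minimal SqlBulkCopy sketch (assuming rows is a DataTable with the CommonEntity columns already filled in, and connectionString is yours):

using System.Data;
using System.Data.SqlClient;

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.CommonEntity";
    bulkCopy.ColumnMappings.Add("CommonEntityGroupId", "CommonEntityGroupId");
    bulkCopy.ColumnMappings.Add("Name", "Name");
    bulkCopy.WriteToServer(rows); // one bulk insert instead of thousands of statements
}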

Update
Below is another solution that I've used twice recently.
Instead of saving your record when a user performs a "Save", schedule the save to happen X seconds later.
If in the meantime someone else tries to save the same record, just "slide" the scheduled date.

Below is sample code that tries to save the same record 10 times (at the same time), but the actual save only happens once.

(The original answer included a console screenshot showing that, out of the ten concurrent attempts, the actual save ran only once.)

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace ConsoleApplicationScheduler
{
    class Program
    {
        static void Main(string[] args)
        {
            ConcurrentSaveService service = new ConcurrentSaveService();
            int entity = 1;
            for (int i = 0; i < 10; i++)
            {
                //Save the same record 10 times (this could be concurrent)
                service.BeginSave(entity);
            }

            Console.ReadLine();
        }
    }

    public class ConcurrentSaveService
    {
        private static readonly ConcurrentDictionary<int, DateTime> _trackedSubjectsDictionary = new ConcurrentDictionary<int, DateTime>();

        private readonly int _delayInSeconds;

        public ConcurrentSaveService()
        {
            _delayInSeconds = 5;
        }

        public async void BeginSave(int key)
        {
            Console.WriteLine("Started Saving");
            DateTime existingTaskDate;
            _trackedSubjectsDictionary.TryGetValue(key, out existingTaskDate);

            DateTime scheduledDate = DateTime.Now.AddSeconds(_delayInSeconds);
            _trackedSubjectsDictionary.AddOrUpdate(key, scheduledDate, (i, d) => scheduledDate);

            if (existingTaskDate > DateTime.Now)
                return;

            do
            {
                await Task.Delay(TimeSpan.FromSeconds(_delayInSeconds));

                DateTime loadedScheduledDate;
                _trackedSubjectsDictionary.TryGetValue(key, out loadedScheduledDate);
                if (loadedScheduledDate > DateTime.Now)
                    continue;

                if (loadedScheduledDate == DateTime.MinValue)
                    break;

                _trackedSubjectsDictionary.TryRemove(key, out loadedScheduledDate);

                if (loadedScheduledDate > DateTime.MinValue)
                {
                    //DoWork
                    Console.WriteLine("Update/Insert record:" + key);
                }

                break;
            } while (true);

            Console.WriteLine("Finished Saving");
        }
    }
}

Update 2
Since you can control the "creation" process in your Web API app, you should be able to avoid duplicates using some sort of cache, as in the following pseudocode:

using System.Collections.Concurrent;
using System.Web.Http;

namespace WebApplication2.Controllers
{
    public class ValuesController : ApiController
    {
        static object _lock = new object();
        static ConcurrentDictionary<string, object> cache = new ConcurrentDictionary<string, object>();
        public object Post(InputModel value)
        {
            object existing;
            if (cache.TryGetValue(value.Name, out existing))
                return existing; //Your saved record

            lock (_lock)
            {
                if (cache.TryGetValue(value.Name, out existing))
                    return existing; //Your saved record

                object newRecord = new object();//Save your Object

                cache.AddOrUpdate(value.Name, newRecord, (s, o) => newRecord);

                return newRecord;
            }
        }
    }

    public class InputModel
    {
        public string Name;
    }
}
George Vovos
    Yes, that's a valid point about central service(s) for record creation - it will allow to simplify the producers, reduce their dependency on actual DB server, and mitigate concurrency issues in uniform manner (one way or another). But my problems are not with amount of records to save, but with some entities that may or sometimes may not be already present in db - UNIQUE index guarantees absence of duplication, but it won't allow to save the same entities concurrently. And I want a simple and efficient way to reliably persist object graphs containing such entities. – Eugene Podskal Dec 31 '16 at 07:54
  • @EugenePodskal What is the point of having 10 users modifying the same record? What's going to happen? Last one wins? And how are you going to guarantee this if you perform retries? I think you should rethink the whole process. Concurrent exceptions should be the exception, not a common case. Try to avoid the problem instead of solving it... Maybe tell us more about the "Business" problem to see if we can find a better design. For example, maybe you can have your users just creating new records and the "valid" one is the last one – George Vovos Dec 31 '16 at 10:53
  • That's probably my fault that it wasn't clear in my question (fixed). Basically CommonEntity/CommonEntityGroup are immutable once created and will never be modified or removed thereafter. The only contention between producers is that they may try to persist some CommonEntityLeftInfo and CommonEntityRightInfo at the same time that refers to the same CommonEntity that may not be in the database. When producers call SaveChanges one fails because it tries to insert a duplicate entry. And I don't want it to fail even in this quite rare case. – Eugene Podskal Dec 31 '16 at 11:05
  • @EugenePodskal Then: 1) ignore my updated answer; 2) if you just want to avoid duplicates, the problem seems simpler. When you try to add duplicate data you get a specific exception, right? (ConstraintViolation?) Can you not just ignore it? – George Vovos Dec 31 '16 at 11:15
  • Yes, I have listed some possible solutions in the question, but perhaps there is a better way (for example MERGE statement in SP proposed by Evk would be a better choice from a performance point of view(though performance is not an issue in our system for now)). – Eugene Podskal Dec 31 '16 at 11:22
  • Is this a Web App or a Window/Client App? – George Vovos Dec 31 '16 at 11:25
  • Producers are actually separate scheduled applications/services, though they use(will use) the same Web API endpoint(same web app) to post the data. – Eugene Podskal Dec 31 '16 at 11:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/131963/discussion-between-george-vovos-and-eugene-podskal). – George Vovos Dec 31 '16 at 11:29

Producers do not know/care about IDs of those CommonEntities - they usually just pass DTOs with Names(unique) of those CommonEntities and related information. So any Common(Group)Entity has to be found/created by Name.

I assume that tables that store your objects reference CommonEntity by their ID, not Name.

I assume that the object's table definition looks something like this:

CREATE TABLE SomeObject(
    Id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    ObjectName NVARCHAR(100) NOT NULL,
    CommonEntityId INT NOT NULL,
    CONSTRAINT FK_SomeObject_CommonEntity FOREIGN KEY(CommonEntityId) 
        REFERENCES CommonEntity(Id)
);

At the same time, the high-level SaveSomeObject function has CommonEntity.Name and CommonEntityGroup.Name (not ID) as parameters. This means that somewhere the function has to look up the entity by its Name and find its corresponding ID.

So, the high-level SaveSomeObject function with parameters (ObjectName, CommonEntityName, CommonEntityGroupName) can be implemented as two steps:

CommonEntityID = GetCommonEntityID(CommonEntityName, CommonEntityGroupName);
SaveSomeObject(ObjectName, CommonEntityID);

GetCommonEntityID is a helper function/stored procedure that looks up the entity's ID by its Name and creates the entity (generating an ID) if needed.

Here we explicitly extract this step into a separate dedicated function. Only this function has to deal with concurrency issues. It can be implemented using an optimistic or a pessimistic concurrency approach. The caller of this function doesn't care what magic it uses to return a valid ID, but can be sure that the returned ID can safely be used for persisting the rest of the object.


Pessimistic concurrency approach

Pessimistic concurrency approach is simple. Make sure that only one instance of GetCommonEntityID can be run. I'd use sp_getapplock for it (instead of SERIALIZABLE transaction isolation level or table hints). sp_getapplock is essentially a mutex and once a lock is obtained we can be sure that no other instance of this stored procedure is running in parallel. This makes the logic simple - try to read the ID and INSERT the new row if it is not found.

CREATE PROCEDURE [dbo].[GetCommonEntityID]
    @ParamCommonEntityName NVARCHAR(100),
    @ParamCommonEntityGroupName NVARCHAR(100),
    @ParamCommonEntityID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    BEGIN TRANSACTION;
    BEGIN TRY

        SET @ParamCommonEntityID = NULL;
        DECLARE @VarCommonEntityGroupID int = NULL;

        DECLARE @VarLockResult int;
        EXEC @VarLockResult = sp_getapplock
            @Resource = 'GetCommonEntityID_app_lock',
            @LockMode = 'Exclusive',
            @LockOwner = 'Transaction',
            @LockTimeout = 60000,
            @DbPrincipal = 'public';

        IF @VarLockResult >= 0
        BEGIN
            -- Acquired the lock

            SELECT @VarCommonEntityGroupID = ID
            FROM CommonEntityGroup
            WHERE Name = @ParamCommonEntityGroupName;

            IF @VarCommonEntityGroupID IS NULL
            BEGIN
                -- Such name doesn't exist, create it.
                INSERT INTO CommonEntityGroup (Name)
                VALUES (@ParamCommonEntityGroupName);

                SET @VarCommonEntityGroupID = SCOPE_IDENTITY();
            END;

            SELECT @ParamCommonEntityID = ID
            FROM CommonEntity
            WHERE
                Name = @ParamCommonEntityName
                AND CommonEntityGroupId = @VarCommonEntityGroupID
            ;

            IF @ParamCommonEntityID IS NULL
            BEGIN
                -- Such name doesn't exist, create it.
                INSERT INTO CommonEntity
                    (Name
                    ,CommonEntityGroupId)
                VALUES
                    (@ParamCommonEntityName
                    ,@VarCommonEntityGroupID);

                SET @ParamCommonEntityID = SCOPE_IDENTITY();
            END;

        END ELSE BEGIN
            -- TODO: process the error. Retry
        END;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
            -- TODO: process the error. Retry?
    END CATCH;

END

Optimistic concurrency approach

Do not try to lock anything. Act optimistically and look up the ID. If not found, try to INSERT the new value and retry if there is a unique index violation.

CREATE PROCEDURE [dbo].[GetCommonEntityID]
    @ParamCommonEntityName NVARCHAR(100),
    @ParamCommonEntityGroupName NVARCHAR(100),
    @ParamCommonEntityID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    SET @ParamCommonEntityID = NULL;
    DECLARE @VarCommonEntityGroupID int = NULL;

    SELECT @VarCommonEntityGroupID = ID
    FROM CommonEntityGroup
    WHERE Name = @ParamCommonEntityGroupName;

    WHILE @VarCommonEntityGroupID IS NULL
    BEGIN
        -- Such name doesn't exist, create it.
        BEGIN TRANSACTION;
        BEGIN TRY

            INSERT INTO CommonEntityGroup (Name)
            VALUES (@ParamCommonEntityGroupName);

            SET @VarCommonEntityGroupID = SCOPE_IDENTITY();

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION;
            -- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
            -- error is indeed due to unique index violation and retry
        END CATCH;

        SELECT @VarCommonEntityGroupID = ID
        FROM CommonEntityGroup
        WHERE Name = @ParamCommonEntityGroupName;

    END;


    SELECT @ParamCommonEntityID = ID
    FROM CommonEntity
    WHERE
        Name = @ParamCommonEntityName
        AND CommonEntityGroupId = @VarCommonEntityGroupID
    ;

    WHILE @ParamCommonEntityID IS NULL
    BEGIN
        -- Such name doesn't exist, create it.
        BEGIN TRANSACTION;
        BEGIN TRY

            INSERT INTO CommonEntity
                (Name
                ,CommonEntityGroupId)
            VALUES
                (@ParamCommonEntityName
                ,@VarCommonEntityGroupID);

            SET @ParamCommonEntityID = SCOPE_IDENTITY();

            COMMIT TRANSACTION;
        END TRY
        BEGIN CATCH
            ROLLBACK TRANSACTION;
            -- TODO: Use ERROR_NUMBER() and ERROR_STATE() to check that
            -- error is indeed due to unique index violation and retry
        END CATCH;

        SELECT @ParamCommonEntityID = ID
        FROM CommonEntity
        WHERE
            Name = @ParamCommonEntityName
            AND CommonEntityGroupId = @VarCommonEntityGroupID
        ;

    END;

END
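Either version can be called from the application side with a plain SqlCommand and an output parameter, roughly like this (connectionString is an assumption):

using System.Data;
using System.Data.SqlClient;

static int GetCommonEntityId(string connectionString, string entityName, string groupName)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.GetCommonEntityID", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@ParamCommonEntityName", entityName);
        command.Parameters.AddWithValue("@ParamCommonEntityGroupName", groupName);
        var idParameter = command.Parameters.Add("@ParamCommonEntityID", SqlDbType.Int);
        idParameter.Direction = ParameterDirection.Output;

        connection.Open();
        command.ExecuteNonQuery();
        return (int)idParameter.Value; // safe to use for the rest of the graph
    }
}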

In both approaches you should have retry logic. The optimistic approach is generally better when you expect the Names to already be in the entity table and the likelihood of retries is low (as in the case described in the question). The pessimistic approach is generally better when you expect many competing processes trying to insert the same Name; in that case you are likely better off serialising the inserts.

Vladimir Baranov
  • Thanks for `sp_getapplock` - it is an interesting approach to consider. And it would probably be more readable to have separate GetCommonEntityId , GetCommonEntityGroupId SPs and call GetCommonEntityGroupId from GetCommonEntityId. – Eugene Podskal Dec 31 '16 at 15:57
  • I like `sp_getapplock`, because it doesn't lock the table (as table hints do), so if other parts of the system use other unrelated parts of the table, they are not affected. And the logic of the mutex is easy to understand (for me personally). – Vladimir Baranov Dec 31 '16 at 16:20
  • I'm not sure why you think table hints lock the table. `with (updlock, holdlock)` uses key-range locks, not a table lock, when there is an index. [Key-Range Locking](https://technet.microsoft.com/en-us/library/ms191272(v=sql.105).aspx) – SqlZim Dec 31 '16 at 17:37
  • @SqlZim, yes, engine generally tries to lock only rows that it needs, but it can escalate the lock to the page and table level and this escalation is quite unpredictable. Also, this behaviour may change as you create or drop indexes. Yes, locking hints allow to achieve correct behaviour, but I personally prefer `sp_getapplock`, because there are less hidden "gotchas" to look for. – Vladimir Baranov Dec 31 '16 at 23:44
  • @BogdanSahlean, you are right, the first approach serialises calls to `GetCommonEntityID`, that's why I called it pessimistic. The second approach is optimistic, because it tries to do what is needed without extra checks in advance and there is retry logic if the action fails. I believe it is thread safe because of this retry logic. – Vladimir Baranov Dec 31 '16 at 23:49