38

I've been reading other questions on how to implement if-exists-insert-else-update semantics in EF, but either I'm not understanding how the answers work, or they are in fact not addressing the issue. A common solution offered is to wrap the work in a transaction scope (eg: Implementing if-not-exists-insert using Entity Framework without race conditions):

using (var scope = new TransactionScope()) // default isolation level is serializable
using(var context = new MyEntities())
{
    var user = context.Users.SingleOrDefault(u => u.Id == userId); // *
    if (user != null)
    {
        // update the user
        user.property = newProperty;
        context.SaveChanges();
    }
    else
    {
        user = new User
        {
             // etc
        };
        context.Users.AddObject(user);
        context.SaveChanges();
    }
    scope.Complete(); // without Complete() the transaction rolls back when the scope is disposed
}

But I fail to see how this solves anything: for this to work, the line I have starred above would need to block if a second thread tries to access the same user ID, unblocking only when the first thread has finished its work. A transaction will not cause this, however, and we'll get an UpdateException thrown due to the key violation that occurs when the second thread attempts to create the same user a second time.

Instead of catching the exception caused by the race condition, it would be better to prevent the race condition from happening in the first place. One way to do this would be for the starred line to take out an exclusive lock on the database row that matches its condition, meaning that in the context of this block, only one thread at a time could work with a user.
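
To illustrate what I'm after, something like this sketch (my own guess, not tested - it assumes SQL Server, that MyEntities is an ObjectContext, a Users table with an Id column, and uses the UPDLOCK/HOLDLOCK table hints through raw SQL):

using (var scope = new TransactionScope())
using (var context = new MyEntities())
{
    // Locking read: blocks other transactions issuing the same hinted read
    // for this Id (or the empty key range) until this transaction completes.
    var existing = context.ExecuteStoreQuery<User>(
        "SELECT * FROM Users WITH (UPDLOCK, HOLDLOCK) WHERE Id = {0}", userId)
        .SingleOrDefault();

    if (existing != null)
    {
        // Results of ExecuteStoreQuery are not tracked by default,
        // so re-query through the ObjectSet before updating.
        var user = context.Users.Single(u => u.Id == userId);
        user.property = newProperty;
    }
    else
    {
        context.Users.AddObject(new User { Id = userId /* etc */ });
    }

    context.SaveChanges();
    scope.Complete();
}

The hinted read would take and hold an update lock on the row (or on the empty key range), so a second transaction would block at that read instead of racing ahead to the insert.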

It seems that this must be a common problem for users of the EF, so I'm looking for a clean, generic solution that I can use everywhere.

I'd really like to avoid using a stored procedure to create my user if possible.

Any ideas?

EDIT: I tried executing the above code concurrently on two different threads using the same user ID, and despite taking out serializable transactions, they were both able to enter the critical section (*) concurrently. This led to an UpdateException being thrown when the second thread attempted to insert the same user ID that the first had just inserted. This is because, as pointed out by Ladislav below, a serializable transaction takes exclusive locks only after it has begun modifying data, not while merely reading.

Mike Chamberlain

4 Answers

21

When using a serializable transaction, SQL Server issues shared locks on the records/tables it reads. Shared locks don't allow other transactions to modify the locked data (those transactions will block), but they do allow other transactions to read the data before the transaction which issued the locks starts modifying it. That is the reason why the example doesn't work - concurrent reads are allowed under shared locks until the first transaction starts modifying data.

You want isolation where the select command locks the whole table exclusively for a single client. It must lock the whole table, because otherwise it will not solve concurrency for inserting "the same" record. Granular control over locking records or tables in select commands is possible with table hints, but you must write direct SQL queries to use them - EF has no support for that. I described an approach for exclusively locking the table here, but it effectively makes access to the table sequential and it affects all other clients accessing this table.
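
As a rough sketch only (assuming SQL Server, an ObjectContext-based MyEntities, and the TABLOCKX/HOLDLOCK table hints), that kind of table-level lock could be issued from EF like this:

using (var scope = new TransactionScope())
using (var context = new MyEntities())
{
    // Keep one connection open for the whole unit of work so the lock taken
    // below is held by the same session that does the EF work.
    context.Connection.Open();

    // Exclusive table lock, held until the transaction completes.
    context.ExecuteStoreCommand(
        "SELECT TOP 0 1 FROM Users WITH (TABLOCKX, HOLDLOCK)");

    var user = context.Users.SingleOrDefault(u => u.Id == userId);
    // ... insert or update as in the question ...
    context.SaveChanges();
    scope.Complete();
}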

If you are really sure that this operation happens only in your single method and there are no other applications using your database, you can simply place the code in a critical section (.NET synchronization, for example with lock) and ensure on the .NET side that only a single thread can enter it. That is not the most reliable solution, but any playing with locks and transaction levels has a big impact on database performance and throughput. You can combine this approach with optimistic concurrency (unique constraints, timestamps, etc.).
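
A minimal sketch of that in-process option (the gate object, method name and parameter types here are illustrative assumptions; it only helps when this code path is the sole writer and runs in a single process):

private static readonly object UpsertGate = new object();

public void UpsertUser(int userId, string newProperty)
{
    lock (UpsertGate) // only one thread in this process runs the read-then-write
    {
        using (var context = new MyEntities())
        {
            var user = context.Users.SingleOrDefault(u => u.Id == userId);
            if (user != null)
            {
                user.property = newProperty;   // update existing
            }
            else
            {
                context.Users.AddObject(new User { Id = userId /* etc */ });
            }
            context.SaveChanges();
        }
    }
}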

Ladislav Mrnka
  • You mention that we must lock the whole table, but wouldn't a range lock be more efficient and also solve the problem? – Mike Chamberlain May 31 '11 at 13:43
  • I'm not sure what exactly is locked with a range lock if you are searching for a record which doesn't exist. – Ladislav Mrnka May 31 '11 at 13:47
  • I think this article has the answer: http://blogs.msdn.com/b/sqlserverstorageengine/archive/2006/05/26/range-locks.aspx. "Equality predicate... If the key does not exist, then the ‘range’ lock is taken on the ‘next’ key both for unique and non-unique index. If the ‘next’ key does not exist, then a range lock is taken on the ‘infinity’ value" – Mike Chamberlain May 31 '11 at 13:56
  • So how can I do this from EF? Shouldn't the serializable transaction already be taking out an exclusive range lock? If so, how come two threads in different object contexts are able to enter the critical section concurrently? – Mike Chamberlain May 31 '11 at 14:04
  • No, a serializable transaction will not do that: http://msdn.microsoft.com/en-us/library/ms173763.aspx Until the first transaction starts modifying data, the locks are still shared, so other transactions can read the data as well. EF will not allow you to add the necessary hints to the SQL query. If you want to do that at the database level you must use a custom SQL query or stored procedure. – Ladislav Mrnka May 31 '11 at 14:20
5

Just to add my approach: it doesn't really deal with the annoyance of exceptions being thrown, and transactions don't quite cut it as a scalable solution, but it does stop race conditions from causing problems where lock-based solutions are not possible (or not easily managed), such as in distributed systems.

I simply try the insert first and handle the exception. Here is a modification of your original code as an example:

using(var context = new MyEntities())
{
    EntityEntry entityUser = null;
    try 
    {
        var user = new User
        {
             // etc
        };
        entityUser = context.Users.Add(user);
        context.SaveChanges(); // Will throw if the entity already exists
    } 
    catch (DbUpdateException x)
    when (x.InnerException != null && x.InnerException.Message.StartsWith("Cannot insert duplicate key row in object"))
    {
        if (entityUser != null)
        {
            // Detach the entity to stop it hanging around on the context
            entityUser.State = EntityState.Detached;
        }
        var user = context.Users.Find(userId);
        if (user != null) // just in case someone deleted it in the mean time
        {
            // update the user
            user.property = newProperty;
            context.SaveChanges();
        }
    }
}

It's not pretty, but it works and might be of use to someone.

Baguazhang
1

Maybe I am missing something, but when I simulate the example above in SQL Server Management Studio, it works as expected.

Both Serializable transactions check if the userId exists, and acquire range locks on the specified selection.

Assuming this userId does not exist, both transactions try to insert a new record with that userId - which is not possible. Because of their Serializable isolation level, neither transaction can insert a new record into the users table, because this would introduce phantom reads for the other transaction.

So this situation results in a deadlock because of the range locks: one transaction will be chosen as the deadlock victim and the other will succeed.

Is Entity Framework handling this any differently? I suspect you'd end up with an UpdateException wrapping a SqlException that identifies the deadlock.
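
If so, a hedged sketch of dealing with that outcome would be to detect SQL Server's deadlock-victim error (number 1205) and retry; DoUpsert below is a hypothetical wrapper around the transaction-scope code from the question:

const int DeadlockVictimError = 1205; // SQL Server "chosen as deadlock victim"

for (var attempt = 0; attempt < 3; attempt++)
{
    try
    {
        DoUpsert(userId, newProperty); // hypothetical wrapper around the code above
        break;                         // success, stop retrying
    }
    catch (UpdateException ex) when (ex.InnerException is SqlException sqlEx
                                     && sqlEx.Number == DeadlockVictimError)
    {
        // This transaction was the deadlock victim; loop and try again.
    }
}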

  • I do end up with an update exception, but the desired behavior is to avoid this by only allowing one thread to enter the critical section at a time, marked with a // * above. This would avoid the problem entirely. – Mike Chamberlain Jun 08 '11 at 00:12
  • I see. Well in that case, to lock the table explicitly via EF you can use ObjectContext.ExecuteStoreCommand() to apply a suitable lock on the table. – Jeroen de Bekker Jun 09 '11 at 09:12
0

You can change the transaction isolation level to a stricter one using TransactionOptions on the TransactionScope (I guess for your case it would be RepeatableRead or Serializable), but remember that any locking decreases scalability.
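
For example (a sketch only - the timeout is arbitrary and IsolationLevel here is the System.Transactions one):

var options = new TransactionOptions
{
    IsolationLevel = IsolationLevel.Serializable, // or IsolationLevel.RepeatableRead
    Timeout = TimeSpan.FromSeconds(30)
};

using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
using (var context = new MyEntities())
{
    // ... the same query / insert / update code from the question ...
    context.SaveChanges();
    scope.Complete();
}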

Does it really matter to provide this level of concurrency control? Will your application actually hit such cases in a production environment? Here is a good post by Udi Dahan about race conditions.

xelibrion
  • The default isolation level for a TransactionScope is already the most restrictive (Serializable). But this is exactly my question - how does this stop two threads entering the critical section starting with the starred line above? Two threads are free to SELECT the same data, no matter how restrictive their concurrency options are, as long as the data is not locked for reading. My understanding is that SELECTing within a transaction causes the data to only be write-locked. Is this not correct? – Mike Chamberlain May 30 '11 at 05:49
  • As for your question as to whether this is necessary, it is true that we probably COULD get away without it in the specific case I'm currently working on, but I would like to find a robust, simple, generic solution for use in the general case when such concurrency control might be critical. – Mike Chamberlain May 30 '11 at 05:51
  • The default isolation level for a transaction scope is ReadCommitted, and SELECTing data under the ReadCommitted isolation level indeed does not cause any reader locks. Try specifying a more restrictive transaction level manually and you will see the difference http://en.wikipedia.org/wiki/Isolation_(database_systems) – xelibrion May 30 '11 at 06:02
  • Two things here: 1) the default isolation level for a System.Transactions.TransactionScope is indeed serializable (http://msdn.microsoft.com/en-us/library/ms131084.aspx). 2) I tried executing the above code concurrently on two different threads using the same user ID, and despite taking out serializable transactions, they were both able to enter the critical section (*) concurrently. This led to an UpdateException being thrown when the second thread attempted to insert the same user ID that the first had just inserted. – Mike Chamberlain May 30 '11 at 06:36
  • I guess it happens because of the first-level cache in EF - only the first query hits the database; the second query gets the data from the context using the identity map. That's why the isolation level does not work in this case. Try using the C# lock statement, but remember that using an object context in a multi-threaded environment is not safe http://efprof.com/Learn/Alerts/CrossThreadSessionUsage – xelibrion May 30 '11 at 07:32
  • I am calling this same code from separate threads, so two different object contexts are created; for what you say to be true, objects must therefore be shared between object contexts. Is this correct? – Mike Chamberlain May 31 '11 at 14:16
  • No, each context has its own cache and its own copies of loaded objects. – xelibrion May 31 '11 at 15:12