
I'm looking for a C# pattern for coding a synchronized operation that includes writes to two different databases for a particular entity, such that I can avoid race conditions when simultaneous operations run on the same entity.

E.g. Thread 1 and thread 2 are processing an operation on entity X at the same time. The operation writes information for X to database A (in my case, an upsert to MongoDB) and database B (an insert to SqlServer). Thread 3 is processing the same operation on entity Y. The desired behavior is:

  • Thread 1 blocks thread 2 while processing writes to A and B for entity X.
  • Thread 2 waits until thread 1 completes writes to A and B and then makes writes to A and B for entity X.
  • Thread 3 is not blocked and processes writes to A and B for entity Y while thread 1 is processing.

The behavior I'm trying to avoid is:

  • Thread 1 writes to A for entity X.
  • Thread 2 writes to A for entity X.
  • Thread 2 writes to B for entity X.
  • Thread 1 writes to B for entity X.

I could use a mutex across all threads, but I don't really want to block the operation for a different entity.

  • What databases? Why do writes of different entities need to be synchronized across threads? How are reads considered? Is this purely a problem _within_ a single process? – user2864740 May 06 '20 at 18:14
  • I updated the question to include the database details. Writes for different entities do not need to be synchronized. This is a process running on multiple servers and each process creates multiple threads via Hangfire jobs. – Jim Sweeney May 06 '20 at 20:07

2 Answers


Using the lock statement is insufficient for multiple processes¹. Even named/system semaphores are limited to a single machine and thus insufficient across multiple servers.
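
To illustrate the single-machine limit: a named (system) mutex is visible to every process on one box but knows nothing about other servers. A minimal sketch, with an illustrative lock name:

using System.Threading;

// Named (system-wide) mutex: coordinates processes on THIS machine only;
// it cannot block a process on a different server. The name is illustrative.
using (var mutex = new Mutex(initiallyOwned: false, name: @"Global\entity-X-writes"))
{
    mutex.WaitOne();
    try
    {
        // paired writes to A and B for entity X
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}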

If duplicate processing is OK and a "winner" can be selected, it may be sufficient just to write/update-over or use a flavor of optimistic concurrency. If stronger process-once-at-a-time guarantees need to be maintained, a global locking mechanism needs to be employed - SQL Server supports such a mechanism via sp_getapplock.
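
A minimal sketch of sp_getapplock taken around the paired writes, using a session-scoped lock; the connection string, resource naming, timeout, and the WriteToMongo/WriteToSqlServer helpers are all stand-ins, not anything from the question:

using System;
using System.Data;
using System.Data.SqlClient; // or Microsoft.Data.SqlClient

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Acquire a global, server-side lock named after the entity.
    var acquire = new SqlCommand("sp_getapplock", conn) { CommandType = CommandType.StoredProcedure };
    acquire.Parameters.AddWithValue("@Resource", $"entity:{entity.ID}");
    acquire.Parameters.AddWithValue("@LockMode", "Exclusive");
    acquire.Parameters.AddWithValue("@LockOwner", "Session");
    acquire.Parameters.AddWithValue("@LockTimeout", 30000); // ms
    var result = acquire.Parameters.Add("@Result", SqlDbType.Int);
    result.Direction = ParameterDirection.ReturnValue;
    acquire.ExecuteNonQuery();

    if ((int)result.Value < 0) // negative return values indicate failure
        throw new TimeoutException($"Could not lock entity {entity.ID}");

    try
    {
        WriteToMongo(entity);     // upsert to database A
        WriteToSqlServer(entity); // insert to database B
    }
    finally
    {
        var release = new SqlCommand("sp_releaseapplock", conn) { CommandType = CommandType.StoredProcedure };
        release.Parameters.AddWithValue("@Resource", $"entity:{entity.ID}");
        release.Parameters.AddWithValue("@LockOwner", "Session");
        release.ExecuteNonQuery();
    }
}

Because the lock lives in SQL Server, it is honored by every process on every server that coordinates through the same database.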

Likewise, the model can be updated so that each agent 'requests' the next unit of work, such that dispatch is centrally controlled and an entity, based on ID etc., is only given to a single agent at a time for processing. Another option might be to use a messaging system like RabbitMQ (or Kafka etc.); for RabbitMQ, one might even use Consistent Hashing to ensure (for the most part) that different consumers receive non-overlapping messages. The details differ based on the implementation used.
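
For the centrally-dispatched flavor, one common SQL Server pattern is to claim the next pending row under UPDLOCK/READPAST, so two agents can never claim the same entity. A sketch under the assumption of a hypothetical WorkQueue table:

using System;
using System.Data.SqlClient; // or Microsoft.Data.SqlClient

// READPAST skips rows already locked by other agents; UPDLOCK holds
// the claimed row for the duration of the statement.
const string claimSql = @"
    UPDATE TOP (1) WorkQueue WITH (UPDLOCK, READPAST, ROWLOCK)
    SET Status = 'InProgress', ClaimedBy = @agent
    OUTPUT inserted.EntityId
    WHERE Status = 'Pending';";

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    var cmd = new SqlCommand(claimSql, conn);
    cmd.Parameters.AddWithValue("@agent", Environment.MachineName);
    var entityId = cmd.ExecuteScalar(); // null when nothing is pending
    if (entityId != null)
        ProcessEntity((int)entityId); // hypothetical: loads the entity, does the paired writes
}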

Due to the different nature of a SQL RDBMS and MongoDB (especially if the latter is used as "a cache"), it may be sufficient to loosen the restriction and/or redesign the problem to use MongoDB as a read-through cache (which is a good way to use caches). This can mitigate the paired-write issue, although it does not prevent global concurrent processing of the same items.
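
To make the read-through idea concrete: reads consult MongoDB first and fall back to SQL Server on a miss, so SQL Server stays the single source of truth. A sketch with hypothetical helper methods standing in for the actual driver calls:

// SQL Server is the system of record; MongoDB is only a cache that can
// always be rebuilt from it. All helper methods here are hypothetical.
Entity GetEntity(int id)
{
    if (TryGetFromMongo(id, out Entity cached))
        return cached;                // cache hit

    Entity entity = LoadFromSql(id);  // authoritative read
    UpsertIntoMongo(entity);          // repopulate the cache
    return entity;
}

void SaveEntity(Entity entity)
{
    InsertIntoSql(entity);            // the only write that must succeed
    InvalidateMongo(entity.ID);       // next read re-fills the cache
}

The paired-write race disappears because MongoDB is never written as a peer of SQL Server, only refreshed from it.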

¹ Even though a lock statement is globally insufficient, it can still be employed locally between threads in a single process to reduce local contention and/or minimize global locking.


The answer below was for the original question, assuming a single process.

The "standard" method of avoiding working on the same object concurrently via multiple threads would be with a lock statement on the specific object. The lock is acquired on the object itself, such that lock(X) and lock(Y) are independent when !ReferenceEquals(X,Y).

The lock statement acquires the mutual-exclusion lock for a given object, executes a statement block, and then releases the lock. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and waits until the lock is released.

lock (objectBeingSaved) {
  // This code execution is mutually-exclusive over a specific object..
  // ..and independent (non-blocking) over different objects.
  Process(objectBeingSaved);
}

A local process lock does not necessarily translate into sufficient guarantees for database access, or when the access spills across processes. The scope of the lock should also be considered: e.g. should it cover all processing, only saving, or some other unit of work?

To control what objects are being locked and reduce the chance of undesired/accidental lock interactions, it's sometimes recommended to add a field with the most restrictive visibility to the object, explicitly (and only) for the purpose of establishing a lock. This can also be used to group objects which should lock on each other, if such is a consideration.
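
For example (a minimal sketch; SaveToDatabaseA/B are stand-ins), a dedicated private lock field keeps outside code from accidentally taking the same lock:

public class Entity
{
    // Private and readonly: no outside code can lock on it, and it can
    // never be swapped out while a lock is held.
    private readonly object _syncRoot = new object();

    public void Save()
    {
        lock (_syncRoot)
        {
            // Mutually exclusive per Entity *instance* only; see the
            // pool approaches below for per-ID locking across instances.
            SaveToDatabaseA(this);
            SaveToDatabaseB(this);
        }
    }
}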

It's also possible to use a locking pool, although such tends to be a more 'advanced' use-case with only specific applicability. Using pools also allows using semaphores (in even more specific use-cases) as well as a simple lock.

If there needs to be a lock per external ID, one approach is to integrate the entities being worked on with a pool, establishing locks across entities:

// Some lock pool. Variations of the strategy:
// - Weak-value hash table
// - Explicit acquire/release lock
// - Explicit acquire/release from ctor and finalizer (or Dispose)
var locks = CreateLockPool();
// When object is created, assign a lock object
var entity = CreateEntity();
// Returns same lock object (instance) for the given ID, and a different
// lock object (instance) for a different ID.
entity.Lock = GetLock(locks, entity.ID);

lock (entity.Lock) {
  // Mutually exclusive per whatever rules are to select the lock
  Process(entity);
}

Another variation is a localized pool, instead of carrying around a lock object per entity itself. It is conceptually the same model as above, just flipped outside-in. Here is a gist. YMMV.

private sealed class Locker { public int Count; }

IDictionary<int, Locker> _locks = new Dictionary<int, Locker>();

void WithLockOnId(int id, Action action) {
   Locker locker;
   lock (_locks) {
      // The _locks dictionary might have lots of contention; the work
      // done inside is expected to be FAST in comparison to action().
      if (!_locks.TryGetValue(id, out locker))
         locker = _locks[id] = new Locker();
      ++locker.Count;
   }
   try {
      lock (locker) {
         // Runs mutually-exclusive by ID, as established per creation of
         // distinct lock objects.
         action();
      }
   }
   finally {
      lock (_locks) {
         // Don't forget to take out the garbage: remove the per-ID entry
         // once the last thread using it is done (try/finally ensures
         // this happens even if action() throws).
         if (--locker.Count == 0)
            _locks.Remove(id);
      }
   }
}

// And then..
WithLockOnId(x.ID, () => Process(x));

Taking a sideways step, another approach is to 'shard' entities across thread/processing units. Thus each thread is guaranteed to never be processing the same entity as another thread: X,Y,Z always go to #1 and P,D,Q always to #2. (It's a little bit more complicated to optimize throughput..)

var threadIndex = entity.ID % NumThreads;
QueueWorkOnThread(threadIndex, entity); // e.g. add to one of NumThreads ConcurrentQueue<T> instances
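
A slightly fuller sketch of the sharding idea, assuming a fixed worker count and one queue per shard (all names here are illustrative):

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

const int NumThreads = 4;

// One queue per shard: an entity ID always maps to the same shard, so
// the same entity can never be processed by two workers at once.
var queues = Enumerable.Range(0, NumThreads)
    .Select(_ => new BlockingCollection<Entity>())
    .ToArray();

var workers = queues
    .Select(q => Task.Run(() =>
    {
        foreach (var entity in q.GetConsumingEnumerable())
            Process(entity); // strictly sequential within a shard
    }))
    .ToArray();

// Producer side: route by ID.
queues[entity.ID % NumThreads].Add(entity);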
user2864740
  • Thanks for all the detail! A couple of things I should have mentioned - the two databases mentioned in my question are different engines, MongoDB and SqlServer. Also, the lock should not be on a particular object, but on a single entity represented by a different object in each thread. I updated my question with the database details. – Jim Sweeney May 06 '20 at 20:23
  • If the process is running on multiple servers, you can't "just" use `lock` (or even a semaphore; named semaphores are still per-system), although it can reduce local contention. However, the outside construct needs to be accounted for: i.e. what happens if multiple servers write to the DB at the same time? Who wins? Does it matter / what happens on a conflict? Both of those databases _do_ support concurrent connections (as in, the connection itself won't corrupt) as long as a _different_ connection is used per thread. This still doesn't guarantee larger atomicity or mutually-exclusive processing. – user2864740 May 06 '20 at 20:33
  • Can you elaborate on "using MongoDB as a read through"? I have to write some data to MongoDB and some data to SqlServer, and it's ok for the last thread to "win" such that it writes last to both databases, but I need to avoid having a different winner for each database. Are you saying I can eliminate that possibility even with threads writing from two different servers? – Jim Sweeney May 06 '20 at 22:07
  • With a read-through, whatever process reads from MongoDB is _also_ responsible for providing a value if such does not exist: the core data is read up (if needed) and cached in a different abstraction. Then the system of record becomes entirely the RDBMS, with MongoDB providing a cache layer. (Such an approach might not always be possible, although I strongly recommend a single source of truth that can propagate outwards.) – user2864740 May 06 '20 at 22:57

I would suggest using a simple lock (if it is in one area of the code). As the threads would be processing different objects (meaning .NET objects) that carry the same value (as they represent the same entity), I would rather lock on some form of code for the entity. If the entity has some form of code, I would use it - for example:

lock (String.Intern(myEntity.Code))
{
   SaveToDatabaseA(myEntity);
   SaveToDatabaseB(myEntity);
}

But of course, you have to watch out for deadlocks. And String.Intern is tricky, as it interns the string for as long as the application runs.

But it looks like you want some kind of replication mechanism. Then I would rather do it at the database level (not at the code level).

[UPDATE]

You updated the question with the information that this is being done on multiple servers. And this information is kind of crucial here :) A normal lock won't work.

Of course, you can play with synchronizing the locks across different servers, but it is like distributed transactions: theoretically you can do it, but most people just avoid it as long as they can and instead shape the architecture of the solution to simplify the process.

[UPDATE 2]

You may also find this interesting: Distributed locking in .NET

:)

Piotr
  • So this basically becomes a named lock that will only block other threads that are processing the same entity. Can you explain how this would be subject to deadlocks? The two writes aren't really about replication as one database is MongoDB and the other is SqlServer (see my edits to the question to include these details) – Jim Sweeney May 06 '20 at 20:13
  • _Avoid_ using `string.Intern` here: instances are _never_ released from the String Pool. Using/abusing `string.Intern` like this is a way to make a long-running process over an unbound set of data become slower over time (`string.Intern` is much slower to update than a Dictionary as the pool size increases) and eventually crash when it runs out of memory. – user2864740 May 06 '20 at 20:37
  • It could if you would use String.Intern in another place in the code with the same code value. That is the risk. So maybe it would be even better to add some "place in code" identifier to secure you from this risk. So then it would be like: `String.Intern($"WritingEntities{myEntity.Code}")` – Piotr May 06 '20 at 20:38
  • @user2864740 I've stated that in the answer - that the String.Intern string stays in memory. If your entity set is like 1,000 or 10,000 - don't expect any issues. But if you would use this code on 100,000 or 1,000,000 - it can become a problem. – Piotr May 06 '20 at 20:40
  • Having read a bit about locks and thinking about it a bit more, I realize another wrinkle is that this process is running on multiple servers. The process on one server would not know about locks held on another server. I guess the lock would have to be on the database, but since this operation involves both MongoDB and SqlServer, I'm not sure how I would go about that. – Jim Sweeney May 06 '20 at 20:40
  • The comment above makes it *explicitly clear*, with reasoning - that's all. I'm probably not the only person who works on long-running systems that process many millions upon millions of items daily in many dozens of services - it's good to get code off on a good foot, because cleanup and maintenance becomes a real burden :-/ *shrug* – user2864740 May 06 '20 at 20:41
  • The application would likely have to Intern more than 100,000 ids, and that will continue to grow over time, so maybe that's not a scalable solution. – Jim Sweeney May 06 '20 at 20:46
  • @JimSweeney - the fact that the app is running on multiple servers is the decisive factor here. Of course, in such a scenario you cannot rely on lock :D That is why you have to describe your issue IN DETAIL - otherwise someone will suggest a solution that doesn't fit your case ;) – Piotr May 06 '20 at 20:49