Using the lock statement is insufficient for multiple processes¹. Even named/system semaphores are limited to a single machine and are thus insufficient across multiple servers.
If duplicate processing is OK and a "winner" can be selected, it may be sufficient just to write/update-over or use a flavor of optimistic concurrency. If a stronger guarantee is needed - that only one process works on an item at a time - a global locking mechanism must be employed; SQL Server supports such a mechanism via sp_getapplock.
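As a sketch only (the connection string, the "entity:" resource-naming convention, and Process are assumptions, not part of sp_getapplock itself), acquiring such a global lock from C# might look like:

```csharp
// Sketch: take a cross-machine lock named after the entity's ID via
// SQL Server's sp_getapplock. All application-side names are assumed.
using (var conn = new SqlConnection(connectionString)) {
    conn.Open();
    using (var cmd = new SqlCommand("sp_getapplock", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.AddWithValue("@Resource", "entity:" + entity.ID);
        cmd.Parameters.AddWithValue("@LockMode", "Exclusive");
        cmd.Parameters.AddWithValue("@LockOwner", "Session");
        cmd.Parameters.AddWithValue("@LockTimeout", 5000); // ms
        var ret = cmd.Parameters.Add("@ReturnValue", SqlDbType.Int);
        ret.Direction = ParameterDirection.ReturnValue;
        cmd.ExecuteNonQuery();
        if ((int)ret.Value >= 0) { // 0 = granted, 1 = granted after wait
            try {
                Process(entity);
            } finally {
                // Release via sp_releaseapplock with the same
                // @Resource and @LockOwner (or close the session).
            }
        }
        // Negative return values (-1 timeout, -3 deadlock victim, ..)
        // mean the lock was NOT acquired.
    }
}
```

Because the lock lives in SQL Server, it is honored by every process on every machine that goes through the same database.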
Likewise, the model can be updated so that each agent 'requests' the next unit of work, allowing dispatch to be centrally controlled so that an entity, based on its ID etc., is only given to a single agent at a time for processing. Another option might be to use a messaging system like RabbitMQ (or Kafka etc., for some value of "messaging"); with RabbitMQ, one might even use Consistent Hashing to ensure (for the most part) that different consumers receive non-overlapping messages. The details differ based on the implementation used.
Due to the different natures of a SQL RDBMS and MongoDB (especially if the latter is used as "a cache"), it may be sufficient to loosen the restriction and/or redesign the problem to use MongoDB as a read-through cache (which is a good way to use caches). This can mitigate the paired-write issue, although it does not prevent global concurrent processing of the same items.
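A read-through usage might be sketched as follows; FindInMongo, LoadFromSql, and UpsertIntoMongo are placeholder names, with SQL Server assumed to be the system of record:

```csharp
// Sketch of a read-through: consult MongoDB first and, on a miss,
// populate it from the authoritative store. An upsert on the miss path
// makes concurrent misses for the same ID benign (last write wins on
// identical data), which is what mitigates the paired-write issue.
Entity GetEntity(int id) {
    var cached = FindInMongo(id);   // fast path: cache hit
    if (cached != null)
        return cached;
    var entity = LoadFromSql(id);   // slow path: system of record
    UpsertIntoMongo(entity);        // populate the cache
    return entity;
}
```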
¹Even though a lock statement is globally insufficient, it can still be employed locally between threads in a single process to reduce local contention and/or minimize global locking.
The answer below was for the original question, assuming a single process.
The "standard" method of avoiding working on the same object concurrently via multiple threads would be a lock statement on the specific object. The lock is acquired on the object itself, such that lock(X) and lock(Y) are independent when !ReferenceEquals(X, Y).
The lock statement acquires the mutual-exclusion lock for a given object, executes a statement block, and then releases the lock. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and waits until the lock is released.
lock (objectBeingSaved) {
// This code execution is mutually-exclusive over a specific object..
// ..and independent (non-blocking) over different objects.
Process(objectBeingSaved);
}
A local process lock does not necessarily translate into sufficient guarantees for database access, or when the access spills across processes. The scope of the lock should also be considered: e.g. should it cover all processing, only saving, or some other unit of work?
To control which objects are being locked and reduce the chance of undesired/accidental lock interactions, it's sometimes recommended to add a field of the most restrictive visibility to the objects, explicitly (and only) for the purpose of establishing a lock. This can also be used to group objects which should lock on each other, if that is a consideration.
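A minimal sketch of such a dedicated lock field (the class and method names are illustrative):

```csharp
// A private lock object: no code outside the class can (accidentally)
// lock on it, unlike locking on 'this' or on the entity reference that
// other code may also see and lock.
public class Entity {
    private readonly object _syncRoot = new object();

    public void Save() {
        lock (_syncRoot) {
            // Mutually exclusive per Entity instance.
        }
    }
}
```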
It's also possible to use a locking pool, although that tends to be a more 'advanced' use-case with only specific applicability. Pools also allow handing out semaphores (in even more specific use-cases) as well as simple lock objects.
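For instance, a pool could hand out SemaphoreSlim instances instead of plain lock objects when up to N concurrent workers per key are acceptable; GetSemaphore here is a hypothetical pool lookup:

```csharp
// Sketch: allow at most 2 concurrent processors per ID instead of 1.
// GetSemaphore is assumed to return the SAME SemaphoreSlim instance
// for the same ID (e.g. each created as new SemaphoreSlim(2)).
var semaphore = GetSemaphore(pool, entity.ID);
semaphore.Wait();
try {
    Process(entity);
} finally {
    semaphore.Release(); // unlike lock, release must be explicit
}
```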
If there needs to be a lock per external ID, one approach is to integrate the entities being worked on with a pool, establishing locks across entities:
// Some lock pool. Variations of the strategy:
// - Weak-value hash table
// - Explicit acquire/release lock
// - Explicit acquire/release from ctor and finalizer (or Dispose)
var locks = CreateLockPool();
// When object is created, assign a lock object
var entity = CreateEntity();
// Returns same lock object (instance) for the given ID, and a different
// lock object (instance) for a different ID.
entity.Lock = GetLock(locks, entity.ID);
lock (entity.Lock) {
// Mutually exclusive per whatever rules are to select the lock
Process(entity);
}
Another variation is a localized pool, instead of carrying around a lock object per entity itself. It is conceptually the same model as above, just flipped outside-in. Here is a gist. YMMV.
private sealed class Locker { public int Count; }
IDictionary<int, Locker> _locks = new Dictionary<int, Locker>();
void WithLockOnId(int id, Action action) {
Locker locker;
lock (_locks) {
// The _locks might have lots of contention; the work
// done inside is expected to be FAST in comparison to action().
if (!_locks.TryGetValue(id, out locker))
locker = _locks[id] = new Locker();
++locker.Count;
}
lock (locker) {
// Runs mutually-exclusive by ID, as established per creation of
// distinct lock objects.
action();
}
lock (_locks) {
// Don't forget to take out the garbage..
// This would be better with try/finally, which is left as an exercise
// to the reader, along with fixing any other minor errors.
if (--_locks[id].Count == 0)
_locks.Remove(id);
}
}
// And then..
WithLockOnId(x.ID, () => Process(x));
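One possible completion of the try/finally exercise mentioned in the comments above (same Locker class and _locks dictionary as in the gist) - a sketch, not the only way to do it:

```csharp
void WithLockOnId(int id, Action action) {
    Locker locker;
    lock (_locks) {
        if (!_locks.TryGetValue(id, out locker))
            locker = _locks[id] = new Locker();
        ++locker.Count;
    }
    try {
        lock (locker) {
            // Still mutually exclusive by ID.
            action();
        }
    } finally {
        // Runs even if action() throws, so the entry is always
        // reference-counted back down and removed when unused.
        lock (_locks) {
            if (--locker.Count == 0)
                _locks.Remove(id);
        }
    }
}
```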
Taking a sideways step, another approach is to 'shard' entities across threads/processing units. Each thread is then guaranteed never to be processing the same entity as another thread: X, Y, Z always go to #1 and P, D, Q always go to #2. (Optimizing throughput under this scheme is a little more complicated..)
var threadIndex = entity.ID % NumThreads;
QueueWorkOnThread(threadIndex, entity); // e.g. add to a List<ConcurrentQueue<Entity>> indexed by threadIndex