6

There is a Unix function called flock() that processes can use to obtain either shared ("read") access or exclusive ("write") access to a resource. The problem is that it starves processes requesting exclusive access. Such a request remains queued until there are no processes holding a shared lock; meanwhile, new requests for a shared lock are granted "ahead of" the process that is waiting for an exclusive lock.

Clearly, the more processes there are requesting shared locks, the longer a writer will have to wait for a window in which no shared locks are held.

The behavior I seek is this: once a writer has requested an exclusive lock, subsequent readers who request a shared lock will be queued behind the writer. The name for this type of lock, I'm told, is a "writer-preferring read/write lock".

There are several posts (this one in particular) that address this question, but at the thread level. What I need is a Unix/Linux-oriented solution for coordinating processes in this manner.

UPDATE: I need the solution to handle the possibility that a participating process may crash while it holds a lock, by automatically releasing the lock.

– Chap

2 Answers

4

I'm a little late, but I only recently had a similar task and found a possible simple solution that is also robust (i.e. if the owner of a lock crashes, the lock is freed by the system automatically): use a double flock() call, on different targets. Assume that two arbitrary lock files are already opened into descriptors fd_sh and fd_ex (a C sketch of both paths follows the two step lists below). Then, to gain shared access:

  1. flock(fd_ex, LOCK_SH) - allows multiple readers to pass through this lock but blocks writers
  2. flock(fd_sh, LOCK_SH) - used to block an activated writer while readers are working
  3. flock(fd_ex, LOCK_UN) - released immediately to minimize the time readers hold this lock
  4. DO USEFUL WORK
  5. flock(fd_sh, LOCK_UN)

To gain exclusive access:

  1. flock(fd_ex, LOCK_EX) - only one process can go through this lock
  2. flock(fd_sh, LOCK_EX) - effectively wait for all readers to finish
  3. flock(fd_sh, LOCK_UN) - the readers have finished, so this lock is no longer needed (this step can also be done after the work; it doesn't matter)
  4. DO USEFUL WORK
  5. flock(fd_ex, LOCK_UN)
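
Here is a minimal C sketch of both paths following the steps above; the lock-file names and helper names are my own illustrative choices, and error handling is omitted for brevity:

    /* Minimal sketch of the double-flock scheme above. */
    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int fd_sh, fd_ex;                /* opened once per process */

    int rw_init(void)
    {
        fd_sh = open("/tmp/mylock.sh", O_CREAT | O_RDWR, 0666);
        fd_ex = open("/tmp/mylock.ex", O_CREAT | O_RDWR, 0666);
        return (fd_sh >= 0 && fd_ex >= 0) ? 0 : -1;
    }

    void rw_read_lock(void)
    {
        flock(fd_ex, LOCK_SH);      /* step 1: pass only if no writer currently holds fd_ex */
        flock(fd_sh, LOCK_SH);      /* step 2: register as an active reader */
        flock(fd_ex, LOCK_UN);      /* step 3: release quickly so a writer can queue up */
    }

    void rw_read_unlock(void)
    {
        flock(fd_sh, LOCK_UN);      /* step 5 */
    }

    void rw_write_lock(void)
    {
        flock(fd_ex, LOCK_EX);      /* step 1: one writer at a time; new readers now block */
        flock(fd_sh, LOCK_EX);      /* step 2: wait for readers that already hold fd_sh */
        flock(fd_sh, LOCK_UN);      /* step 3: readers are gone; fd_sh no longer needed */
    }

    void rw_write_unlock(void)
    {
        flock(fd_ex, LOCK_UN);      /* step 5 */
    }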

This method gives writers a much better chance to get the lock, because readers hold fd_ex only for the very short time needed to lock fd_sh in shared mode, which in turn is very quick in the absence of a working writer. So the first writer gets through step 1 in fairly short time, and at step 2 it waits only for the readers that already hold the lock. While one writer is working, all readers and other writers are on an equal footing, and only the kernel decides which one takes the lock next; but again, the next writer does not need to wait until all the readers the kernel queued ahead of it finish their job, only the small time it takes them to pass steps 1-3, which all readers pass simultaneously (given enough cores, of course).

If someone crashes while holding a lock, the lock is silently released. Special care must be taken so that other workers can detect this situation. For example, if only a writer's crash matters, the writer can put a mark into the file under the fd_ex descriptor at the start of its critical work and clear it before unlocking. Readers can check that mark and skip the work or initiate a recheck; the next writer can finish the job.
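
As a hedged illustration of such a mark (the single flag byte at offset 0 and the helper names are my own additions, not part of the answer's scheme), the writer could set a byte in the file behind fd_ex right after taking the lock and clear it just before unlocking:

    /* Hypothetical crash-mark helpers: one flag byte at offset 0 of the fd_ex file. */
    #include <unistd.h>

    static void set_write_mark(int fd_ex, int in_progress)
    {
        char flag = in_progress ? '1' : '0';
        pwrite(fd_ex, &flag, 1, 0);         /* error handling omitted */
    }

    static int write_was_interrupted(int fd_ex)
    {
        char flag = '0';
        pread(fd_ex, &flag, 1, 0);
        return flag == '1';                 /* a previous writer died before clearing it */
    }

A reader (or the next writer) that finds the flag still set knows the previous writer died mid-update and can skip or repair the data.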

I ran some synthetic tests of three implementations: a single flock, the proposed double flock, and pthread_rwlock (with the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP attribute). I used 8 processes (threads for pthread_rwlock), three read/write ratios, and two testing platforms.

Results for testing platform 1 - dual-core Intel Core 2 on Debian Linux with kernel 3.16.36:

                                 90% reads/10% writes
                 single flock        double flock         pthread_rwlock
total overhead*     90.4%                21.7%                 11.6%
readers waittime*   20.2%                50.1%                 47.1%
writers waittime    95.2%                64.5%                 54.8%

                                 50% reads/50% writes
                 single flock        double flock         pthread_rwlock
total overhead      22.0%                33.7%                  3.2%
readers waittime    63.6%                82.2%                 82.7%
writers waittime    87.8%                84.0%                 70.3%

                                 10% reads/90% writes
                 single flock        double flock         pthread_rwlock
total overhead       5.3%                 8.4%                   0.2%
readers waittime    82.5%                87.2%                  96.8%
writers waittime    87.3%                87.4%                  78.5%

'total overhead' is the ratio ('actual processing time' - 'ideal time') / 'ideal time', where the ideal time is what you get when all useful shared work is done simultaneously and evenly by all workers and THEN all useful exclusive work is done by a single worker (or sequentially by several workers);

'waittime' is the ratio 'time waiting for lock' / ('time waiting for lock' + 'useful work'); it is not very informative on its own, since the ideal value is not zero and depends on the read/write ratio and the number of workers.

Results for testing platform 2 - 16 real cores (32 with HT, Intel Xeon) on Debian Linux with kernel 3.19.6:

                                 90% reads/10% writes
                 single flock        double flock         pthread_rwlock
total overhead      134.3%               17.1%                 11.6%
readers waittime    13.2%                46.4%                 45.8%
writers waittime    96.7%                65.3%                 54.3%

                                 50% reads/50% writes
                 single flock        double flock         pthread_rwlock
total overhead      37.9%                30.5%                  2.9%
readers waittime    46.1%                78.4%                 83.1%
writers waittime    90.5%                85.9%                 70.0%

                                 10% reads/90% writes
                 single flock        double flock         pthread_rwlock
total overhead       7.2%                 9.0%                   0.4%
readers waittime    66.9%                80.6%                  96.8%
writers waittime    88.0%                87.9%                  78.4%

As you can see, the proposed double-flock method dramatically decreases overhead compared with a single flock when the write ratio is low. With a high write ratio the overhead is rather low in all cases. In the balanced case the result depends on the testing platform; contention is much higher when there are enough CPUs for all workers. Pthread's rwlock shows very good results in all cases, so use it if you can, but remember that the lock will not be released if a worker dies abruptly.

A little more about the test method. Useful work for readers and writers was simulated by usleep(10000 + rand() % 1000) calls. The real time spent waiting and doing useful work was measured using clock_gettime(CLOCK_MONOTONIC). There was an additional usleep(1) call after each iteration (after the lock was released) to bring contention closer to real-life applications, which wait for a new request to arrive; without this call the results of both flock methods fall off dramatically on the multi-core platform.
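
For reference, one reader iteration in such a test presumably looks roughly like this (a sketch under my own assumptions, not the author's actual benchmark code):

    #include <stdlib.h>
    #include <sys/file.h>
    #include <time.h>
    #include <unistd.h>

    static double seconds_between(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    /* One reader iteration: accumulate time spent waiting for the lock
     * and time spent on the simulated useful work. */
    static void reader_iteration(int fd, double *wait_s, double *work_s)
    {
        struct timespec t0, t1, t2;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        flock(fd, LOCK_SH);                     /* time spent here is "waittime" */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        usleep(10000 + rand() % 1000);          /* simulated useful work */
        clock_gettime(CLOCK_MONOTONIC, &t2);
        flock(fd, LOCK_UN);

        usleep(1);                              /* idle gap before the next request */

        *wait_s += seconds_between(t0, t1);
        *work_s += seconds_between(t1, t2);
    }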

2

You can use the method described in the referenced question for inter-process as well as inter-thread synchronization. You have to make sure the pthread_rwlock_t object is in memory shared between the processes you want to synchronize, and use the pthread_rwlockattr_setpshared() function to mark the pthread_rwlockattr_t object you use to initialize the pthread_rwlock_t as PTHREAD_PROCESS_SHARED.
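
A hedged sketch of that setup, placing the lock in POSIX shared memory (the shm name, the helper name, and the glibc-specific writer-preference attribute borrowed from the other answer are my own additions):

    #define _GNU_SOURCE                 /* for pthread_rwlockattr_setkind_np() */
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Map a pthread_rwlock_t that lives in POSIX shared memory.
     * Exactly one process should pass create != 0 and initialize it. */
    pthread_rwlock_t *attach_shared_rwlock(int create)
    {
        int fd = shm_open("/my_rwlock", O_RDWR | (create ? O_CREAT : 0), 0666);
        if (fd < 0)
            return NULL;
        if (create && ftruncate(fd, sizeof(pthread_rwlock_t)) != 0)
            return NULL;

        pthread_rwlock_t *rw = mmap(NULL, sizeof *rw, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
        close(fd);
        if (rw == MAP_FAILED)
            return NULL;

        if (create) {
            pthread_rwlockattr_t attr;
            pthread_rwlockattr_init(&attr);
            pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
            pthread_rwlockattr_setkind_np(&attr,
                PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
            pthread_rwlock_init(rw, &attr);
            pthread_rwlockattr_destroy(&attr);
        }
        return rw;
    }

Every process then uses the usual pthread_rwlock_rdlock()/pthread_rwlock_wrlock()/pthread_rwlock_unlock() calls on the mapped object.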

If you need your synchronization to reset itself when a process exits, you'll need to use a reader-writer lock based on a different synchronization primitive. I think System V semaphores (otherwise known as XSI IPC semaphores) should do the trick, since they can be made to reset themselves (via the SEM_UNDO flag) when a process that has manipulated them exits without resetting them. System V semaphores are an optional feature of the Linux kernel.

You might want to use a library that has higher-level locking abstractions for this; if you are using C++, you can use Boost synchronization. However, I'm not sure that Boost synchronization supports locks backed by System V IPC semaphores; you might be forced to roll your own reader-writer lock on top of a Sys V semaphore set. Specifically, you'll need a read semaphore, a write semaphore and a write-queue semaphore. Writers increment the write queue and wait for the read and the write semaphore to go to 0, then increment the write semaphore. Readers wait for the write queue and write semaphore to go to 0, then increment the read semaphore.
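
A hedged sketch of that three-semaphore protocol on top of a SysV semaphore set; the key, the semaphore indices, and the use of SEM_UNDO (so the kernel backs out a crashed process's increments) are my own choices:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    enum { SEM_READ = 0, SEM_WRITE = 1, SEM_WQUEUE = 2 };

    /* Create or attach the three-semaphore set; the key is illustrative.
     * On Linux new semaphores start at 0; portable code should SETVAL them. */
    int rw_sem_init(void)
    {
        return semget((key_t)0x5257, 3, IPC_CREAT | 0666);
    }

    void rw_sem_read_lock(int semid)
    {
        /* Atomically: wait for write queue == 0 and write == 0, then read += 1. */
        struct sembuf ops[] = {
            { SEM_WQUEUE, 0, 0 },
            { SEM_WRITE,  0, 0 },
            { SEM_READ,  +1, SEM_UNDO },
        };
        semop(semid, ops, 3);
    }

    void rw_sem_read_unlock(int semid)
    {
        struct sembuf op = { SEM_READ, -1, SEM_UNDO };
        semop(semid, &op, 1);
    }

    void rw_sem_write_lock(int semid)
    {
        /* Announce intent so new readers queue behind this writer... */
        struct sembuf announce = { SEM_WQUEUE, +1, SEM_UNDO };
        semop(semid, &announce, 1);

        /* ...then wait for readers and any active writer, and take the write slot. */
        struct sembuf ops[] = {
            { SEM_READ,  0, 0 },
            { SEM_WRITE, 0, 0 },
            { SEM_WRITE, +1, SEM_UNDO },
        };
        semop(semid, ops, 3);
    }

    void rw_sem_write_unlock(int semid)
    {
        struct sembuf ops[] = {
            { SEM_WRITE,  -1, SEM_UNDO },
            { SEM_WQUEUE, -1, SEM_UNDO },
        };
        semop(semid, ops, 2);
    }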

– antlersoft
  • I think - and please correct me if I'm wrong - that the OS must somehow be involved in the lock management for coordinated processes; otherwise, if one crashes holding a lock, the lock might never be cleaned up. (I was unsure whether interthread locking would care about this.) Boost is indeed an option. Having looked at the link you provided, the most likely candidate appears to be the Named Utilities. – Chap Dec 23 '14 at 19:39
  • @Chap -- All the state for an inter-process lock that is in shared memory is in the shared memory, so it doesn't need to be cleaned up separately. The OS does know about the shared memory, of course. If a process crashes holding a lock, the lock will stay held indefinitely. – antlersoft Dec 23 '14 at 20:15
  • I do need for the lock to be released if the process crashes. I've amended my original question to state this explicitly. Still learning about boost named synchronization primitives... – Chap Dec 23 '14 at 21:47
  • @Chap - Edited the answer discussing Sys V semaphores – antlersoft Dec 23 '14 at 22:45