I'm a little late, but I only recently got a similar task and found a possible simple solution which is robust (i.e. if the owner of a lock crashes, the lock is released by the system automatically): use a double flock() call on different targets. Assume that two arbitrary lock files are already opened into descriptors fd_sh and fd_ex. Then, to gain shared access (a C sketch of both sequences follows the exclusive-access steps below):
- flock(fd_ex, LOCK_SH) - allows multiple readers to pass through this lock while blocking writers
- flock(fd_sh, LOCK_SH) - used to block an activated writer while the readers are working
- flock(fd_ex, LOCK_UN) - released early to minimize the time readers hold this lock
- DO USEFUL WORK
- flock(fd_sh, LOCK_UN)
To gain exclusive access:
- flock(fd_ex, LOCK_EX) - only one process can go through this lock
- flock(fd_sh, LOCK_EX) - effectively waits for all readers to finish
- flock(fd_sh, LOCK_UN) - the readers have finished, so this lock is no longer needed (this step can also be done after the work; it doesn't matter)
- DO USEFUL WORK
- flock(fd_ex, LOCK_UN)
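Here is a minimal C sketch of both sequences. The lock-file paths and the helper names (locks_open, lock_shared, lock_exclusive, etc.) are mine, chosen just for illustration, and error checking is omitted for brevity:

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

static int fd_sh, fd_ex;            /* two independent lock files */

int locks_open(void)
{
    /* illustrative paths; any two files will do */
    fd_sh = open("/var/lock/mylock.sh", O_RDWR | O_CREAT, 0666);
    fd_ex = open("/var/lock/mylock.ex", O_RDWR | O_CREAT, 0666);
    return (fd_sh >= 0 && fd_ex >= 0) ? 0 : -1;
}

void lock_shared(void)
{
    flock(fd_ex, LOCK_SH);          /* pass through unless a writer holds fd_ex */
    flock(fd_sh, LOCK_SH);          /* register as an active reader */
    flock(fd_ex, LOCK_UN);          /* drop fd_ex as soon as possible */
}

void unlock_shared(void)
{
    flock(fd_sh, LOCK_UN);
}

void lock_exclusive(void)
{
    flock(fd_ex, LOCK_EX);          /* only one writer passes; new readers block here */
    flock(fd_sh, LOCK_EX);          /* wait for all active readers to release fd_sh */
    flock(fd_sh, LOCK_UN);          /* no longer needed; fd_ex keeps everyone else out */
}

void unlock_exclusive(void)
{
    flock(fd_ex, LOCK_UN);
}
```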
This method gives writers a much better chance to get the lock, because readers hold fd_ex only for the very short time needed to lock fd_sh in shared mode, which in turn is very quick in the absence of a working writer. So the first writer will pass step 1 in a rather short time, and at step 2 it waits only for the readers that already hold the lock. While one writer is working, all readers and other writers are in equal conditions and only the kernel decides which one takes the lock next; but, again, the next writer does not need to wait until all the readers the kernel queued ahead of it finish their job, only the short time needed to pass steps 1-3, which all readers pass simultaneously (given enough cores, of course).
If someone crashes while holding a lock, the lock will be silently released, so special care must be taken to detect such a situation in the other workers. For example, if only a writer crash matters, a mark can be written to the file behind the fd_ex descriptor at the start of the writer's critical work and cleared before unlocking. Readers can check that mark and skip the work or trigger a recheck; the next writer can then finish the job. A rough sketch of this idea follows.
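For illustration, one possible (untested, assumed) way to implement such a mark is to keep a single byte in the fd_ex lock file; the helper names below are hypothetical:

```c
#include <unistd.h>

/* Hypothetical marker scheme: one byte in the fd_ex lock file says whether a
 * writer is currently inside its critical work. If the writer dies, the
 * kernel releases the flock but the byte stays set. */
void writer_mark_busy(int fd_ex) { pwrite(fd_ex, "1", 1, 0); fsync(fd_ex); }
void writer_mark_done(int fd_ex) { pwrite(fd_ex, "0", 1, 0); fsync(fd_ex); }

int writer_crashed(int fd_ex)    /* readers / the next writer check this */
{
    char c = '0';
    pread(fd_ex, &c, 1, 0);
    return c == '1';
}
```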
Some synthetic tests were run for three implementations: a single flock, the proposed double flock, and pthread_rwlock (with the PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP attribute).
I used 8 processes (threads in the pthread_rwlock case), three read/write ratios, and two testing platforms. A reconstruction of the rwlock initialization is sketched below.
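The writer-preferring rwlock kind is a GNU extension set through the attribute object; the initialization presumably looked something like this (my reconstruction, not the actual test code):

```c
#define _GNU_SOURCE
#include <pthread.h>

pthread_rwlock_t rwlock;

void rwlock_init_writer_pref(void)
{
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setkind_np(&attr,
        PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&rwlock, &attr);
    pthread_rwlockattr_destroy(&attr);
}
```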
Results for testing platform 1 (dual-core Intel Core 2, Debian Linux, kernel 3.16.36):
| 90% reads / 10% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead*        | 90.4%        | 21.7%        | 11.6%          |
| readers waittime*      | 20.2%        | 50.1%        | 47.1%          |
| writers waittime       | 95.2%        | 64.5%        | 54.8%          |

| 50% reads / 50% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead         | 22.0%        | 33.7%        | 3.2%           |
| readers waittime       | 63.6%        | 82.2%        | 82.7%          |
| writers waittime       | 87.8%        | 84.0%        | 70.3%          |

| 10% reads / 90% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead         | 5.3%         | 8.4%         | 0.2%           |
| readers waittime       | 82.5%        | 87.2%        | 96.8%          |
| writers waittime       | 87.3%        | 87.4%        | 78.5%          |
'total overhead' is the ratio ('actual processing time' - 'ideal time') / 'ideal time', where the ideal time is when all useful shared work is done simultaneously and evenly by all workers and THEN all useful exclusive work is done by a single worker or sequentially by several workers;
'waittime' is the ratio 'time waiting for lock' / ('time waiting for lock' + 'useful work'); it is not very informative on its own, since its ideal value is not zero and depends on the read/write ratio and the number of workers. Both computations are sketched below.
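For clarity, here is how I read those two definitions, as a small sketch (variable and function names are mine):

```c
/* ideal_time: shared work split evenly across all workers, plus all
 * exclusive work done sequentially afterwards */
double total_overhead(double actual_time, double total_shared_work,
                      double total_exclusive_work, int n_workers)
{
    double ideal_time = total_shared_work / n_workers + total_exclusive_work;
    return (actual_time - ideal_time) / ideal_time;
}

double waittime(double lock_wait, double useful_work)
{
    return lock_wait / (lock_wait + useful_work);
}
```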
Results for testing platform 2 (Intel Xeon, 16 real cores, 32 with HT, Debian Linux, kernel 3.19.6):
| 90% reads / 10% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead         | 134.3%       | 17.1%        | 11.6%          |
| readers waittime       | 13.2%        | 46.4%        | 45.8%          |
| writers waittime       | 96.7%        | 65.3%        | 54.3%          |

| 50% reads / 50% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead         | 37.9%        | 30.5%        | 2.9%           |
| readers waittime       | 46.1%        | 78.4%        | 83.1%          |
| writers waittime       | 90.5%        | 85.9%        | 70.0%          |

| 10% reads / 90% writes | single flock | double flock | pthread_rwlock |
|------------------------|--------------|--------------|----------------|
| total overhead         | 7.2%         | 9.0%         | 0.4%           |
| readers waittime       | 66.9%        | 80.6%        | 96.8%          |
| writers waittime       | 88.0%        | 87.9%        | 78.4%          |
As you can see, the proposed double-flock method dramatically decreases overhead at a low write ratio compared to the single flock. At a high write ratio the overhead is rather low in all cases. In the 50/50 case the result depends on the testing platform. Contention is much higher when there are enough CPUs for all workers.
pthread's rwlock shows very good results in all cases, so use it if you can, but remember that the lock won't be released if a worker dies abruptly.
A little more about the test method. Useful work for readers and writers was simulated by usleep(10000 + rand() % 1000) calls. The real times spent waiting and doing useful work were measured with clock_gettime(CLOCK_MONOTONIC). There was an additional usleep(1) call after each iteration (after the lock was released) to bring contention closer to real-life applications that wait for a new request to arrive; without this call the results of both flock methods fall dramatically on the multi-core platform. One reader iteration looked roughly like the sketch below.
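As a rough reconstruction (not the actual benchmark code), a reader iteration could look like this, reusing the lock_shared/unlock_shared helpers from the earlier sketch:

```c
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

void lock_shared(void);             /* from the double-flock sketch above */
void unlock_shared(void);

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

void reader_iteration(double *wait_time, double *work_time)
{
    double t0 = now();
    lock_shared();                  /* everything up to here counts as waiting */
    double t1 = now();
    usleep(10000 + rand() % 1000);  /* simulated useful work */
    unlock_shared();
    *wait_time += t1 - t0;
    *work_time += now() - t1;
    usleep(1);                      /* idle gap, as if waiting for the next request */
}
```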