
I am trying to implement a "record manager" class in Python 3.x on Linux/macOS. The class is relatively simple and straightforward; the only "hard" thing I want is to be able to access the same file (where results are saved) from multiple processes.

This seemed pretty easy, conceptually: when saving, acquire an exclusive lock on the file, update your information, save the new information, and release the exclusive lock on the file. Easy enough.

I am using fcntl.lockf(file, fcntl.LOCK_EX) to acquire the exclusive lock. The problem is that, looking around the internet, I am finding a lot of different websites saying that this is not reliable, that it won't work on Windows, that support on NFS is shaky, and that behaviour could differ between macOS and Linux.

I have accepted that the code won't work on Windows, but I was hoping to make it work on macOS (single machine) and on Linux (on multiple servers over NFS).

The problem is that I can't seem to make this work; after a while of debugging, and after the tests passed on macOS, they failed once I ran them on Linux (Ubuntu 16.04) over NFS. The issue is an inconsistency in the information saved by multiple processes - some processes have their modifications missing, which means something went wrong in the locking and saving procedure.

I am sure there is something I am doing wrong, and I suspect it may be related to the issues I read about online. So, what is the proper way to handle multiple accesses to the same file that works on macOS and on Linux over NFS?

Edit

This is what a typical method that writes new information to disk looks like:

sf = open(self._save_file_path, 'rb+')
try:
    fcntl.lockf(sf, fcntl.LOCK_EX)  # acquire an exclusive lock - only one writer
    self._raw_update(sf)  # update the records from file (other processes may have modified it)
    self._saved_records[name] = new_info
    self._raw_save()  # does not check for locks (but does *not* release the lock on self._save_file_path)
finally:
    sf.flush()
    os.fsync(sf.fileno())  # force the OS to write to disk
    sf.close()  # release the lock and close

And this is what a typical method that only reads info from disk looks like:

sf = open(self._save_file_path, 'rb')
try:
    fcntl.lockf(sf, fcntl.LOCK_SH)  # acquire a shared lock - multiple readers allowed
    self._raw_update(sf)  # update the records from file (other processes may have modified it)
    return self._saved_records
finally:
    sf.close()  # release the lock and close

Also, this is what _raw_save looks like:

def _raw_save(self):
    # write to a temp file first to avoid accidental corruption of information
    # os.replace is guaranteed to be an atomic operation in POSIX
    with open('temp_file', 'wb') as p:
        p.write(self._saved_records)
    os.replace('temp_file', self._save_file_path)  # pretty sure this does not release the lock

Error message

I have written a unit test that creates 100 different processes: 50 that read and 50 that write to the same file. Each process does some random waiting to avoid accessing the file sequentially.

The problem is that some of the records are not kept; at the end, 3-4 random records are missing, so I only end up with 46-47 records rather than 50.

Edit 2

I have modified the code above so that the lock is acquired not on the data file itself but on a separate lock file. This prevents the issue that closing the data file would release the lock (as suggested by @janneb), and makes the code work correctly on macOS. The same code still fails on Linux over NFS, though.
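
This is roughly what that variant looks like (the attribute name self._lock_file_path is made up here; the rest mirrors the write method above):

lf = open(self._lock_file_path, 'wb')  # sidecar lock file, never replaced
try:
    fcntl.lockf(lf, fcntl.LOCK_EX)  # the exclusive lock lives on the sidecar file
    with open(self._save_file_path, 'rb+') as sf:
        self._raw_update(sf)  # re-read records other processes may have written
    self._saved_records[name] = new_info
    self._raw_save()  # os.replace() swaps the data file; the lock file is untouched
finally:
    lf.close()  # releases the lock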

  • You need to include a [Minimal, Complete and Verifiable example](https://stackoverflow.com/help/mcve) in the question – Marcin Orlowski Feb 13 '18 at 15:48
  • @MarcinOrlowski Hi, since I don't know what exactly the problem is, it is hard to create a minimal example. I could dump the whole code in the question, but that's probably not going to help. I have added an example of how a typical method saves to file, though - hope that is enough – Ant Feb 13 '18 at 15:53
  • @MarcinOrlowski But in any case I am asking more about "how to do proper file locking on NFS" than about my specific case. Since the tests pass on macOS and not on Linux, I assume the problem must lie in the peculiarities of locking on NFS rather than in my code. To deal with these peculiarities I should follow some guidelines / best practices. So my question is: how do people do it? Which guidelines do they follow? – Ant Feb 13 '18 at 16:01
  • People have a tendency to look for bugs in others' code rather than their own, which is not always where the problem lies. Since you said you tried something and it does not work, it's worth showing what your current approach looks like to make sure it's not hiding the culprit. – Marcin Orlowski Feb 13 '18 at 16:04
  • @MarcinOrlowski Sure, I agree :) I only brought it up because of the many issues I read about locking and NFS (but admittedly the articles were quite old). In any case, I wasn't sure which portion of the code to include, so I included the patterns that the methods which write / read to disk follow – Ant Feb 13 '18 at 16:06
  • I tried `os.replace`-ing an open file like you did in `_raw_save` under Windows and got a (less surprising) `PermissionError`. I'm wondering how it lets you do this under macOS or Linux. Either way this looks very much like a recipe for disaster to me. – Darkonaut Feb 17 '18 at 07:55
  • If you look at `_raw_save` further... there's nothing preventing two processes from opening `temp_file` concurrently, and the latter will overwrite what the first process wrote. You could make the name of `temp_file` unique, for example by including the process id via `os.getpid()` (see the sketch after this comment thread), and check whether that resolves the issue in your case, but it would probably be cleaner, faster and less error-prone to have just a single writer process and feed it over a queue. – Darkonaut Feb 17 '18 at 07:56
  • @Darkonaut I will check if it's possible to replace an open file, but I don't have any issues on macOS. As for `_raw_save`, that is not true; the point of `_raw_save` is not to check locks, with the understanding that it will only be called once an exclusive lock has been acquired. Since we only call `_raw_save` when we hold an exclusive lock, we are sure only one process at a given time is writing the temp file. Right? – Ant Feb 17 '18 at 09:31
  • It doesn't look like you're placing an effective lock here, given your results. In the docs I read "On at least some systems, LOCK_EX can only be used if the file descriptor refers to a file opened for writing." I'm wondering whether this means it must be `'wb'` mode (write only) and it somehow fails silently in your case with `'rb+'`, never getting the exclusive lock in the first place and just proceeding without blocking. – Darkonaut Feb 17 '18 at 15:07
  • Another thing I'm reading about is advisory vs. mandatory locks, where only the latter are enforced by the OS and the former are the default on Linux. There also seems to be a difference between process-bound locks (`fcntl()`) and locks tied to the open file description (`flock()`). [difference-between-locking-with-fcntl-and-flock](https://stackoverflow.com/questions/29611352/what-is-the-difference-between-locking-with-fcntl-and-flock). – Darkonaut Feb 17 '18 at 15:10
  • ...Can't make much sense out of it ATM, but working through [linux/man-pages/man2/fcntl](http://man7.org/linux/man-pages/man2/fcntl.2.html) might help. There is also this NFS-safe library: [flufl.lock library](http://flufllock.readthedocs.io/en/latest/docs/using.html) – Darkonaut Feb 17 '18 at 15:11
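
A rough sketch of the per-process temp file variant of _raw_save suggested in the comments (illustrative only; tempfile.mkstemp in the target directory keeps os.replace on a single filesystem):

import os
import tempfile

def _raw_save(self):
    # write to a uniquely named temp file in the same directory as the target
    # (os.replace cannot cross filesystems), then atomically swap it in
    target_dir = os.path.dirname(self._save_file_path) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, 'wb') as p:
        p.write(self._saved_records)
        p.flush()
        os.fsync(p.fileno())
    os.replace(tmp_path, self._save_file_path)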

3 Answers


I don't see how the combination of file locks and os.replace() can make sense. When the file is replaced (that is, when the directory entry is replaced), all the existing file locks (probably including locks still waiting to be granted; I'm not sure of the semantics here) and file descriptors will be against the old file, not the new one. I suspect this is the reason behind the race conditions causing you to lose some of the records in your tests.

os.replace() is a good technique to ensure that a reader doesn't read a partial update. But it doesn't work robustly in the face of multiple updaters (unless losing some of the updates is ok).

Another issue is that fcntl is a really, really stupid API. In particular, the locks are bound to the process, not to the file descriptor, which means that e.g. a close() on ANY file descriptor pointing to the file will release the lock.
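
For example (the file name is hypothetical; any second descriptor for the same file triggers it):

import fcntl

f1 = open('records.db', 'rb+')  # hypothetical data file
fcntl.lockf(f1, fcntl.LOCK_EX)  # exclusive lock taken via f1

f2 = open('records.db', 'rb')   # another descriptor for the same file, same process
f2.close()                      # closing it silently drops the lock taken via f1

# from here on, other processes can acquire LOCK_EX even though f1 is still open
f1.close()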

One way would be to use a "lock file", e.g. taking advantage of the atomicity of link(). From http://man7.org/linux/man-pages/man2/open.2.html:

Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same filesystem (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.

If it's OK to read slightly stale data, then you can use this link() dance only for a temp file that you use when updating the file, and then os.replace() the "main" file you use for reading (reading can then be lockless). If not, then you need to do the link() trick for the "main" file and forget about shared/exclusive locking; all locks are then exclusive.

Addendum: One tricky thing to deal with when using lock files is what to do when a process dies for whatever reason, and leaves the lock file around. If this is to run unattended, you might want to incorporate some kind of timeout and removal of lock files (e.g. check the stat() timestamps).
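
A rough sketch of this link() dance in Python, including a crude stale-lock timeout (the helper names acquire_link_lock / release_link_lock are invented; this only illustrates the recipe quoted above and is not battle-tested):

import os
import socket
import time

def acquire_link_lock(lockfile, timeout=30.0, stale_after=300.0):
    # Create a unique file on the same filesystem, then link() it to lockfile.
    # Per the open(2) man page: the lock is ours if link() succeeds, or if the
    # unique file's link count has reached 2 despite an error (NFS retransmission).
    unique = '%s.%s.%d' % (lockfile, socket.gethostname(), os.getpid())
    open(unique, 'w').close()
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(unique, lockfile)
                return  # locked
            except OSError:
                if os.stat(unique).st_nlink == 2:
                    return  # the link was made after all: locked
            # crude crash recovery: break a lock whose holder looks long dead
            try:
                if time.time() - os.stat(lockfile).st_mtime > stale_after:
                    os.unlink(lockfile)
                    continue
            except FileNotFoundError:
                continue  # the holder just released it; retry immediately
            if time.monotonic() > deadline:
                raise TimeoutError('could not lock %s' % lockfile)
            time.sleep(0.1)
    finally:
        os.unlink(unique)  # once acquired, the hard link itself is the lock

def release_link_lock(lockfile):
    os.unlink(lockfile)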

janneb
  • Thanks for your answer! So you are suggesting not to use fcntl() and flock() at all, and rather to use this link() trick to basically implement exclusive file locking, forgetting about shared locks? – Ant Feb 19 '18 at 08:25
  • @Ant: Essentially, yes. – janneb Feb 19 '18 at 08:36
  • +1 So, some problems were solved by not placing the lock on the file that os.replace swaps out, and using another, separate file to lock on. Problems still persist on NFS, where the same code fails the tests that pass on macOS. But is there any way to implement shared locking with that link / unlink trick on NFS? I haven't mentioned it in my question, but I also have a "global lock", i.e. some operations are permitted only when a single instance is active. This is implemented by acquiring a shared lock as soon as an instance is created and releasing it when it is destroyed. Then if you try to get an exclusive lock – Ant Feb 20 '18 at 10:57
  • ... and you're successful, it means you are the only active instance. To have this behavior work correctly I need shared locks, right? (Edit) I just had another idea: can't I just use some flag files to signal which process holds a lock on a particular file? So I create a file with a certain name for each process, look for it, and determine whether I hold a shared or exclusive lock. Are NFS files immediately visible to all processes, so I can be sure this will work? And if yes, why is fcntl *still* unable to provide file locking? I am using NFSv3, which according to the spec should support locks – Ant Feb 20 '18 at 10:58
  • Hi! I decided to award the bounty to you, as this answer was very helpful :) I would appreciate it, though, if you could comment on how to implement shared locks on NFS :) – Ant Feb 23 '18 at 08:08
  • @Ant: If you consider the link() trick roughly equivalent to a mutex, you could look at how reader-writer locks (rwlocks) are implemented on top of mutexes as inspiration. – janneb Feb 23 '18 at 08:23
  • @Ant: E.g. something like https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock#Using_two_mutexes (a rough sketch of that construction follows this comment thread). – janneb Feb 23 '18 at 08:25
  • There is also this solution https://stackoverflow.com/a/46161554/5133167 – Labo Dec 26 '19 at 08:53
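
For reference, a hypothetical sketch of that two-mutex readers-writer construction on top of the link()-based helpers sketched above (all file names are invented, and it glosses over NFS attribute caching on the counter file):

R_LOCK = 'records.readers.lock'    # short-lived mutex protecting the reader counter
G_LOCK = 'records.global.lock'     # held by a writer, or by the readers as a group
COUNTER = 'records.readers.count'  # shared reader count, only touched under R_LOCK

def _bump_counter(delta):
    try:
        with open(COUNTER) as f:
            count = int(f.read() or 0)
    except FileNotFoundError:
        count = 0
    count += delta
    with open(COUNTER, 'w') as f:
        f.write(str(count))
    return count

def acquire_read():
    acquire_link_lock(R_LOCK)
    try:
        if _bump_counter(+1) == 1:
            acquire_link_lock(G_LOCK)  # first reader locks writers out
    finally:
        release_link_lock(R_LOCK)

def release_read():
    acquire_link_lock(R_LOCK)
    try:
        if _bump_counter(-1) == 0:
            release_link_lock(G_LOCK)  # last reader lets writers back in
    finally:
        release_link_lock(R_LOCK)

def acquire_write():
    acquire_link_lock(G_LOCK)  # writers simply take the global lock exclusively

def release_write():
    release_link_lock(G_LOCK)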

Using randomly named hard links and the link counts on those files as lock files is a common strategy (e.g. this), and arguably better than relying on lockd. For far more information about the limits of all sorts of locks over NFS, read this: http://0pointer.de/blog/projects/locking.html

You'll also find that this is a long-standing, well-known problem for MTA software using mbox files over NFS. Probably the best answer there was to use Maildir instead of mbox, but if you look for examples in the source code of something like Postfix, it'll be close to best practice. And if they simply don't solve that problem, that might also be your answer.

mc0e
  • +1 thanks for your answer. I was reading that lockf does not work on NFS, and that is what Python implements. So I need to do something else to get proper file locking on NFS, right? – Ant Feb 20 '18 at 11:13

NFS is great for file sharing. It sucks as a "transmission" medium.

I've been down the NFS-for-data-transmission road multiple times. In every instance, the solution involved moving away from NFS.

Getting reliable locking is one part of the problem. The other part is the update of the file on the server and expecting the clients to receive that data at some specific point-in-time (such as before they can grab the lock).

NFS isn't designed to be a data-transmission solution. There are caches and timing involved, not to mention paging of the file content and of file metadata (e.g. the atime attribute), and client OSes keeping track of state locally (such as "where" to append the client's data when writing to the end of the file).

For a distributed, synchronized store, I recommend looking at a tool that does just that. Such as Cassandra, or even a general-purpose database.

If I'm reading the use case correctly, you could also go with a simple server-based solution: have a server listen for TCP connections, read messages from those connections, and write each one to the file, serializing the writes within the server itself. There's some added complexity in having your own protocol (to know where a message starts and stops), but otherwise it's fairly straightforward.
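
A minimal sketch of that idea (the port, file path and newline-delimited "protocol" are invented; the single-threaded TCPServer handles one connection at a time, which is what serializes the writes):

import socketserver

class RecordHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # one record per line; append each line as it arrives
        for line in self.rfile:
            with open('records.log', 'ab') as f:  # hypothetical output file
                f.write(line)
                f.flush()

if __name__ == '__main__':
    server = socketserver.TCPServer(('0.0.0.0', 9000), RecordHandler)
    server.serve_forever()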

ash
  • +1 Thank you for your answer! Unfortunately I can't change the filesystem; it is set by the server owners. I was also trying to avoid having another program running (i.e. a server-based solution) to keep things as simple as possible. But do you think it is impossible to do? – Ant Feb 23 '18 at 07:59
  • I think you will keep fighting with it until the owners of the decision realize it's not going to work out. That's why I did the same more than once - I failed to convince others that it was a poor fit of a solution to the problem. – ash Feb 25 '18 at 19:45
  • Note that I anticipate all network filesystems (SMB, APFS, ...) will suffer from the same problems. The underlying problem they solve is file storage and access - not reliable streaming of a shared data flow across a network. – ash Feb 25 '18 at 19:47