
When one must synchronize programs (shell scripts) via the file system, I have found an flock-based solution to be recommended (it should also work on NFS). The canonical example of usage from within a script (from http://linux.die.net/man/1/flock) is:

    (
      flock -s 200

      # ... commands executed under lock ...

    ) 200>/var/lock/mylockfile

I don't quite get why this whole construct ensures atomicity. In particular, I am wondering in which order flock -s 200 and 200>/var/lock/mylockfile are executed when e.g. bash executes these lines of code. Is this order guaranteed/deterministic? The way I understand it, it must be deterministic for this idiom to work. But since a subshell is spawned in a child process, I do not understand how these two processes synchronize themselves. As far as I can see, there is already a race condition between these two commands.

I would appreciate it if someone could clear up my confusion and explain why this construct can be used to safely synchronize processes.

At the same time, if someone knows, I would be interested in how safe it is to choose just some arbitrary file descriptor (such as 200 in the example), especially in the context of a large NFS file system with many clients.

Dr. Jan-Philip Gehrcke

1 Answer


The whole I/O context of the sub-shell (...) 200>/var/lock/mylockfile has to be evaluated, and the I/O redirection done, before any commands can be executed in the sub-shell, so the redirection always precedes the flock -s 200. The flock command then inherits the already-open descriptor, so there is no race between the two. Consider what happens if the sub-shell has its standard output piped to another command: that pipe has to be created before the sub-shell is created. The same applies to the redirection of file descriptor 200.
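The ordering can be checked without flock at all. The following is only a minimal sketch: the scratch file from mktemp stands in for /var/lock/mylockfile, and fd 9 stands in for 200 so that it also runs in shells that only support single-digit descriptors. If the redirection were not set up first, the write inside the sub-shell would fail with a "bad file descriptor" error.

```shell
#!/bin/sh
# Hypothetical scratch file; the question uses /var/lock/mylockfile.
lockfile=$(mktemp)

(
  # This is the first command in the sub-shell. Writing to fd 9 only
  # works because the redirection below has already opened the file.
  echo "fd 9 was open before the first command ran" >&9
) 9>"$lockfile"

contents=$(cat "$lockfile")
echo "$contents"
rm -f "$lockfile"
```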

The choice of file descriptor number really doesn't matter in the slightest, beyond the advice not to use file descriptors 0-2 (standard input, output, error). What matters is the file name: different processes could use different file descriptors, and as long as the name is agreed upon, it should be fine.
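As a rough illustration of that point, the sketch below has two sub-shells contend for the same file through different descriptor numbers (9 and 8, both arbitrary). It assumes the util-linux flock utility is installed, uses a scratch file from mktemp instead of /var/lock/mylockfile, and relies on a crude sleep to let the first sub-shell take the lock, so treat the timing as illustrative only.

```shell
#!/bin/sh
# Hypothetical scratch file standing in for the agreed-upon lock file name.
lockfile=$(mktemp)

# Holder: opens the file on fd 9 and keeps an exclusive lock for 2 seconds.
(
  flock -x 9
  sleep 2
) 9>"$lockfile" &

sleep 1   # crude wait so the holder has time to acquire the lock

# Contender: same file name, different descriptor number.
# -n makes flock fail immediately instead of waiting for the lock.
result=$(
  (
    if flock -x -n 8; then
      echo "acquired"
    else
      echo "busy"
    fi
  ) 8>"$lockfile"
)

echo "contender saw: $result"
wait
rm -f "$lockfile"
```

The contender is refused even though it never mentions fd 9, because the lock is tied to the file, not to the descriptor number each process happened to pick.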

Jonathan Leffler
  • Okay, thanks a lot, that makes sense when applying the `command -> pipe -> output_device` perspective. And just to clarify: it also won't matter if multiple instances try to create `/var/lock/mylockfile` at the same time, only one instance calling `flock -s 200` can win, right? – Dr. Jan-Philip Gehrcke Aug 01 '13 at 14:28
  • Well, [`flock -s`](http://linux.die.net/man/1/flock) requests a shared lock; that prevents anyone else from obtaining an exclusive lock (the locks are advisory, so they do not stop the file itself from being modified), but allows multiple processes to lock it in shared mode too. You'd want `flock -x` (or no flag) to get an exclusive lock. – Jonathan Leffler Aug 01 '13 at 14:31
  • Right, so now I understand http://stackoverflow.com/a/169969/145400. Thanks again! – Dr. Jan-Philip Gehrcke Aug 01 '13 at 14:34
  • With the `-w 10` in that answer, the code should be `if flock -x -w 10 200; then ...use the file...; fi` whereas the code currently ploughs on assuming that the timeout wasn't exceeded and the lock was successful. Comment also added to the other post. – Jonathan Leffler Aug 01 '13 at 14:39
  • Sure. I won't use it with a timeout. One more thing (may be obvious, but worth asking): The lock is automatically released as soon as the subshell finishes, yes? And a `rm /var/lock/mylockfile` after the entire code block will do the cleanup, right? – Dr. Jan-Philip Gehrcke Aug 01 '13 at 14:44
  • Yes (the lock is automatically released when the sub-shell finishes). Yes (removing the lock file after the code block does the cleanup). – Jonathan Leffler Aug 01 '13 at 14:59
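The points from the comments (exclusive lock, testing the result of a timed `flock -w`, and release of the lock when the sub-shell finishes) can be put together in a small sketch. The helper name `try_lock`, the scratch file, fd 9, and the short timeouts are all made up for illustration, and the util-linux flock utility is assumed to be available.

```shell
#!/bin/sh
# Hypothetical scratch file; the thread uses /var/lock/mylockfile.
lockfile=$(mktemp)

# Hypothetical helper: try to take an exclusive lock, waiting at most
# $1 seconds, and report whether the lock was actually obtained.
try_lock() {
  (
    if flock -x -w "$1" 9; then
      echo "locked"      # safe to use the protected resource here
    else
      echo "timed out"   # do not plough on without the lock
    fi
  ) 9>"$lockfile"
}

# Holder: keeps an exclusive lock for 3 seconds in the background.
( flock -x 9; sleep 3 ) 9>"$lockfile" &
sleep 1                # crude wait so the holder gets the lock first

first=$(try_lock 1)    # holder still has the lock, so this gives up
wait                   # holder's sub-shell exits, which releases the lock
second=$(try_lock 1)   # the lock is free again

echo "$first / $second"
rm -f "$lockfile"
```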