
My script must not be run more than once concurrently. So it creates a lock file and deletes it before exiting, and it checks that the lock file doesn't exist before starting its work.

A very common approach to locking is something like this:

function setupLockFile() {
  if (set -o noclobber; echo "lock" > "$lockfile") 2>/dev/null; then
    trap "rm -f $lockfile; exit $?" INT TERM EXIT
  else
    echo "Script running... exiting!" 
    exit 1
  fi
}

However, there is a race condition - the `if` creates the file if it doesn't already exist, but the script could be terminated before the trap is defined. Then the lockfile will never be deleted.

So what is a safe way to do this?

lonix

3 Answers


That's not a race - it's a matter of resilience to failure. In situations where the script dies before it can remove the file, you need manual cleanup.

The usual way to try to automate this cleanup is to read the PID from any existing lock file, test whether that process still exists, and essentially ignore the lock file if it doesn't. Unfortunately, without an atomic compare-and-set operation that's not trivial to do correctly, since it introduces a new race between reading the PID and someone else deciding to ignore the same stale file.
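In shell terms that check usually looks something like this (a sketch using the question's $lockfile variable; note it is not race-free, exactly as described above):

if [ -f "$lockfile" ]; then
  oldpid=$(<"$lockfile")
  # kill -0 only tests whether the process exists; it sends no signal
  if ! kill -0 "$oldpid" 2>/dev/null; then
    # The owner is gone - assume the lock is stale. Another instance may do the
    # same thing at the same moment, which is the new race mentioned above.
    rm -f "$lockfile"
  fi
fi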

Check out this question for more ideas around locking using just the file system.

My advice is to either store the lock file on a temporary filesystem (/var/run is usually tmpfs, precisely so that pidfiles disappear safely on reboot) so that things fix themselves after a reboot, or have the script throw up its hands and ask for manual intervention. Handling every failure case reliably increases complexity, and that complexity probably introduces a greater chance of failure than just asking a human for help.

And complexity isn't just today, it's for the lifetime of the code. It might be correct when you're done, but will the next person along break it?

Barry Kelly

Let's try another approach:

  • set up the trap before the lock file is created
  • store the PID in the lock file
  • make the trap check whether the PID of the current instance matches whatever is in the lockfile

For example:

trap "cleanUp" INT TERM EXIT

function cleanUp {
  if [[ $$ -eq $(<$lockfile) ]]; then
    rm -f $lockfile
    exit $?
  fi
}

function setupLockFile {
  if ! (set -o noclobber; echo "$$" > "$lockfile") 2>/dev/null; then
    echo "Script running... exiting!"
    exit 1
  fi
}

This way the check for the lock file's existence and its creation remain a single atomic operation, while the trap is also prevented from deleting the lockfile of another running instance.
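Putting it together in a script might look roughly like this (a sketch; the lockfile path is only an example):

#!/usr/bin/env bash
lockfile="/tmp/myscript.lock"   # example path

# ... cleanUp and setupLockFile defined as above ...

trap cleanUp INT TERM EXIT   # the trap is in place before the lockfile can exist
setupLockFile                # exits if another instance already holds the lock

# ... the actual work of the script goes here ...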

Additionally, as I mentioned in the comments below, in case the lock file already exists I'd suggest checking whether a process with the given PID is actually running, because for whatever reason an orphaned lock file can still remain on disk. So if you want to avoid manual removal of orphaned lock files, you can add extra logic to check whether the PID is orphaned or not.

For example - if no running process with the PID from the lock file is found, you can assume it is an orphaned lock file from a previous instance, overwrite it with your current PID and continue. If a process is found, you can compare its name to see whether it really is another instance of the same script - if not, you can overwrite the PID in the lock file and continue.

I did not include this in the code above to keep it simple; you can try to create this logic yourself if you want. :)
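As an illustration only (this is not part of the answer's code), that extra check might be sketched roughly like this, assuming a procps-style ps and that matching on the script name is good enough:

if [ -f "$lockfile" ]; then
  oldpid=$(<"$lockfile")
  # look up the command line of the recorded PID; output is empty if the process is gone
  if ! ps -p "$oldpid" -o args= 2>/dev/null | grep -q "$(basename "$0")"; then
    # either no such process, or an unrelated process has reused the PID:
    # treat the lockfile as orphaned and take it over
    echo "$$" > "$lockfile"
  fi
fi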

Jiri Valenta
    There's a race in your code between checking for existence of the file and deciding to overwrite it. That's what the OP's code is designed to protect against. – Barry Kelly Sep 11 '18 at 12:02
  • The way I understood his request is that he wants to avoid the lockfile being created before the trap is defined, so that the script can't leave an orphaned lockfile behind. Which is what my example solves. But yeah, I understand what you are hinting at. – Jiri Valenta Sep 11 '18 at 12:05
  • The idea is to prevent concurrent running of some script - and to avoid a race condition, and to avoid orphaned files. – lonix Sep 11 '18 at 12:09
  • Yeah, you're right. I guess a better approach is to store the current PID in the lockfile, and then augment the cleanup in the trap to check whether the lockfile it is deleting belongs to the current instance of the script. This way you can define the trap before you create the lockfile in the if condition, without worrying that you would delete the lockfile of another running instance. – Jiri Valenta Sep 11 '18 at 12:15
  • I've rewritten my answer, please check it and let me know what you think. Is it bulletproof enough? :) – Jiri Valenta Sep 11 '18 at 12:25
  • I think @BarryKelly is right when he said this is a complicated problem to solve (easily). In your new code the race still exists - as in my code, it occurs between the `if` and the `trap`. – lonix Sep 11 '18 at 12:47
  • Ah, sorry, I completely missed that :) The trap should be defined before the lockfile is created, of course. Or you can add logic to check if the process with the PID is running when the lock file already exists. (this is what I am doing in my scripts, and I also check if the process name matches the current one to check if it is really another instance of the same script, and not some other unrelated process that just happened to take over the same PID). I will update my answer. :) – Jiri Valenta Sep 12 '18 at 07:38

First check for the lockfile, then set the trap, then write to it:

function setupLockFile() {
  if [ -f "$lockfile" ]; then
    echo "Script running... exiting!"
    exit 1
  else
    trap "rm -f $lockfile; exit $?" INT TERM EXIT
    set -o noclobber
    echo "lock" > "$lockfile" || exit 1
  fi
}

And there is an "official" way to handle lockfiles: the flock command, which is part of util-linux.
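For example, a sketch of the flock approach (the file descriptor number 200 is arbitrary):

# Open (or create) the lockfile on file descriptor 200 and try to take an
# exclusive lock without blocking. The kernel drops the lock automatically
# when the script exits, so nothing needs to be cleaned up and a leftover
# lockfile on disk does no harm.
exec 200>"$lockfile" || exit 1
if ! flock -n 200; then
  echo "Script running... exiting!"
  exit 1
fi

# ... the actual work of the script goes here ...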

Ipor Sircer