#!/bin/bash
if [ ! -f numbers ]; then echo 0 > numbers; fi
count=0
touch numbers
echo $count > numbers
while [[ $count != 100 ]]; do
  if ln numbers numbers.lock
  then
    count=`expr $count + 1`
    n=`tail -1 numbers`
    expr $n + 1 >> numbers
    rm numbers.lock
  fi
done

What can I do to avoid the race condition between count=`expr $count + 1` and n=`tail -1 numbers`, so that when I run two copies of the script at the same time, the count only goes to 100 and not 200? I've researched multiple websites, but there are no concise answers that don't involve a huge function.

StingRay21

3 Answers


You are already safely avoiding the actual race condition with the lock file. The problem you are describing can be avoided in two ways.

(1) Move the lock file outside the main loop, so that two instances of your program cannot run their main loop at the same time. If one is running, the other will have to wait until it's done, and only then start replacing the output file.

#!/bin/bash

# Create the file up front; ln cannot take the lock if "numbers" doesn't exist
if [ ! -f numbers ]; then echo 0 > numbers; fi

while true; do
    if ! ln numbers numbers.lock
    then
        sleep 1
    else
        count=0
        #echo $count > numbers   # needless, isn't it?
        while [[ $count != 100 ]]; do
            count=`expr $count + 1`
            n=`tail -1 numbers`
            expr $n + 1 >> numbers
        done
        # release the lock only after the whole loop is done
        rm numbers.lock
        break
    fi
done

(2) Make the two instances cooperate by examining the contents of the file. In other words, force them to stop looping when the number reaches 100, regardless of how many other processes are writing to the file. (I guess there is an iffy corner case when there are more than 100 instances running.)

#!/bin/bash
# FIXME: should properly lock here, too
if [ ! -f numbers ]; then echo 0 > numbers; fi
n=0
touch numbers
while [[ $n -lt 100 ]]; do
  if ln numbers numbers.lock
  then
    n=$(expr $(tail -1 numbers) + 1 | tee numbers)
    rm numbers.lock
  fi
done

Depending on your requirements, you might actually want the script to clobber any previous value in the file when a new instance of the script is starting, but if not, the echo 0 > numbers should be governed by the lock file, too.
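
One low-tech way to make that initialization safe is Bash's noclobber option, which makes the > redirection fail if the file already exists instead of truncating it. This is a sketch, not part of the original answer:

# Initialize "numbers" to 0 only if it does not already exist.
# With noclobber set, the redirection refuses to overwrite an existing
# file (bash opens it with O_EXCL, so the check-and-create is not racy).
( set -o noclobber; echo 0 > numbers ) 2>/dev/null || true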

You really want to avoid expr in a Bash script; Bash has built-in arithmetic operators. I have not attempted to refactor that part here, but you probably should. Perhaps prefer Awk so you can factor out the tail too: `awk '{ i=$0 } END { print 1+i }' numbers`
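
For instance, the inner loop from the question could drop both expr and tail along these lines (a sketch, not part of the answer; same locking as before):

while [[ $count -lt 100 ]]; do
  if ln numbers numbers.lock
  then
    count=$((count + 1))                               # Bash built-in arithmetic
    n=$(awk '{ i = $0 } END { print i + 1 }' numbers)  # last line + 1, replaces tail + expr
    echo "$n" >> numbers
    rm numbers.lock
  fi
done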

tripleee
  • The first example seems to need a `touch numbers` before the `while true` loop. The `ln` command will fail if numbers doesn't exist, so it can never be created by that code if it doesn't already exist – SpinUp __ A Davis Oct 27 '21 at 18:07
  • I think using `ln -s` in place of plain `ln` should also work for the 3rd line of code in the first example -- then the command will fail if `numbers.lock` exists, but merrily carry on even if `numbers` doesn't exist yet – SpinUp __ A Davis Oct 27 '21 at 18:19
  • What would the point of that be? The _purpose_ of the operation is to fail if something is not right. But yes, the `touch` was missing, thanks for noticing that. – tripleee Oct 28 '21 at 05:10
  • But a missing `numbers` file isn't treated as a failure in your code -- you explicitly check for its existence and create it if necessary in the `if [ ! -f numbers ]` line. If you keep the `ln` as it is (not `ln -s`) that existence check is totally pointless, since it's already contained in `if ! ln ...`. Also `rm numbers.lock` should go *outside* the loop (`while [[ $count != 100 ]]`), otherwise you're re-introducing the race condition (and generating 99 error messages to the terminal). – SpinUp __ A Davis Oct 28 '21 at 17:39
  • You're right ... I'm not sure what I was thinking at the time; I can't get the first one to work at all. – tripleee Oct 28 '21 at 18:00
  • fair enough -- still upvoted though because the example was close to my application and helped set me on the right track – SpinUp __ A Davis Oct 28 '21 at 21:39

I put this one-liner at the top of my script to make it race-condition safe:

if [[ -d "/tmp/${0//\//_}" ]] || ! mkdir "/tmp/${0//\//_}"; then echo "Script is already running!" && exit 1; fi; trap 'rmdir "/tmp/${0//\//_}"' EXIT;

With that in place I don't need to think about race conditions in the rest of my code.

A breakdown of the code:

  1. [[ -d "/tmp/${0//\//_}" ]] checks if the lock dir /tmp/_path_to_script_scriptname.sh/ exists. Note: $0 contains the name of the script.
  2. mkdir "/tmp/${0//\//_}" creates the dir if it does not exist
  3. then ... exit 1 aborts the script if the lock dir already exists (which means the script is already running)
  4. trap 'rmdir "/tmp/${0//\//_}"' EXIT automatically deletes the lock dir when the script exits (this does not introduce a race condition, because the trap is only set after the lock dir has been created)

Note: In very rare cases, such as a server crash, the lock dir does not get deleted. For this you could set up a cronjob that checks for outdated lock dirs. If you need trap in your script (it can't be set twice), then use one of the various multi-trap solutions.
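
Spelled out over several lines, the same guard looks roughly like this (a sketch; the separate -d test is folded into mkdir, which already fails if the directory exists):

#!/bin/bash
lock_dir="/tmp/${0//\//_}"   # e.g. /tmp/_path_to_script_scriptname.sh

# mkdir fails if the directory already exists, so only one
# instance can create it and continue past this point.
if ! mkdir "${lock_dir}" 2>/dev/null; then
    echo "Script is already running!"
    exit 1
fi

# Remove the lock dir when the script exits, however it exits.
trap 'rmdir "${lock_dir}"' EXIT

# ... rest of the script ...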

mgutt

I've recently had to create my own "flock" function, because the script I'm writing needs to use trap, which can't be used alongside the flock command.

Here is the code:

#!/bin/bash
flock() {
    local lock_name lock_path lock_pid check_pid script_arg script_source script_pid
    lock_name="${1}"
    script_pid="$$"
    script_source="${BASH_SOURCE[0]}"
    script_arg=("${BASH}" "${script_source}")
    lock_path="$(dirname -- "$(realpath "${script_source}")")/${lock_name}_flock"
    for ((i=0;i<${#BASH_ARGV[@]}; i++)); do 
        script_arg+=("${BASH_ARGV[~i]}")
    done
    if [[ -f "${lock_path}" ]]; then
        read -r lock_pid < "${lock_path}" > /dev/null 2>&1
        if [[ -n "${lock_pid[0]}" ]]; then
            check_pid=$(ps -eo pid,cmd \
                | awk -v a="${lock_pid}" -v b="${script_arg[*]}" '$1==a && $2!="awk" && index($0,b) {print $1}')
            if [[ -n "${check_pid}" ]]; then
                printf "%s\n" "Script is already running"
                exit 0
            fi
        fi
    fi
    read -r check_pid < <(ps -eo pid,cmd | awk -v a="${script_arg[*]}" '$2!="awk" && index($0,a) {print $1}')
    if [[ "${script_pid}" -ne "${check_pid[0]}" ]]; then
        printf "%s\n" "Race condition prevented"
        exit 0
    fi
    printf '%s' "${script_pid}" | tee "${lock_path}" > /dev/null 2>&1
}

To use this, you just call the function with whatever you want the lock file to be called:

flock script_name

Explanation

The function uses BASH, BASH_SOURCE and BASH_ARGV to reconstruct what the CMD column would look like in the process status (ps) output.

Example: ./test.sh arg1 arg2 "this arg3"

CMD: /bin/bash ./test.sh arg1 arg2 this arg3

We then filter the process status for entries whose CMD matches, and collect their PIDs.

Next we take only the first PID from that list and compare it to the script's PID; if the script's PID does not match the first PID in the list, we exit, otherwise we store the script's PID in the lock file.

If the lock file already exists and contains an existing PID, the function checks whether that script is still running by searching for the PID and the CMD values. If it is running, the function exits; otherwise the PID in the lock file is updated to the new PID.

The function not only prevents race conditions but also keeps working even if your system crashes, as it doesn't rely on the lock file alone to determine whether or not the script is running.

$$ — Expands to the process ID of the shell. In a subshell, it expands to the process ID of the invoking shell, not the subshell.

BASH_SOURCE — An array variable whose members are the source filenames where the corresponding shell function names in the FUNCNAME array variable are defined. The shell function ${FUNCNAME[$i]} is defined in the file ${BASH_SOURCE[$i]} and called from ${BASH_SOURCE[$i+1]}.

BASH_ARGV — An array variable containing all of the parameters in the current bash execution call stack. The final parameter of the last subroutine call is at the top of the stack; the first parameter of the initial call is at the bottom. When a subroutine is executed, the parameters supplied are pushed onto BASH_ARGV. The shell sets BASH_ARGV only when in extended debugging mode (see The Shopt Builtin for a description of the extdebug option to the shopt builtin). Setting extdebug after the shell has started to execute a script, or referencing this variable when extdebug is not set, may result in inconsistent values.

BASH_ARGV lists the arguments in reverse order. To fix this I run the following code in the function, thanks to a post from user232326; it works with bash 4.2+.

for ((i=0;i<${#BASH_ARGV[@]}; i++)); do 
    script_arg+=("${BASH_ARGV[~i]}")
done
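
As a standalone illustration (using an ordinary array rather than BASH_ARGV): ~i is arithmetic bitwise NOT, i.e. -(i+1), so arr[~i] is the i-th element counted from the end, which is what restores the original order:

#!/bin/bash
# The same reversal trick with an ordinary array instead of BASH_ARGV.
reversed=("c" "b" "a")   # pretend these are arguments stored back-to-front
restored=()
for ((i=0; i<${#reversed[@]}; i++)); do
    restored+=("${reversed[~i]}")   # ~i == -(i+1): index from the end
done
printf '%s\n' "${restored[@]}"      # prints a, b, c — the original order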