4

I am currently trying to figure out why a shell script fails at concurrent logging every once in a while.

I have a shell function like the following:

log()
{
   local l_text=$1
   local l_file="/path/to/logs/$(date +%Y%m%d)_script.log"

   local l_line="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text}"

   echo ${l_line} >> ${l_file}
}

Now every once in a while this fails with a syntax error:

/path/to/script.sh: command substitution: line 163: syntax error near unexpected token `)'
/path/to/script.sh: command substitution: line 163: `hostname -s) ${l_text}'

The problem is, that I have multiple sub-processes, which each want to log as well as send traps (during which logging is performed as well). I haved debugged the problem and found out, that this happens, when the function is entered three times simultaneously. First the main process enters, then the child. After the date part of l_text is executed, main get's interrupted by a trap which is caused by child and in this trap tries to log something. The child and the trap finish their logging nicely, but then main is resumed after the trap and tries to execute the hostname part (presumedly) and fails with this error.

So it seems like main does not like being put to sleep while it is producing the $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) ${l_text} part of the log statement and cannot resume nicely. I was assuming this should work fine, because I am just using local variables and thread safe output methods.

Is this a general concurrency problem I am running into here? Or is this very specific for the trap mechanism in bash scripts? I know about the commodities of SIGNAL handling in C, so I am aware that only certain operations are allowed in SIGNAL handlers. However I am not aware if the same precautions also apply when handling SIGNALs in a bash script. I tried to find documentation on this, but none of the documents I could find gave any indications of problems with SIGNAL handling in scripts.

EDIT:

Here is an actuall simple script that can be used to replicate the problem:

#!/bin/bash

log() {
  local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
  echo $text >> /dev/null
}

sub_process() {
  while true; do
    log "Thread is running"
    kill -ALRM $$
    sleep 1
  done
}

trap "log 'received ALRM'" ALRM 

sub_process &
sub_process_pid=$!
trap "kill ${sub_process_pid}; exit 0" INT TERM


while true; do
  log "Main is running"
  sleep 1
done

Every once in a while this script will get killed because of a syntax error in line 5. Line 5 is echo $text >> /dev/null, but since the syntax error also mentiones the hostname command, similar to the one I posted above, I am assuming there is an of-by-one error as well and the actual error is in line 4, which is local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1".

Does anybody know what to do with the above script to correct it? I alread tried moving out the construction of the string into some temporary variables:

log() {
  local thedate=$(date +'%Y-%m-%d %H:%M:%S')
  local thehostname=$(hostname -s)
  local text="${thedate} ${thehostname} $1"
  echo $text >> /dev/null
}

This way the error appears less frequently, but it still is present, so this is not a real fix.

Jolta
  • 2,620
  • 1
  • 29
  • 42
LiKao
  • 10,408
  • 6
  • 53
  • 91
  • 5
    That whitespace around those `=` is illegal. – kojiro Apr 17 '12 at 16:35
  • @kojiro: Thanks for the hint... The whitespace is not present in the original and I accidentally introduced it when reducing the code to a good example. I'll edit this. – LiKao Apr 17 '12 at 16:54
  • 1
    The name `thread` is misleading here, bash uses sub-processes, not threads... I guess you want `$!` instead of `$?` in `trap "kill $?; exit 0" INT TERM` and `INT`/`TERM`/`ALRM` are missing `SIG` (`SIGINT`). – Doncho Gunchev May 04 '12 at 07:15
  • @dgunchev: Thanks again... My bash skills are not the best, so I mistyped. The name thread was used because this is mainly the way, that the function is used in the original code. I changed it to `sub_process`, in order not to confuse anyone. About the other remark: The trap command actually only uses a shorter notation for signals, without the `SIG` part. I guess this was considered redundant in the design and therefore left out, so this is ok. The behaviour if you drop the redirect to `/dev/null` shows, that the traps are working. – LiKao May 04 '12 at 08:12
  • The funny thing here is that the `$$` variable stays the same in the sub-shells unlike `$BASHPID`. Check [in-a-script-vs-in-a-subshell](http://stackoverflow.com/questions/5615570/in-a-script-vs-in-a-subshell) for details. Also, just for info, the traps are not inherited by subshells. – Doncho Gunchev May 17 '12 at 15:25

1 Answers1

1

I would say that this is definitely a bug in bash and I would encourage you to report it to the bash developers. At the very least, you should never get a syntax error for what is syntactically correct code.

For the record, I get the same results as you with GNU bash, version 4.2.10(1)-release (x86_64-pc-linux-gnu).

I found that you can workaround the problem by not calling a function in your trap handler. E.g. replacing

trap "log 'received ALRM'" ALRM 

with

trap "echo $(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) received ALRM" ALRM

makes the script stable for me.

I know about the commodities of SIGNAL handling in C, so I am aware that only certain operations are allowed in SIGNAL handlers. However I am not aware if the same precautions also apply when handling SIGNALs in a bash script.

I guess you shouldn't have to take special precautions but apparently in practice you do. Given that the problem seem to go away without the function call, I'm guessing that something in bash either isn't re-entrant where it should be or fails prevent re-entry in the first place.

Lasse
  • 686
  • 4
  • 9
  • Ok, I already reported the problem, but so far I have not received any meaningfull feedback. I guess they are on it and I hope they'll find a solution in one of the next releases. – LiKao May 14 '12 at 13:18