
I'm trying to write a bash script that watches a catalina.out log file. Every time the string java.lang.OutOfMemoryError appears, it should restart Tomcat.

(I know it would be better to fix the bug inside the webapp, but the webapp needs to keep running until the developers find it. So this is only a hotfix for (hopefully) a short time.)

The problem is that “break” doesn't leave the while loop immediately when it finds the pattern. More details below.

Here is the script:

#!/bin/bash

LOGFILE=/var/log/catalina.out
LOCKFILE=/tmp/pointtest.lock

while [ /bin/true ]
do
  echo "Starting tail-Loop"
  tail -fn0 ${LOGFILE} | \
  while read line
  do
    echo "Inside tail-Loop"
    echo "$line" | grep "java.lang.OutOfMemoryError" > /dev/null
    if [ $? = 0 ]
    then
      echo "Error! java.lang.OutOfMemoryError"
      # here should be the command to restart tomcat

      break
    fi

  done
  echo "Left tail-Loop"

done

Here is a test input file (filename: noErrors) with no errors:

Something
Something
Something
Something
Anything
Something
Something
Do something
And so on
Anything
Something

Here is a test input file (filename: oomErrors) with out-of-memory errors:

Something
Something
        java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
                at java.util.concurrent.FutureTask.report(FutureTask.java:122)
                at java.util.concurrent.FutureTask.get(FutureTask.java:192)
                at org.apache.catalina.core.ContainerBase.threadStart(ContainerBase.java:1276)
                at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessorMonitor.run(ContainerBase.java:1322)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
                at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
                at java.lang.Thread.run(Thread.java:748)
        java.lang.OutOfMemoryError: Java heap space
Something
Something
Anything
        java.lang.OutOfMemoryError: Java heap space
        java.lang.OutOfMemoryError: Java heap space
Something
Something
Do something
And so on
        java.lang.OutOfMemoryError: Java heap space
Anything
Something

First:

touch /var/log/catalina.out

Then start the script.

Output:

Starting tail-Loop

Now I run the following command:

cat noErrors >> /var/log/catalina.out

The output of the script:

Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Inside tail-Loop

So until here the script works as expected.

Now I run the following command:

cat oomErrors >> /var/log/catalina.out

Output of the script:

Inside tail-Loop
Inside tail-Loop
Inside tail-Loop
Error! java.lang.OutOfMemoryError

The first 2 lines of oomErrors are handled correctly. Then the script finds the OOM pattern, but it doesn't leave the inner while loop.

Then I run this command:

cat noErrors >> /var/log/catalina.out

The output of the script:

Left tail-Loop
Starting tail-Loop

Only now does it leave the inner while loop and start it again.

It seems that after detecting the first line containing the error, it ignores the following lines and stays stuck at the break command.

So my question: why doesn't the script leave the inner while loop immediately after finding the first line with the error string, and how can I fix this?

Thanks in advance :-)

jiwopene
  • I think you may be overcomplicating this. How about a `crontab` entry to just `if nice grep -s OutOfMemoryError "$LOGFILE"; then cd $tomcatDir; ./shutdown.sh; sleep $someDelay; ./startup.sh; fi` every ten or twenty minutes? – Paul Hodges Jan 02 '20 at 14:17
  • @DarkTranquility : You have two loops nested. The `break` aborts the inner one, but the outer loop continues to run. – user1934428 Jan 02 '20 at 14:26
  • Buffering of pipes might be hitting you. See for example: https://stackoverflow.com/questions/3465619/how-to-make-output-of-any-shell-command-unbuffered/25548995 – knittl Jan 02 '20 at 14:32
  • Aside: your initial condition looks a bit off. That should be `while /bin/true` without the `test`/`[` command. – knittl Jan 02 '20 at 14:33
  • @user1934428: That's right. The outer loop should never terminate. Only the inner loop after finding the pattern. – DarkTranquility Jan 02 '20 at 14:37
  • @Paul Hodges: Unfortunately I need it to react right away. The delay of a crontab is not a good solution in my case. – DarkTranquility Jan 02 '20 at 14:38
  • @knittl: Interesting info. I didn't know that pipes are buffered. I tried the stdbuf and unbuffer solutions mentioned in the link, but neither helps. If it were a buffering problem, the missing error lines should show up, too. But they don't. – DarkTranquility Jan 02 '20 at 14:43
  • @DarkTranquility are you by any chance truncating the file or replacing it with a new file of the same name (i.e. log rotation)? You might need to use `tail -F` then (note the capital F). Your manual reproduction recipe suggests otherwise, but I want to double check – knittl Jan 02 '20 at 14:46

1 Answer

Simplify a bit and eliminate the unnecessary overhead of the grep.

while : # : is a synonym for true
do echo "Starting tail-Loop"
   # IFS= and -r keep leading whitespace and backslashes in each line intact
   while IFS= read -r line
   do echo "Inside tail-Loop"
      case "$line" in
      *java.lang.OutOfMemoryError*)
        echo "Error! java.lang.OutOfMemoryError"
        # restart tomcat here
        break
        ;;
      esac
   done < <( tail -fn0 "${LOGFILE}" )
   echo "Left tail-Loop"
done
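
For context on why this helps: in the original `tail -fn0 ${LOGFILE} | while ...` form, bash waits for every process in the pipeline before moving on. After `break` the loop is finished, but `tail` only exits once its next write fails on the closed pipe, and that does not happen until something new is appended to the log. This matches the delayed "Left tail-Loop" in the question. With `done < <(...)` the loop is not part of a pipeline, so the script continues right after `break` (the old `tail` still lingers in the background until its next write). The sketch below wraps the matching loop in a function; `wait_for_pattern` and `LINES_READ` are hypothetical names, not part of the code above, and `printf` stands in for `tail` so that the example terminates on its own.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not the original answer's code):
# read lines from stdin and return 0 as soon as one contains the given
# fixed string. LINES_READ records how many lines were consumed.
wait_for_pattern() {
  local pattern=$1 line
  LINES_READ=0
  while IFS= read -r line; do
    LINES_READ=$((LINES_READ + 1))
    case $line in
      *"$pattern"*) return 0 ;;   # quoted, so the pattern is matched literally
    esac
  done
  return 1                        # input ended without a match
}

# Finite input standing in for: tail -fn0 "$LOGFILE"
if wait_for_pattern "java.lang.OutOfMemoryError" < <(
     printf '%s\n' "Something" \
                   "        java.lang.OutOfMemoryError: Java heap space" \
                   "Something")
then
  echo "matched after $LINES_READ lines"   # prints: matched after 2 lines
fi
```

In the real script the call would presumably be `wait_for_pattern "java.lang.OutOfMemoryError" < <(tail -fn0 "$LOGFILE")`, followed by the restart command inside the outer loop.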
Paul Hodges
  • Note: AFAIK `<(…)` is a bashism and will not work in all shells – knittl Jan 02 '20 at 14:53
  • I don't even think the `:` synonym for `true` works *everywhere*, lol. You are absolutely correct, but he's explicitly calling `#!/bin/bash` at the top (though that also might still fail depending on version...) – Paul Hodges Jan 02 '20 at 15:00
  • Actually, both `:` and `true` are defined by POSIX, see https://stackoverflow.com/a/3224910/112968 – knittl Jan 02 '20 at 15:03
  • @PaulHodges: Thanks a lot for your solution. It works :) But just a question: Why do I need 2 "<" between "done" and "(tail -fn0 ..."? – DarkTranquility Jan 02 '20 at 15:20
  • One is "take stdin for this loop from", the other is part of the `<(...)` syntax to "run this as a subshell and return a stream". – Paul Hodges Jan 02 '20 at 15:24
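
To illustrate the two `<` characters: `<(...)` on its own expands to a file name (on Linux typically something like /dev/fd/63) that is connected to the command's output, and the leading `<` is an ordinary input redirection from that file. One practical side effect, sketched below, is that the loop runs in the current shell rather than in a pipeline subshell, so variables set inside it survive. The variable names are made up for this example, and the pipe result assumes bash's default settings (no `shopt -s lastpipe`).

```shell
#!/bin/bash
# Pipe: the while loop runs in a subshell, so its changes to count are lost.
count=0
printf 'a\nb\n' | while IFS= read -r line; do count=$((count + 1)); done
pipe_count=$count

# Process substitution: the loop runs in the current shell.
count=0
while IFS= read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
procsub_count=$count

echo "pipe: $pipe_count, process substitution: $procsub_count"
# prints: pipe: 0, process substitution: 2
```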