3

I am using this YAJSW to run Java Daemon on my Centos 5.5 machines. The think it runs well but out of suddent I notice I get this sort of error and then it just goes down. Any help what must I do to avoid this sort of problem? Can I use some monitoring tool to monitor and recover it as soon it falls into problem?

Below is just part of the error list.

NFO|3090/0|11-09-19 20:22:13|Controller State: LOGGED_ON -> PROCESS_KILLED
INFO|wrapper|11-09-19 20:22:13|restart process due to default exit code rule
INFO|wrapper|11-09-19 20:22:13|set state RUNNING->RESTART
INFO|wrapper|11-09-19 20:22:13|set state RESTART->RESTART_STOP
INFO|wrapper|11-09-19 20:22:13|stopping process with pid/timeout 3090 45000
INFO|3090/0|11-09-19 20:22:13|Controller State: PROCESS_KILLED -> WAITING_CLOSED
FINEST|3090/0|11-09-19 20:22:13|wrapper manager received stop command
INFO|3090/0|11-09-19 20:22:14|Controller State: WAITING_CLOSED -> USER_STOP
INFO|wrapper|11-09-19 20:22:14|stop config name null
INFO|wrapper|11-09-19 20:22:14|externalStop false
INFO|wrapper|11-09-19 20:22:14|exit code linux process 0
INFO|wrapper|11-09-19 20:22:14|killing 3090
INFO|3090/0|11-09-19 20:22:14|gobler execption OUTPUT 3090 null
INFO|3090/0|11-09-19 20:22:14|gobler execption ERROR 3090 null
INFO|3090/0|11-09-19 20:22:14|gobler terminated OUTPUT 3090
INFO|wrapper|11-09-19 20:22:14|process exit code: 0
INFO|3090/0|11-09-19 20:22:14|gobler terminated ERROR 3090
INFO|wrapper|11-09-19 20:22:14|set state RESTART_STOP->RESTART_WAIT
INFO|wrapper|11-09-19 20:22:19|set state RESTART_WAIT->RESTART_START
INFO|wrapper|11-09-19 20:22:19|starting Process
INFO|3090/0|11-09-19 20:22:19|Controller State: USER_STOP -> UNKNOWN
INFO|wrapper|11-09-19 20:22:19|Controller State: UNKNOWN -> WAITING
INFO|wrapper|11-09-19 20:22:20|working dir /usr/local
INFO|wrapper|11-09-19 20:22:20|error initializing script 
INFO|wrapper|11-09-19 20:22:20|exec:/usr/java/jdk1.6.0_18/bin/java -classpath /usr/local/yajsw-beta-10.2/./wrapperApp.jar:/usr/local -Xrs -Dwrapper.service=true -Dwrapper.console.visible=false -Dwrapper.visible=false -Dwrapper.pidfile=/var/run/wrapper.commServer.pid -Dwrapper.config=/usr/local/yajsw-beta-10.2/conf/wrapper.conf -Dwrapper.port=15003 -Dwrapper.key=-6288918147195966892 -Dwrapper.teeName=-6288918147195966892$1316434940036 -Dwrapper.tmpPath=/tmp org.rzo.yajsw.app.WrapperJVMMain 
INFO|wrapper|11-09-19 20:22:20|started process 8988
INFO|wrapper|11-09-19 20:22:20|started process with pid 8988
INFO|wrapper|11-09-19 20:22:20|set state RESTART_START->RUNNING
INFO|wrapper|11-09-19 20:22:34|Controller State: WAITING -> STARTUP_TIMEOUT
INFO|wrapper|11-09-19 20:22:34|restart process due to default exit code rule
INFO|wrapper|11-09-19 20:22:34|set state RUNNING->RESTART
INFO|wrapper|11-09-19 20:22:34|set state RESTART->RESTART_STOP
INFO|wrapper|11-09-19 20:22:34|stopping process with pid/timeout 8988 45000
INFO|wrapper|11-09-19 20:22:34|Controller State: STARTUP_TIMEOUT -> USER_STOP
INFO|wrapper|11-09-19 20:22:34|stop config name null
INFO|wrapper|11-09-19 20:22:34|externalStop false
INFO|wrapper|11-09-19 20:23:19|process did not stop after 45000 sec. -> hard kill
INFO|wrapper|11-09-19 20:23:19|killing 8988
INFO|wrapper|11-09-19 20:23:19|send kill sig
INFO|wrapper|11-09-19 20:23:19|exit code linux process 9
INFO|wrapper|11-09-19 20:23:19|Controller State: USER_STOP -> PROCESS_KILLED
INFO|8988/1|11-09-19 20:23:20|gobler execption OUTPUT 8988 null
INFO|8988/1|11-09-19 20:23:20|gobler execption ERROR 8988 null
INFO|wrapper|11-09-19 20:23:20|process exit code: 999
INFO|8988/1|11-09-19 20:23:20|gobler terminated OUTPUT 8988
INFO|8988/1|11-09-19 20:23:20|gobler terminated ERROR 8988
INFO|wrapper|11-09-19 20:23:20|set state RESTART_STOP->RESTART_WAIT
INFO|wrapper|11-09-19 20:23:25|set state RESTART_WAIT->RESTART_START
INFO|wrapper|11-09-19 20:23:25|starting Process
INFO|wrapper|11-09-19 20:23:25|Controller State: PROCESS_KILLED -> UNKNOWN
INFO|wrapper|11-09-19 20:23:25|Controller State: UNKNOWN -> WAITING
INFO|wrapper|11-09-19 20:23:25|working dir /usr/local
INFO|wrapper|11-09-19 20:23:25|error initializing script 
INFO|wrapper|11-09-19 20:23:25|exec:/usr/java/jdk1.6.0_18/bin/java -classpath /usr/local/yajsw-beta-10.2/./wrapperApp.jar:/usr/local -Xrs -Dwrapper.service=true -Dwrapper.console.visible=false -Dwrapper.visible=false -Dwrapper.pidfile=/var/run/wrapper.commServer.pid -Dwrapper.config=/usr/local/yajsw-beta-10.2/conf/wrapper.conf -Dwrapper.port=15003 -Dwrapper.key=-6288918147195966892 -Dwrapper.teeName=-6288918147195966892$1316435005686 -Dwrapper.tmpPath=/tmp org.rzo.yajsw.app.WrapperJVMMain 
INFO|wrapper|11-09-19 20:23:26|started process 8989
INFO|wrapper|11-09-19 20:23:26|started process with pid 8989
INFO|wrapper|11-09-19 20:23:26|set state RESTART_START->RUNNING
INFO|wrapper|11-09-19 20:23:40|Controller State: WAITING -> STARTUP_TIMEOUT
INFO|wrapper|11-09-19 20:23:40|restart process due to default exit code rule
INFO|wrapper|11-09-19 20:23:40|set state RUNNING->RESTART
INFO|wrapper|11-09-19 20:23:40|set state RESTART->RESTART_STOP
INFO|wrapper|11-09-19 20:23:40|stopping process with pid/timeout 8989 45000
INFO|wrapper|11-09-19 20:23:40|Controller State: STARTUP_TIMEOUT -> USER_STOP
INFO|wrapper|11-09-19 20:23:40|stop config name null
INFO|wrapper|11-09-19 20:23:40|externalStop false
INFO|wrapper|11-09-19 20:24:25|process did not stop after 45000 sec. -> hard kill
INFO|wrapper|11-09-19 20:24:25|killing 8989
INFO|wrapper|11-09-19 20:24:25|send kill sig
INFO|wrapper|11-09-19 20:24:25|exit code linux process 9
INFO|wrapper|11-09-19 20:24:25|Controller State: USER_STOP -> PROCESS_KILLED
INFO|8989/2|11-09-19 20:24:26|gobler execption OUTPUT 8989 null
INFO|8989/2|11-09-19 20:24:26|gobler execption ERROR 8989 null
INFO|wrapper|11-09-19 20:24:26|process exit code: 999
INFO|8989/2|11-09-19 20:24:26|gobler terminated OUTPUT 8989
INFO|8989/2|11-09-19 20:24:26|gobler terminated ERROR 8989
Jonathon Faust
  • 12,396
  • 4
  • 50
  • 63
user837306
  • 857
  • 3
  • 18
  • 32

3 Answers3

1

You can trace what the linux process is doing by attaching an strace to it.

If it is a problem with YAJSW itself and if you are looking for a simple wrapper to keep your job up and running, it can be done with a simple bash script.

until myjob; do
    echo "restarting myjob"
    sleep 10
done

Line 1 is a blocking call as long as myjob is running and if it exits with anything other than a 0 then, it will be restarted.

RHT
  • 4,974
  • 3
  • 26
  • 32
  • The problem is now I do not see any problem indicating from my application because if it is will be showing some clue. I just says this INFO|wrapper|11-09-19 20:22:13|restart process due to default exit code rule. So the script you suggested is a shell script run is it? How to run? So you ask it to sleep for 10ms is it? The problem is that my application need to keep running as it keep receiving data. – user837306 Sep 24 '11 at 02:28
  • it's a bash script, it sleeps for 10secs. If you need a continuous Thread why not start a new Thread in java itself? Too many questions ... have you tried decaf? – RHT Sep 24 '11 at 02:50
  • what is decaf software is it? – user837306 Sep 25 '11 at 02:08
0

I ran into very similar wrapper log output on Windows. In my case, multiple applications were running through yajsw instances. It appears that in some cases, automatic port selection by yajsw to monitor Java applications doesn't work properly.

In the yajsw instance that is failing, adding

wrapper.port = 24572

fixes the issue. Recreate the service after modifying the wrapper.conf. I only had to add this to the instance of yajsw that was failing; other instances auto-selected ports successfully. The port number doesn't matter, just choose an unused port.

Jonathon Faust
  • 12,396
  • 4
  • 50
  • 63
0

You could look: here - this could be a resource leak.

Artem Oboturov
  • 4,344
  • 2
  • 30
  • 48
  • The problem is not with too many files open any more as I have already increase the file descriptor. It just stop without any good clue what was the cause if you look into the log file attached above? – user837306 Sep 24 '11 at 02:25
  • 1
    if it is a resource leak what is the cause of it and how to trace it? – user837306 Sep 25 '11 at 02:08