I have a systemd unit that needs to be monitored: it should be restarted if it crashes, and a handler should run whenever the unit fails. I'm working on an embedded system, so this needs to be robust.
In my case we have a systemd service:
[Unit]
Description=Demo unit
Wants=multi-user.target
OnFailure=FailHandler@%N.service
[Service]
ExecStart=/bin/bash /home/root/demo.sh
Restart=on-failure
RestartSec=1
Type=simple
The bash script it starts:
echo "Started demo.sh"
current_date=$(date)
sleep 10s
echo "${current_date} Demo was here" >> /home/root/demo.txt
exit 1
So far so good. The script logs the time and always exits with status 1 after 10 seconds. The problem is that FailHandler is never called in that case. This is just a demo (the real applications are all written in C++), but the behavior is the same. If I deliberately set a wrong path to the script, the unit fails and the "OnFailure" part does start. Here's the syslog output with the correct path:
2021-09-03T13:06:31.575094+00:00 hostname bash[1125]: Started demo.sh
2021-09-03T13:06:41.629450+00:00 hostname systemd[1]: demo.service: Main process exited, code=exited, status=1/FAILURE
2021-09-03T13:06:41.644681+00:00 hostname systemd[1]: demo.service: Failed with result 'exit-code'.
2021-09-03T13:06:41.818089+00:00 hostname systemd[1]: demo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:06:41.824005+00:00 hostname systemd[1]: demo.service: Scheduled restart job, restart counter is at 1.
2021-09-03T13:06:41.850933+00:00 hostname bash[1179]: Started demo.sh
2021-09-03T13:06:51.870376+00:00 hostname systemd[1]: demo.service: Main process exited, code=exited, status=1/FAILURE
2021-09-03T13:06:51.872611+00:00 hostname systemd[1]: demo.service: Failed with result 'exit-code'.
2021-09-03T13:06:52.117479+00:00 hostname systemd[1]: demo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:06:52.136102+00:00 hostname systemd[1]: demo.service: Scheduled restart job, restart counter is at 2.
2021-09-03T13:06:52.163865+00:00 hostname bash[1221]: Started demo.sh
Here's the output when the path is incorrect:
2021-09-03T13:07:46.582269+00:00 hostnaem bash[1446]: /bin/bash: /ahome/root/daemo.sh: No such file or directory
2021-09-03T13:07:46.588715+00:00 hostnaem systemd[1]: daemo.service: Main process exited, code=exited, status=127/n/a
2021-09-03T13:07:46.590356+00:00 hostnaem systemd[1]: daemo.service: Failed with result 'exit-code'.
2021-09-03T13:07:46.694616+00:00 hostnaem systemd[1]: daemo.service: Service RestartSec=100ms expired, scheduling restart.
2021-09-03T13:07:46.701519+00:00 hostnaem systemd[1]: daemo.service: Scheduled restart job, restart counter is at 1.
2021-09-03T13:07:46.720879+00:00 hostnaem systemd[1]: daemo.service: Start request repeated too quickly.
2021-09-03T13:07:46.721405+00:00 hostnaem systemd[1]: daemo.service: Failed with result 'exit-code'.
2021-09-03T13:07:46.722723+00:00 hostnaem systemd[1]: daemo.service: Triggering OnFailure= dependencies.
2021-09-03T13:07:46.804815+00:00 hostnaem FailHandler.sh[1457]: Failed application: daemo
2021-09-03T13:07:46.822342+00:00 hostnaem bash[1457]: error: cannot stat /etc/logrotate.d/daemo: No such file or directory
2021-09-03T13:07:46.841577+00:00 hostnaem FailHandler.sh[1457]: ERROR: Failed logrotate for daemo crash
2021-09-03T13:07:46.977003+00:00 hostnaem systemd[1]: FailHandler@daemo.service: Succeeded.
From the syslog I understand that FailHandler is started only once the number of restarts reaches the start limit (StartLimitBurst=1 within 100ms). Is there a way to have it start every time the application exits with an error code?
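
For reference, one mechanism that runs on every stop of the service, not only when the start limit is hit, is ExecStopPost=. The fragment below is a minimal, untested sketch rather than a confirmed solution. It assumes systemd 232 or later (which exports $SERVICE_RESULT to ExecStopPost= commands), that systemctl is on the PATH of the spawned shell, and that the FailHandler@.service template from the unit above exists. The `$$` keeps systemd from expanding the variable itself, so bash reads it from its environment.

```ini
[Service]
ExecStart=/bin/bash /home/root/demo.sh
Restart=on-failure
RestartSec=1
Type=simple
# Assumption: systemd >= 232 sets SERVICE_RESULT for ExecStopPost= commands.
# This command runs after every stop, including each failed run that is
# followed by an automatic restart. It starts the handler only when the
# result is not "success", i.e. not on a clean exit or a normal stop.
ExecStopPost=/bin/bash -c 'if [ "$$SERVICE_RESULT" != "success" ]; then systemctl --no-block start "FailHandler@%N.service"; fi'
```

If OnFailure= stays in the [Unit] section alongside this, the handler would presumably also be triggered a second time once the start limit is reached, so keeping only one of the two mechanisms may be preferable.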