Rails: 6.0.3
Sidekiq: 6.1.2
Ruby 2.7.2
Running on AWS Amazon Linux 2
I'm running a fairly simply Sidekiq configuration on production, and using the boilerplate systemd/sidekiq.service file from the examples directory in the sidekiq repo.
I noticed that my workers can not run long jobs because they are killed every 1 minute or so. I was able to track down what's happening, and it appears that systemd is restarting sidekiq, even though it is successfully started. It appears that it never receives the message that the service started successfully, so systemd is killing the process.
Here are the logs:
sidekiq: 2021-06-01T23:30:56.510Z pid=24939 tid=gir INFO: Shutting down
sidekiq: 2021-06-01T23:30:56.511Z pid=24939 tid=4jxb INFO: Scheduler exiting...
systemd: Failed to start sidekiq.
systemd: Unit sidekiq.service entered failed state.
systemd: sidekiq.service failed.
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=gir INFO: Terminating quiet workers
sidekiq: 2021-06-01T23:30:56.513Z pid=24939 tid=4jvn INFO: Scheduler exiting...
sidekiq: 2021-06-01T23:30:57.015Z pid=24939 tid=gir INFO: Pausing to allow workers to finish...
sidekiq: 2021-06-01T23:30:57.516Z pid=24939 tid=gir INFO: Bye!
systemd: sidekiq.service holdoff time over, scheduling restart.
systemd: Starting sidekiq...
sidekiq: 2021-06-01T23:30:58.991Z pid=32046 tid=fs6 INFO: Enabling systemd notification integration
sidekiq: 2021-06-01T23:31:04.475Z pid=32046 tid=fs6 INFO: Booting Sidekiq 6.1.2 with redis options {:url=>"redis://******"}
sidekiq: 2021-06-01T23:31:08.869Z pid=32046 tid=fs6 INFO: Running in ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
sidekiq: 2021-06-01T23:31:08.870Z pid=32046 tid=fs6 INFO: See LICENSE and the LGPL-3.0 for licensing details.
systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Following these messages, the sidekiq worker will successfully perform the jobs from the queue for about 1 minute before it's restarted again. This cycle continues forever.
I've tried modifying the sidekiq.service file a number of different ways, but nothing seems to do the trick. In particular, this line from the logs seems to indicate there's an issue sending the signal to the right process ID, that sidekiq correctly started up: systemd: sidekiq.service: Got notification message from PID 32046, but reception only permitted for main PID 31981
Any ideas on how I can ensure that systemd accurately knows when a job succeeds/fails to start?
Here is my current systemd/sidekiq.service file:
#
# This file tells systemd how to run Sidekiq as a 24/7 long-running daemon.
#
# Customize this file based on your bundler location, app directory, etc.
# Customize and copy this into /usr/lib/systemd/system (CentOS) or /lib/systemd/system (Ubuntu).
# Then run:
# - systemctl enable sidekiq
# - systemctl {start,stop,restart} sidekiq
#
# This file corresponds to a single Sidekiq process. Add multiple copies
# to run multiple processes (sidekiq-1, sidekiq-2, etc).
#
# Use `journalctl -u sidekiq -rn 100` to view the last 100 lines of log output.
#
[Unit]
Description=sidekiq
# start us only once the network and logging subsystems are available,
# consider adding redis-server.service if Redis is local and systemd-managed.
After=syslog.target network.target
# See these pages for lots of options:
#
# https://www.freedesktop.org/software/systemd/man/systemd.service.html
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html
#
# THOSE PAGES ARE CRITICAL FOR ANY LINUX DEVOPS WORK; read them multiple
# times! systemd is a critical tool for all developers to know and understand.
#
[Service]
#
# !!!! !!!! !!!!
#
# As of v6.0.6, Sidekiq automatically supports systemd's `Type=notify` and watchdog service
# monitoring. If you are using an earlier version of Sidekiq, change this to `Type=simple`
# and remove the `WatchdogSec` line.
#
# !!!! !!!! !!!!
#
Type=simple
# If your Sidekiq process locks up, systemd's watchdog will restart it within seconds.
#WatchdogSec=10
EnvironmentFile=/opt/elasticbeanstalk/deployment/custom_env_var
WorkingDirectory=/var/app/current
# If you use rbenv:
# ExecStart=/bin/bash -lc 'exec /home/deploy/.rbenv/shims/bundle exec sidekiq -e production'
# If you use the system's ruby:
# ExecStart=/usr/local/bin/bundle exec sidekiq -e production
# If you use rvm in production without gemset and your ruby version is 2.6.5
# ExecStart=/home/deploy/.rvm/gems/ruby-2.6.5/wrappers/bundle exec sidekiq -e production
# If you use rvm in production wit gemset and your ruby version is 2.6.5
ExecStart=/bin/bash -lc 'cd /var/app/current; bundle exec sidekiq -e production -r /var/app/current -C /var/app/current/config/sidekiq.yml'
# Use `systemctl kill -s TSTP sidekiq` to quiet the Sidekiq process
# !!! Change this to your deploy user account !!!
User=root
Group=root
UMask=0002
# Greatly reduce Ruby memory fragmentation and heap usage
# https://www.mikeperham.com/2018/04/25/taming-rails-memory-bloat/
Environment=MALLOC_ARENA_MAX=2
# if we crash, restart
RestartSec=1
Restart=on-failure
# output goes to /var/log/syslog (Ubuntu) or /var/log/messages (CentOS)
StandardOutput=syslog
StandardError=syslog
# This will default to "bundler" if we don't specify it
SyslogIdentifier=sidekiq
[Install]
WantedBy=multi-user.target