If you want to ensure you never miss that Monday task, my experience is that the code itself causes only some of the possible failures, so confining your solution to try/catch/retry will miss wider causes, e.g.:
- the running environment runs out of some resource (disc, memory) and the service is not alive at the time the Cron schedule tries to run. (Kubernetes will generally help minimise these cases, I grant you).
- someone chooses just that special time to deploy a new container, so the Cron schedule is missed
- you later evolve the service so you have more than one instance of the process so you get multiple executions
As great as Kubernetes is, once the Pod restarts, you typically lose the log files so you cannot easily know what happened and whether your important process ran.
For these cases I suggest two approaches. (One of these matches @gidds's suggestions.)
1. Maintain state outside the application in a trusted backing store.
The application has the @Scheduled method to run at the nominated time, but on startup it also looks for a nextRunAt datetime in the external store. If a run has been missed, then it is easy for that process and humans to know and take action. You can have Spring call a method on startup in this way:
@Bean
fun startUp() = CommandLineRunner {
...
}
Of course, the process needs to update the nextRunAt after each run.
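To make the missed-run check concrete, here is a minimal sketch in plain Kotlin. Everything named here (RunStateStore, InMemoryRunStateStore, WeeklyTask, runIfDue) is hypothetical and only for illustration; in practice the store would be a database row or similar. It also assumes a weekly interval and that a missing nextRunAt means the task is due, which may not suit your case.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical abstraction over the trusted external store (a DB row, a Redis key, ...).
interface RunStateStore {
    fun loadNextRunAt(): Instant?
    fun saveNextRunAt(next: Instant)
}

// In-memory stand-in so the sketch can be run; NOT a trusted backing store.
class InMemoryRunStateStore(private var nextRunAt: Instant? = null) : RunStateStore {
    override fun loadNextRunAt(): Instant? = nextRunAt
    override fun saveNextRunAt(next: Instant) { nextRunAt = next }
}

class WeeklyTask(
    private val store: RunStateStore,
    private val interval: Duration = Duration.ofDays(7), // assumption: weekly task
    private val work: () -> Unit,
) {
    // Intended to be called from both the @Scheduled method and the startup runner.
    // Returns true if the work was executed.
    fun runIfDue(now: Instant): Boolean {
        val due = store.loadNextRunAt()
        // Assumption: no stored value means the task has never run, so it is due.
        if (due != null && now.isBefore(due)) return false

        // If work() throws, nextRunAt is left untouched and the run stays "due".
        work()

        // Advance from the due time rather than from now, so a late catch-up
        // run does not shift the schedule; skip slots that are already past.
        var next = (due ?: now).plus(interval)
        while (!next.isAfter(now)) next = next.plus(interval)
        store.saveNextRunAt(next)
        return true
    }
}
```

Calling runIfDue from both entry points means a run missed during a deploy is picked up at the next startup, while a call before the stored time does nothing. With more than one instance you would additionally need the store update to be atomic (e.g. a conditional update), which this sketch does not attempt.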
2. Use a messaging system and simple scheduler
This more complex solution depends on what other infrastructure you have in your mix. Given a resilient Message Queuing system and correct use of transactional messaging, a "command" message is placed on a Queue at the run time§. One or more worker nodes subscribe to this Queue. The first to acquire the message will process it, and that worker needs to properly acknowledge the message as processed. If that worker does not, e.g. if the worker processing thread dies, or the whole JVM/etc dies, then the Queue Manager will offer it to another subscriber after a suitable timeout (you need to manage that timeout carefully so you don't get a double execution just because the process is still running). This approach works even if you only ever intend to have one worker: as soon as it comes back online, the message is there for it and the process will run.
Most Queue Managers will have a management interface where you can see if there is a message waiting.
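The acknowledge-or-redeliver behaviour described above can be illustrated with a toy in-memory model. This is only a sketch to show the mechanics: CommandQueue and its methods are hypothetical names, it is single-threaded, and a real Queue Manager provides this behaviour (plus durability) for you.

```kotlin
import java.time.Duration
import java.time.Instant

// Toy stand-in for a queue manager: a received message is hidden from other
// workers until redeliverAfter has elapsed, unless it is acknowledged first.
class CommandQueue(private val redeliverAfter: Duration) {
    private data class Entry(val id: Int, var lockedUntil: Instant?)

    private val entries = mutableListOf<Entry>()
    private var nextId = 1

    // Places a blank "command" message on the queue and returns its id.
    fun publish(): Int {
        val id = nextId++
        entries += Entry(id, null)
        return id
    }

    // Hands out the oldest message that no worker currently holds, or null.
    fun receive(now: Instant): Int? {
        val entry = entries.firstOrNull { e ->
            val lock = e.lockedUntil
            lock == null || !now.isBefore(lock)
        } ?: return null
        entry.lockedUntil = now.plus(redeliverAfter)
        return entry.id
    }

    // Acknowledging removes the message for good.
    fun acknowledge(id: Int) {
        entries.removeAll { it.id == id }
    }

    // Roughly what a management interface would show as "messages waiting".
    fun pending(): Int = entries.size
}
```

If a worker receives the message and then dies without acknowledging, the next receive after the timeout hands the same message out again. The timeout has to be longer than the longest realistic run, otherwise a slow but healthy worker triggers a second execution.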
§ Of course, you still need a process to place the message on the queue at the right time. The Queue approach gives you a very resilient processing solution BUT there is still a single point of failure: the scheduler. So this part should be built on the simplest technology you can get hold of, one you can reasonably argue has a very low chance of failure.
That "command" message can be just a blank message in a particular Queue; that's enough. Most Queue systems have an HTTP entry point to create a simple message, so you can imagine:
- a Kubernetes CronJob (the Kube people have made this reliable)
- that calls a shell script (easy to reason this won't fail)
- that uses curl to publish a message on a Queue over HTTP (this too should be simple enough to be sure it won't fail)
- The Queue system won't lose your message - that's its job!