I've got a bunch of repeating tasks to schedule. They query the database to find out what to do and then execute some action such as updating statistics, sending emails, or fetching files and importing them. Currently, there are maybe ten of them, and this number is expected to grow a lot. I'm not given any timing constraints; actually, it's my job to choose an algorithm so that nobody complains. :D
Currently, I'm using an ad-hoc combination of threads and periodically scheduled tasks like
- the most important task has its own thread, which falls back to a short sleep when idle (and can be woken up from it when new important work arrives)
- another important task is scheduled once per hour in its own thread
- medium importance tasks are scheduled periodically to "fill the holes", so that probably only one of them runs at any moment
- the least important tasks are all processed by a single dedicated thread
It seems to work well at the moment, but it's not future-proof and it doesn't feel right for these reasons:
- As the queue for the least important tasks may grow a lot, such tasks may be delayed indefinitely.
- Filling the holes may go wrong and there may be many tasks running at once.
- The number of tasks running at any given moment should depend on the server load. (*)
(*) It's primarily a web server and serving requests is actually the highest priority. Getting a separate server wouldn't help, as the bottleneck is usually the database. Currently, it works fine, but I'm looking for a better solution as we hope that the load grows by a factor of 100 in a year or two.
My idea is to increase the priority of a job when it has been delayed too long. For example, there are statistics running hourly, and delaying them by a few hours is no big deal, but the delay shouldn't reach a whole day and it mustn't reach a whole week.
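To make the aging idea concrete, here is a minimal sketch of one possible formula. Everything in it is my own assumption: the class name `AgingPriority`, the convention that a larger number means more important, and the linear rule of one extra priority level per tolerated delay interval.

```java
/** Hypothetical helper: the effective priority grows with how long a job is overdue. */
final class AgingPriority {
    private AgingPriority() {}

    /**
     * @param basePriority    static priority; larger means more important (assumed scale)
     * @param overdueMillis   how long the job has waited past its planned start
     * @param toleranceMillis the delay that is "no big deal" for this job
     * @return basePriority while the job is not overdue, then one extra level
     *         for every tolerance interval the job has been waiting
     */
    static double effective(double basePriority, long overdueMillis, long toleranceMillis) {
        if (toleranceMillis <= 0) {
            throw new IllegalArgumentException("toleranceMillis must be positive");
        }
        // A job that is not yet due keeps its base priority.
        long overdue = Math.max(0L, overdueMillis);
        return basePriority + (double) overdue / toleranceMillis;
    }
}
```

With a base priority of 1 and a tolerance of three hours, the hourly statistics would reach priority 9 after a day of delay and 57 after a week, so they would eventually outrank everything else. A steeper, non-linear formula could be swapped in without changing the signature.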
I'd be happy to replace all my AbstractExecutionThreadService and AbstractScheduledService instances with something that works as follows:
- Start the highest priority tasks immediately, no matter what.
- Start the medium priority tasks only when the total load is "small".
- Start the lowest priority tasks only when the system is "mostly idle".
- Increase the priorities for delayed tasks using a supplied formula.
This surely sounds pretty fuzzy and getting it more precise is a part of what I'm asking. My competing goals are
- Never delay the important tasks needlessly.
- Never let too many concurrently running tasks slow down the server too much.
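To show what I mean by "small" load and "mostly idle", here is a rough sketch of the start rule. The names (`LoadGate`, `Level`, `mayStart`) are hypothetical, and so is the idea that the load is a single number between 0 and 1 (for example, the fraction of database connections in use). Picking that metric and the two thresholds is exactly the fuzzy part.

```java
/** Hypothetical admission rule: decides whether a job may start under the current load. */
final class LoadGate {
    enum Level { LOW, MEDIUM, HIGH }

    private final double idleLoad;   // below this the system counts as "mostly idle"
    private final double smallLoad;  // below this the total load counts as "small"

    LoadGate(double idleLoad, double smallLoad) {
        if (idleLoad < 0 || smallLoad > 1 || idleLoad > smallLoad) {
            throw new IllegalArgumentException("need 0 <= idleLoad <= smallLoad <= 1");
        }
        this.idleLoad = idleLoad;
        this.smallLoad = smallLoad;
    }

    /** @param load current load in [0, 1]; how to measure it is left open */
    boolean mayStart(Level level, double load) {
        switch (level) {
            case HIGH:
                return true;              // start immediately, no matter what
            case MEDIUM:
                return load < smallLoad;  // only when the total load is "small"
            default:
                return load < idleLoad;   // LOW: only when "mostly idle"
        }
    }
}
```

A delayed job would simply be moved to a higher `Level` once it has waited too long, so the gate itself could stay this simple.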
There are no hard deadlines and there's no need to minimize the number of threads used. I don't insist on a solution doing exactly what I described, and I'm not looking for a library (nor do I insist on reinventing the wheel). I don't think that a cron-like scheduler is the right solution.