
I've got a bunch of repeating tasks to schedule. They query the database to find out what to do and then execute some action like updating statistics, sending emails, or fetching and importing files. Currently, there are maybe ten of them and this number is expected to grow a lot. I'm not given any timing constraints; actually, it's my job to choose an algorithm so that nobody complains. :D

Currently, I'm using an ad-hoc combination of threads and periodically scheduled tasks:

  • for the most important task, there's a dedicated thread that falls back to a short sleep when idle (from which it can be woken up when new important work arrives); see the sketch after this list.
  • another important task is scheduled once per hour in its own thread
  • medium importance tasks are scheduled periodically to "fill the holes", so that probably only one of them runs at any moment
  • the least important tasks are all processed by a single dedicated thread
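
Roughly, the most important task's thread currently looks like this (a simplified sketch with made-up names; the real code does more error handling):

import com.google.common.util.concurrent.AbstractExecutionThreadService;

public final class ImportantWorker extends AbstractExecutionThreadService {

    private final Object wakeUp = new Object();

    @Override
    protected void run() throws InterruptedException {
        while (isRunning()) {
            // query the database and execute whatever important work is pending
            boolean didSomething = processPendingImportantWork();
            if (!didSomething) {
                synchronized (wakeUp) {
                    wakeUp.wait(1000); // short sleep when idle
                }
            }
        }
    }

    /** Called by producers when new important work arrives. */
    public void newWorkArrived() {
        synchronized (wakeUp) {
            wakeUp.notifyAll();
        }
    }

    private boolean processPendingImportantWork() {
        // placeholder for the real DB query + action
        return false;
    }
}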

It seems to work well at the moment, but it's not future-proof and it doesn't feel right for these reasons:

  • As the queue for the least important tasks may grow a lot, such tasks may be delayed indefinitely.
  • Filling the holes may go wrong and there may be many tasks running at once.
  • The number of tasks running at any given moment should depend on the server load. (*)

(*) It's primarily a web server and serving requests is actually the highest priority. Getting a separate server wouldn't help, as the bottleneck is usually the database. Currently, it works fine, but I'm looking for a better solution as we hope the load will grow by a factor of 100 within a year or two.

My idea is to increase the priority of a job when it has been delayed too much. For example, there are statistics running hourly; delaying them by a few hours is no big deal, but it shouldn't be a whole day and it mustn't be a whole week.
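
To make that a bit more concrete, I imagine an aging rule along these lines (just a sketch; the names and numbers are made up and would need tuning):

import java.time.Duration;

// Sketch of a priority-aging rule: the longer a job waits past its
// acceptable delay, the closer its effective priority climbs towards the maximum.
final class PriorityAging {
    static final double HIGHEST_PRIORITY = 10.0;

    static double effectivePriority(double basePriority, Duration delay, Duration acceptableDelay) {
        if (delay.compareTo(acceptableDelay) <= 0) {
            return basePriority; // not late yet, keep the base priority
        }
        // grow linearly with the excess delay, capped at the highest priority
        double excess = (double) (delay.toMillis() - acceptableDelay.toMillis())
                / acceptableDelay.toMillis();
        return Math.min(HIGHEST_PRIORITY, basePriority + excess);
    }
}

Any monotone formula that caps out at the highest priority would probably do; the point is only that a long-delayed low-priority job eventually competes with the important ones.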

I'd be happy to replace all my AbstractExecutionThreadServices and AbstractScheduledServices with something that works as follows:

  • Start the highest priority tasks immediately, no matter what.
  • Start the medium priority tasks only when the total load is "small".
  • Start the lowest priority tasks only when the system is "mostly idle".
  • Increase the priorities for delayed tasks using a supplied formula.

This surely sounds pretty fuzzy, and making it more precise is part of what I'm asking. My competing goals are

  • Never delay the important tasks needlessly.
  • Never let too many concurrently running tasks slow down the server too much.

There are no hard deadlines and there's no need to minimize the number of threads used. I don't insist on a solution doing exactly what I described; I'm not looking for a library (nor do I insist on reinventing the wheel). I don't think that a cron-like scheduler is the right solution.

maaartinus
  • While this question is very interesting indeed, I doubt that the format is a good fit for Stack Overflow. I am saying this as a peer and in no way condescendingly to a fellow user who, judging from his reputation here, has a lot of experience and should know better than to ask a question which nobody can answer in a non-opinionated way, despite your good (but abstract, as in "no code") and intriguing problem description. – kriegaex Apr 21 '18 at 02:32
  • Having said that, and trying to show goodwill: have you taken a look at the [Quartz job scheduler](http://www.quartz-scheduler.org/)? On the web site you will also find examples, a cookbook for the basic concepts, and an FAQ. I have never used it, but it might help you with the "don't re-invent the wheel" part and might let you concentrate on getting the scheduling parameters as such right. – kriegaex Apr 21 '18 at 02:38
  • @kriegaex Sure, I did. It has tons of useful features, but they seem to be orthogonal to what I want here. The two competing goals I listed don't seem to be fulfilled by anything I could google out. `+++` Actually, the majority of my question is just the context. I guess, that's why it sounds fuzzy. **The listed goals can be made precise easily by defining a target function and then we'll get an optimization problem.** – maaartinus Apr 21 '18 at 06:32
  • You just described why the question does not fit SO. Thanks for even highlighting it in bold print. But anyway, good luck, it was just feedback, no offense meant. I am out of here. – kriegaex Apr 21 '18 at 14:41
  • @kriegaex I'm afraid I can't follow. Is the problem that I didn't make it precise, or that it's an optimization problem? Concerning the target function: I initially wrote one, but it got complicated and opened more questions I didn't want to dive into. And actually, any target function penalizing what I want to prevent would probably do. I guess I'll rewrite my question soon - it had to be written as is to become clearer to myself. – maaartinus Apr 23 '18 at 01:27
  • Is it so hard to understand what I mean? The problem is that there is no [MCVE](http://stackoverflow.com/help/mcve) by which to reproduce and solve your problem and that the question will probably cause opinionated discussion rather than yield a clear solution. This question would be better for a discussion forum where during the discussion the question could be clarified and refined iteratively. SO is a Q/A site. – kriegaex Apr 23 '18 at 03:21
  • @kriegaex I see, but IMHO the MCVE only applies to some problems. I'm not asking about a bug, and I don't have an *exact* optimization problem *yet*; I just have an approximate idea of what I'd like to get. Actually, I expected that someone had already solved a similar problem, as I thought that dealing with a resource which doesn't scale up and prioritizing delayed tasks were commonplace. – maaartinus Apr 26 '18 at 00:09
  • Fine - and of course you do not need my consent anyway. It was just feedback, no more, no less. :-) – kriegaex Apr 26 '18 at 09:57
  • @kriegaex I do value your feedback (was just explaining my POV). Actually, I regret having posted the question as is. Unfortunately, I don't have the time to improve it at the moment. – maaartinus Apr 27 '18 at 01:40

2 Answers


Working with the ExecutorService model, the classic solution to reordering executor tasks is to create a ThreadPoolExecutor with a PriorityBlockingQueue feeding it the tasks - as described here.
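
For reference, the bare-bones version of that classic setup looks something like the following (a sketch; note that tasks have to go through execute() rather than submit(), because submit() would wrap them in a FutureTask the comparator below couldn't cast, and only tasks that actually wait in the queue get reordered):

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PriorityExecutorExample {

    // A Runnable that carries a priority so the queue can order it.
    static class PrioritizedTask implements Runnable {
        final int priority; // lower value = runs earlier
        final Runnable delegate;

        PrioritizedTask(int priority, Runnable delegate) {
            this.priority = priority;
            this.delegate = delegate;
        }

        @Override
        public void run() {
            delegate.run();
        }
    }

    public static void main(String[] args) {
        // Waiting tasks are ordered by priority instead of FIFO.
        PriorityBlockingQueue<Runnable> queue = new PriorityBlockingQueue<>(11,
                Comparator.comparingInt((Runnable r) -> ((PrioritizedTask) r).priority));

        ThreadPoolExecutor executor =
                new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, queue);

        executor.execute(new PrioritizedTask(2, () -> System.out.println("low")));
        executor.execute(new PrioritizedTask(1, () -> System.out.println("high")));
        executor.shutdown();
    }
}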

However, needing to schedule the tasks as well puts a spin on it. ScheduledThreadPoolExecutor uses an internal custom BlockingQueue to feed the tasks in when their schedule is due but, as I think you're well aware, it's not easily open to further customisation.

At a glance, DelayQueue looks like it fits the bill perfectly - it releases elements in order of their remaining delay, and Delayed.getDelay() is consulted late enough to allow a last-minute decision about whether a task is ready to go.

The fly in the ointment with this plan is when you try to pass something like DelayQueue<DelayedRunnable> into the constructor of ThreadPoolExecutor. This will only accept a BlockingQueue<Runnable>, not BlockingQueue<? extends Runnable>.

One way out of this is to create a minimal implementation of BlockingQueue<Runnable> that delegates to the underlying DelayQueue. The basics are here:

import java.util.AbstractQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.TimeUnit;

/** Adapts a DelayQueue<DelayedRunnable> to the BlockingQueue<Runnable> expected by ThreadPoolExecutor. */
public class BlockingDelayQueue extends AbstractQueue<Runnable>
        implements BlockingQueue<Runnable> {

    private final DelayQueue<DelayedRunnable> delayQueue;

    public BlockingDelayQueue(DelayQueue<DelayedRunnable> delayQueue) {
        this.delayQueue = delayQueue;
    }

    @Override
    public boolean isEmpty() {
        return delayQueue.isEmpty();
    }

    @Override
    public Runnable poll(long timeout, TimeUnit unit)
            throws InterruptedException {
        // Block until the next DelayedRunnable is ready, then unwrap the Runnable for the executor.
        DelayedRunnable delayedRunnable = delayQueue.poll(timeout, unit);

        if (delayedRunnable == null)
            return null;
        return delayedRunnable.getCommand();
    }

    // take(), offer(), size(), iterator(), etc. delegate to delayQueue in the same way.
    ...
}
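
For what it's worth, wiring that wrapper into the executor could look roughly like this (my guess at a small part of what DelayedScheduler does - the pool size is arbitrary):

DelayQueue<DelayedRunnable> delayQueue = new DelayQueue<>();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        4, 4, 0L, TimeUnit.MILLISECONDS,
        new BlockingDelayQueue(delayQueue));

// The workers must already exist, because tasks are dropped straight into
// the delay queue instead of going through executor.execute().
executor.prestartAllCoreThreads();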

The experimental version of DelayedRunnable used to prove the idea relies on a simple Priority enum that checks the 'busyness' of the executor:

import java.util.concurrent.ThreadPoolExecutor;

public enum Priority {
    LOW {
        boolean isReady(ThreadPoolExecutor executor) {
            // only start when the executor is completely idle
            return executor.getActiveCount() == 0;
        }
    },
    MEDIUM {
        boolean isReady(ThreadPoolExecutor executor) {
            // start while at most one other task is running
            return executor.getActiveCount() <= 1;
        }
    },
    HIGH {
        boolean isReady(ThreadPoolExecutor executor) {
            // always ready, regardless of load
            return true;
        }
    };

    abstract boolean isReady(ThreadPoolExecutor executor);
}

Which DelayedRunnable.getDelay() can then check:

@Override
public long getDelay(TimeUnit unit) {
    long millis;
    if (!priority.isReady(executor))
        // not ready yet - report a sizeable delay so the DelayQueue re-checks later
        millis = 1000;
    else
        // ready - the remaining time until the scheduled start
        millis = time - System.currentTimeMillis();
    return unit.convert(millis, TimeUnit.MILLISECONDS);
}

The key point is that getDelay() must not return a value <= 0 while the priority isn't ready yet.
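
For completeness, one possible shape for DelayedRunnable itself (a sketch, not necessarily identical to the version in the GitHub repo linked below; besides getDelay() it needs compareTo() from Delayed, which DelayQueue uses to order its elements):

import java.util.concurrent.Delayed;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DelayedRunnable implements Delayed {

    private final Runnable command;
    private final long time; // target start time, in epoch millis
    private final Priority priority;
    private final ThreadPoolExecutor executor;

    public DelayedRunnable(Runnable command, long time,
            Priority priority, ThreadPoolExecutor executor) {
        this.command = command;
        this.time = time;
        this.priority = priority;
        this.executor = executor;
    }

    public Runnable getCommand() {
        return command;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // as shown above
        long millis = priority.isReady(executor)
                ? time - System.currentTimeMillis() : 1000;
        return unit.convert(millis, TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                other.getDelay(TimeUnit.MILLISECONDS));
    }
}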

This seemed to work well, e.g. launching standard 2s sleep tasks here...

    DelayedScheduler scheduler = new DelayedScheduler();
    scheduler.schedule(task("Low 1"), 1, TimeUnit.SECONDS, Priority.LOW);
    scheduler.schedule(task("Low 2"), 2, TimeUnit.SECONDS, Priority.LOW);
    scheduler.schedule(task("Low 3"), 3, TimeUnit.SECONDS, Priority.LOW);
    scheduler.schedule(task("Medium 1"), 1, TimeUnit.SECONDS, Priority.MEDIUM);
    scheduler.schedule(task("Medium 2"), 2, TimeUnit.SECONDS, Priority.MEDIUM);
    scheduler.schedule(task("Medium 3"), 3, TimeUnit.SECONDS, Priority.MEDIUM);
    scheduler.schedule(task("High 1"), 1, TimeUnit.SECONDS, Priority.HIGH);
    scheduler.schedule(task("High 2"), 2, TimeUnit.SECONDS, Priority.HIGH);
    scheduler.schedule(task("High 3"), 3, TimeUnit.SECONDS, Priority.HIGH);

... produced about the right results:

High 1 started at 1087ms
Medium 1 started at 1087ms
High 2 started at 2087ms
  Medium 1 ended at 3087ms
  High 1 ended at 3087ms
High 3 started at 3087ms
  High 2 ended at 4088ms
Medium 2 started at 4088ms
  High 3 ended at 5088ms
Medium 3 started at 5088ms
  Medium 2 ended at 6088ms
  Medium 3 ended at 7089ms
Low 1 started at 7089ms
  Low 1 ended at 9089ms
Low 2 started at 9089ms
  Low 2 ended at 11089ms
Low 3 started at 11089ms
  Low 3 ended at 13089ms

- Medium priority tasks were allowed to start while there was only one High priority task running, and Low priority tasks only when nothing else was going on.

(DelayedScheduler and the other unseen bits on GitHub).

df778899
  • It looks like you've solved a part of my (imprecisely formulated) problem, but you missed my wish "to increase the priority of a job when it has been delayed too much". That's surely not your fault, as my question was too long and too unclear. Maybe this dynamic priority increase can be added somehow, though a `PriorityQueue` seems to preclude it. Anyway, thank you for the ideas and the code! – maaartinus Apr 27 '18 at 23:24

I think you're pretty close to what you want; maybe a little encouragement/approval/agreement is all that's needed.

My thoughts would be: "If I know the max number of concurrent threads I can run, then how will I share those across 3 thread queues?"

Once I know this I can set up 3 queues, each with a different share of the available threads, for example:

  • Priority 1 (highest) gets 50% of the work
  • Priority 2 gets 35% of the work
  • Priority 3 (lowest) gets 15% of the work
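
A rough sketch of that split, assuming the server can afford at most 20 background threads in total (all numbers made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPools {
    public static void main(String[] args) {
        int maxThreads = 20; // whatever the server can afford

        // Fixed shares per priority: 50% / 35% / 15% of the available threads.
        ExecutorService highPool   = Executors.newFixedThreadPool((int) (maxThreads * 0.50));
        ExecutorService mediumPool = Executors.newFixedThreadPool((int) (maxThreads * 0.35));
        ExecutorService lowPool    = Executors.newFixedThreadPool(Math.max(1, (int) (maxThreads * 0.15)));

        highPool.submit(() -> System.out.println("important work"));
        mediumPool.submit(() -> System.out.println("medium work"));
        lowPool.submit(() -> System.out.println("low-priority work"));

        highPool.shutdown();
        mediumPool.shutdown();
        lowPool.shutdown();
    }
}

The low-priority pool can never starve the high-priority one; the trade-off is that idle shares aren't reused across priorities.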

Mike Murphy