
I am looking for a load-balanced thread pool, with no success so far. (I am not sure whether "load balancing" is the correct term.) Let me explain what I am trying to achieve.

Part 1: I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.

Part 2: Sometimes, unfortunately, a job takes more than two hours. This is correct, given the amount of data that has to be calculated. The bad thing is that no other job can start while job 1 is running (assuming that all tasks have the same duration), because it is using all threads.

My first idea: have 12 threads and allow up to three jobs in parallel. BUT: that means the CPU is not fully utilized when there is only one job.

I am looking for a solution that gives job 1 the full CPU power when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.

I appreciate your answers...

Thanks in advance.

pidabrow
Christian Rockrohr
  • One possibility might be to scale the number of workers with the number of jobs, and let the OS handle time slicing when there are more workers than cores. – NPE Jan 19 '13 at 14:29
  • Hmm, I am not sure I understand. Do you mean not having separate threads for the tasks? I never know how many jobs there will be in parallel. Usually there is only one, but in peaks there might be 100. This is why I need a limit of at most 5 jobs. Because there is usually only one job running at a time, I want to be as fast as possible and run the tasks of the job with as many threads as possible (but only when there are no other jobs with their single tasks running). – Christian Rockrohr Jan 19 '13 at 14:40
  • You can have only three threads for three jobs. Let each job thread spawn four more threads while performing the tasks inside a single job. – Kanagavelu Sugumar Jan 19 '13 at 14:46
  • Kanaga... that's what I currently have. There are 5 threads waiting on a blocking queue for jobs. Whenever one thread gets a new job, it starts up to four parallel workers to work on the single tasks of this job. When another job thread takes another job, it does the same. So there are eight parallel threads when there are two jobs at a time, which is absolutely perfect. BUT when there is only one job at a time, it does not consume the full CPU power. At least two cores are unused because there are only four threads. – Christian Rockrohr Jan 19 '13 at 14:51
  • Can you estimate the time to finish a job? – Ralf H Jan 19 '13 at 14:56
  • I can only estimate the normal jobs. They usually run for 30 to 60 seconds (data for three months is prepared for customers). But there is this exceptional case where a job has to calculate the data for more than 20 years, and sometimes for multiple customers at a time. Such jobs run for up to four hours (the time will increase in the future, as the data is growing every day). – Christian Rockrohr Jan 19 '13 at 17:33

3 Answers


One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue:

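import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class TaskRunner {
  // Wraps a task with a priority number; lower numbers leave the queue first.
  private static class PriorityRunnable implements Runnable,
            Comparable<PriorityRunnable> {
    private final Runnable theRunnable;
    private final int priority;

    public PriorityRunnable(Runnable r, int priority) {
      this.theRunnable = r;
      this.priority = priority;
    }

    public int getPriority() {
      return priority;
    }

    @Override
    public void run() {
      theRunnable.run();
    }

    @Override
    public int compareTo(PriorityRunnable that) {
      // Integer.compare avoids the overflow that plain subtraction can produce
      return Integer.compare(this.priority, that.priority);
    }
  }

  private final BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();

  // Fixed pool of 8 threads. Use execute(), not submit(): submit() would wrap
  // the task in a FutureTask, which is not Comparable.
  private final ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
            TimeUnit.MILLISECONDS, taskQueue);

  // synchronized so that two jobs submitted at the same time do not mix
  // their peek/execute steps
  public synchronized void runTasks(Runnable... tasks) {
    int priority = 0;
    // Start just behind the task currently at the head of the queue, if any
    Runnable nextTask = taskQueue.peek();
    if(nextTask instanceof PriorityRunnable) {
      priority = ((PriorityRunnable)nextTask).getPriority() + 1;
    }
    for(Runnable t : tasks) {
      exec.execute(new PriorityRunnable(t, priority));
      // Leave gaps so that tasks of later jobs can slot in between
      priority += 100;
    }
  }
}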

The idea here is that when you have a new job you call

taskRunner.runTasks(jobTask1, jobTask2, jobTask3);

and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like

j1t1, j2t1, j1t2, j2t2, etc.
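To see the ordering in isolation, here is a minimal sketch that puts the six priority numbers from the example into a PriorityBlockingQueue and drains it. The `InterleaveDemo` and `Entry` names are hypothetical stand-ins for illustration only; no thread pool is involved, so this only shows the queue order, not actual scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

class InterleaveDemo {
    // Hypothetical stand-in for a queued task: a label plus its priority number.
    static final class Entry implements Comparable<Entry> {
        final String label;
        final int priority;

        Entry(String label, int priority) {
            this.label = label;
            this.priority = priority;
        }

        @Override
        public int compareTo(Entry that) {
            return Integer.compare(this.priority, that.priority);
        }
    }

    // Returns the labels in the order in which a worker thread would take them.
    static List<String> drainOrder() {
        PriorityBlockingQueue<Entry> queue = new PriorityBlockingQueue<>();
        // Job 1 was queued first, with priorities 3, 103, 203.
        queue.add(new Entry("j1t1", 3));
        queue.add(new Entry("j1t2", 103));
        queue.add(new Entry("j1t3", 203));
        // Job 2 arrives later and gets priorities 4, 104, 204.
        queue.add(new Entry("j2t1", 4));
        queue.add(new Entry("j2t2", 104));
        queue.add(new Entry("j2t3", 204));

        List<String> order = new ArrayList<>();
        Entry next;
        while ((next = queue.poll()) != null) {
            order.add(next.label);
        }
        return order;
    }

    public static void main(String[] args) {
        // prints [j1t1, j2t1, j1t2, j2t2, j1t3, j2t3]
        System.out.println(drainOrder());
    }
}
```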

This is not perfect, however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks. You need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks always start off in the queue rather than being picked up by a thread straight away (if there are idle threads, a newly submitted task starts running immediately, so its priority number never comes into play).
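As a rough sketch of the splitting idea, assume a task covers a range of months (the question's comments mention three months for a normal job and more than 20 years for a large one). The `TaskSplitter` class and the month-based segmentation are assumptions for illustration; how the real tasks can be cut up depends on the actual calculation.

```java
import java.util.ArrayList;
import java.util.List;

class TaskSplitter {
    // Splits the half-open month range [0, totalMonths) into segments of at
    // most segmentMonths each. Every int[] holds {startInclusive, endExclusive}.
    static List<int[]> split(int totalMonths, int segmentMonths) {
        if (segmentMonths <= 0) {
            throw new IllegalArgumentException("segmentMonths must be positive");
        }
        List<int[]> segments = new ArrayList<>();
        for (int start = 0; start < totalMonths; start += segmentMonths) {
            int end = Math.min(start + segmentMonths, totalMonths);
            segments.add(new int[] {start, end});
        }
        return segments;
    }

    public static void main(String[] args) {
        // A 20-year job cut into three-month segments yields 80 small tasks,
        // far more than the 8 threads in the pool.
        System.out.println(split(240, 3).size());
    }
}
```

Each segment would then be wrapped in its own Runnable and passed to runTasks, so that a large job's tasks sit in the queue where the tasks of a newly arrived job can slot in between them.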

Ian Roberts
  • I was also thinking about this idea. But it would not help (as you mentioned) when a job utilizes the whole thread pool. All upcoming tasks would have to wait until the big one is finished. If I knew upfront that the job is a "big one", then I could reduce the number of worker threads so as not to block the normal jobs from being processed. Maybe I can extend my communication interface to as400.jobqueue to also let me know the number of months that are covered by this job. (Would work, but is not yet the "optimal, rocking, ass-kicking solution" I am looking for :-) ) Thanks... – Christian Rockrohr Jan 19 '13 at 17:30
  • Finally I went for the priority idea, though not exactly as you described it. Details are in the comments of the third answer. – Christian Rockrohr Jan 19 '13 at 21:56

Since your machine has a 6-core CPU, I think it is better to have 6 worker threads for each job thread, so that whenever one thread gets a new job, it starts up to six parallel workers to work on that single job. This ensures the full CPU power is used when there is only one job at a time.

Also, please have a look at the Fork/Join framework in Java 7.
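For orientation, here is a minimal sketch of the Fork/Join style using ForkJoinPool and RecursiveTask from java.util.concurrent. Summing an array is only a placeholder workload, and the `ForkJoinSum` class, the threshold of 1,000 elements and the parallelism of 6 (one per core) are assumptions for illustration. Whether the real job's tasks can be split recursively like this is not stated in the question.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSum extends RecursiveTask<Long> {
    // Below this size the range is summed directly instead of being split.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                          // schedule the left half asynchronously
        return right.compute() + left.join(); // do the right half in this thread
    }

    // Runs the computation on a pool with one worker per core (6 is an assumption).
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(6);
        try {
            return pool.invoke(new ForkJoinSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Idle workers in a ForkJoinPool steal queued subtasks from busy ones, which keeps all cores occupied as long as there is work that has been split finely enough.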

Also learn about newCachedThreadPool():

Java newCachedThreadPool() versus newFixedThreadPool

Community
Kanagavelu Sugumar

The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads per job. There may be some overhead from the competition, but in a single-job situation it will fully utilize the CPU. The OS will handle giving time to each thread.
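A minimal sketch of this per-job approach, assuming each job arrives as a list of Runnable tasks: every job gets its own fixed pool of 8 threads, and the OS time-slices between the pools when several jobs run at once. The `PerJobPool` class and its `runJob` method are hypothetical names, not part of any existing code in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerJobPool {
    static final int THREADS_PER_JOB = 8;

    // Runs all tasks of one job on a private pool and blocks until they are
    // done. Returns the number of tasks that were run.
    static int runJob(List<Runnable> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS_PER_JOB);
        try {
            List<Callable<Object>> calls = new ArrayList<>();
            for (Runnable task : tasks) {
                calls.add(Executors.callable(task));
            }
            pool.invokeAll(calls); // returns once every task has completed
            return calls.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running job", e);
        } finally {
            pool.shutdown(); // the pool's threads die with the job
        }
    }
}
```

Each of the existing job threads would call runJob for the job it took from the queue, so the total number of worker threads is 8 times the number of jobs currently running.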

Your "first idea" would also work. The idle threads wouldn't take resources from the 8 working threads if they aren't actually executing a task. This wouldn't distribute the CPU resources as evenly when there are multiple jobs running, though.

Do you have a setup where you can test these different pipelines to see how they're performing for you?

Joshua Martell
  • Yes, starting 8 threads for each job is working very well so far. But as mentioned above, it is not "rocking" :-) When there really are three jobs running at a time, there would be 24 threads running. Not a blocker, but not optimal. The context switching would reduce my performance. (I know, it is just a piece of cake compared to the rest of the work, but I am looking for something really impressive and fully fitting my problem.) – Christian Rockrohr Jan 19 '13 at 17:31
  • Yes, I have a complete test environment available. I will run some unit tests, measuring the performance. But first I would like to collect some more ideas. – Christian Rockrohr Jan 19 '13 at 17:33
  • If you go with the 8 threads per job, I would limit it to 2 concurrent jobs. You should be able to estimate what the ideal thread performance should be and compare it with your measured results. – Joshua Martell Jan 19 '13 at 20:12
  • Thanks for your feedback. I implemented a function to estimate whether it is a normal job or a large job. When it is a normal one, it uses 8 threads. When it is an "abnormal" one, it must not use more than 3 threads. Additionally, I limited the number of parallel jobs to two. This is not what I was initially looking for, but it fits my needs quite well. I will now do some tests and measure the performance to fine-tune the number of threads. – Christian Rockrohr Jan 19 '13 at 21:54
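The approach described in the last comment could look roughly like the sketch below. It is not the asker's actual code: the `JobSizing` class, the use of a Semaphore to cap parallel jobs, and in particular the 12-month threshold for a "large" job are assumptions made for illustration. Only the numbers 8, 3 and 2 come from the comment.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class JobSizing {
    static final int NORMAL_THREADS = 8;    // from the comment
    static final int LARGE_THREADS = 3;     // from the comment
    static final int MAX_PARALLEL_JOBS = 2; // from the comment
    // Assumption: a job covering more than 12 months counts as "large".
    static final int LARGE_JOB_MONTHS = 12;

    private static final Semaphore JOB_SLOTS = new Semaphore(MAX_PARALLEL_JOBS);

    // Picks the thread count from the estimated size of the job.
    static int threadsFor(int monthsCovered) {
        return monthsCovered > LARGE_JOB_MONTHS ? LARGE_THREADS : NORMAL_THREADS;
    }

    // Blocks until a job slot is free, then runs the job's tasks to completion.
    static void runJob(int monthsCovered, List<Runnable> tasks) {
        JOB_SLOTS.acquireUninterruptibly();
        ExecutorService pool = Executors.newFixedThreadPool(threadsFor(monthsCovered));
        try {
            for (Runnable task : tasks) {
                pool.execute(task);
            }
            pool.shutdown(); // no new tasks; already queued tasks still run
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow(); // give up on the remaining tasks
        } finally {
            JOB_SLOTS.release();
        }
    }
}
```

With these numbers, a large job running next to a normal one leads to at most 11 worker threads, and a single normal job still gets all 8.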