
I have a pipeline of tasks to be done on files, and each type of task runs inside a different executor service. After initializing each executor service I start the first task. This task is guaranteed not to finish until it has processed all files: as it processes a folder, either no more work is required or it submits a callable task to service2. So once the first service has shut down, all files will be being processed in task2 or in another task further down the pipeline, and so on. When we can shut down the final service, we have finished.

Loader loader = Loader.getInstanceOf();
List<ExecutorService> services = new ArrayList<ExecutorService>();
ExecutorService es = Executors.newSingleThreadExecutor();

//Init Services
services.add(es);
services.add(task1.getService());
services.add(task2.getService());
services.add(task3.getService());
services.add(task4.getService());

//Start Loading Files
es.submit(loader);

int count = 0;
for (ExecutorService service : services)
{
    service.shutdown();
    count++;
    // Now wait for all submitted tasks to complete, for up to ten days per service
    service.awaitTermination(10, TimeUnit.DAYS);
    MainWindow.logger.severe("Shutdown Task:" + count);
}

public class AnalyserService
{
    protected String threadGroup;
    public AnalyserService(String threadGroup)
    {
        this.threadGroup=threadGroup;
    }

    protected  ExecutorService      executorService;
    protected  CompletionService    completionService;

    protected void initExecutorService()
    {
        int workerSize = Runtime.getRuntime().availableProcessors();
        executorService
                = Executors.newFixedThreadPool(workerSize, new SongKongThreadFactory(threadGroup));
    }

    public ExecutorService getService()
    {
        if (executorService == null || executorService.isShutdown())
        {
            initExecutorService();
        }
        return executorService;
    }
}
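The shutdown loop in the first snippet works because each stage only receives work from the stages before it: once a stage's awaitTermination() returns, nothing earlier in the pipeline can submit to the next stage any more, so shutting that one down cannot reject pending work. Below is a minimal, self-contained sketch of the same pattern with two stages; StagedShutdownDemo and its stages are hypothetical stand-ins for the loader and task services. Unlike the snippet, it also checks the boolean returned by awaitTermination(), since ignoring a timeout would let the next shutdown() start while an earlier stage is still submitting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical two-stage pipeline: stage 1 plays the loader, stage 2 a later task.
class StagedShutdownDemo {

    static int run(final int files) throws InterruptedException {
        final ExecutorService stage1 = Executors.newSingleThreadExecutor();
        final ExecutorService stage2 = Executors.newFixedThreadPool(2);
        final AtomicInteger processed = new AtomicInteger();

        // Stage 1 is the only producer for stage 2: one stage-2 task per "file".
        stage1.submit(new Runnable() {
            public void run() {
                for (int i = 0; i < files; i++) {
                    stage2.submit(new Runnable() {
                        public void run() {
                            processed.incrementAndGet();
                        }
                    });
                }
            }
        });

        // Shut down in pipeline order, so a stage is only shut down after
        // every stage that can feed it has terminated.
        List<ExecutorService> stages = Arrays.asList(stage1, stage2);
        for (ExecutorService stage : stages) {
            stage.shutdown();
            if (!stage.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("stage did not terminate in time");
            }
        }
        return processed.get();
    }
}
```

Calling StagedShutdownDemo.run(100) returns 100: every stage-2 task was queued before stage 2 was shut down, and shutdown() still runs already-queued tasks.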

So this all works fine, except that I've got my CPU load logic wrong. Every service uses a pool equal in size to the number of CPUs the computer has. So if the computer has 4 CPUs and we have 5 services, we could have 20 threads all trying to work at the same time, overloading the CPUs. I think in this case I should only have 4 threads running at a time.

If I limited each service to one thread, I'd only have 5 threads running at the same time, but this still isn't right because:

  1. It will no longer be right if there are more services or more CPUs.
  2. It is inefficient: as the pipeline kicks off, most of the work will be done by task1, so if I limit it to one CPU it will be slower than necessary; conversely, later on most of the work will be done by later tasks and task1 will have nothing to do.

I think what I need is for all tasks to share one executor service, with its pool size set equal to the number of CPUs the computer has. But then how am I going to identify when the service has finished?
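One common way to answer this with only Java 5 classes is to count outstanding tasks yourself: increment a counter before every submit, decrement it when a task finishes, and release a latch when it reaches zero. Because a task increments the counter for any follow-on work before decrementing it for itself, the count can only reach zero once the whole pipeline is idle. This sketch assumes every stage submits through a shared wrapper; TrackingExecutor, submit(), awaitCompletionAndShutdown() and demo() are hypothetical names, not library API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of java.util.concurrent: one shared pool sized
// to the CPU count plus a count of tasks submitted but not yet finished.
class TrackingExecutor {
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final AtomicInteger outstanding = new AtomicInteger();
    private final CountDownLatch finished = new CountDownLatch(1);

    // Every stage submits through this method rather than to the pool directly.
    void submit(final Runnable task) {
        outstanding.incrementAndGet();          // counted before it can possibly run
        pool.execute(new Runnable() {
            public void run() {
                try {
                    task.run();                 // may call submit() for follow-on work
                } finally {
                    if (outstanding.decrementAndGet() == 0) {
                        finished.countDown();   // nothing queued or running any more
                    }
                }
            }
        });
    }

    // Call after submitting the first task; blocks until all work, including
    // work spawned by other tasks, has finished, then shuts the pool down.
    void awaitCompletionAndShutdown() throws InterruptedException {
        finished.await();
        pool.shutdown();
    }

    // Example: a loader task hands each "file" to a second stage via the same pool.
    static int demo(final int files) throws InterruptedException {
        final TrackingExecutor executor = new TrackingExecutor();
        final AtomicInteger processed = new AtomicInteger();
        executor.submit(new Runnable() {
            public void run() {
                for (int i = 0; i < files; i++) {
                    executor.submit(new Runnable() {
                        public void run() {
                            processed.incrementAndGet();
                        }
                    });
                }
            }
        });
        executor.awaitCompletionAndShutdown();
        return processed.get();
    }
}
```

The latch is single-use, so the sketch assumes one pipeline run per TrackingExecutor; calling awaitCompletionAndShutdown() before anything has been submitted would block forever.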

I'm using Java 7, so is there anything new in Java 7 that may help? Currently I'm just using Java 5 concurrency features.
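Java 7 did add something that fits: java.util.concurrent.Phaser, a reusable barrier whose number of registered parties can change while it is in use. Registering a party before each submit and calling arriveAndDeregister() when the task ends lets the controlling thread wait in arriveAndAwaitAdvance() until every task, including ones submitted by other tasks, has finished. This is a sketch under assumptions: PhaserPipelineDemo and its three nested stages are hypothetical, and the Phaser documentation notes that its implementation limits the number of registered parties to 65535, so a loader that queues more tasks than that at once would need the counter approach or tiered phasers instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: one shared pool, completion detected with a Phaser.
class PhaserPipelineDemo {

    // Submits a task to the shared pool as a registered Phaser party.
    static void submit(final ExecutorService pool, final Phaser phaser, final Runnable task) {
        phaser.register();                          // party exists before the task can run
        pool.execute(new Runnable() {
            public void run() {
                try {
                    task.run();
                } finally {
                    phaser.arriveAndDeregister();   // this task's work is finished
                }
            }
        });
    }

    static int run(final int files) {
        final ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        final Phaser phaser = new Phaser(1);        // party 0 is the controlling thread
        final AtomicInteger processed = new AtomicInteger();

        // Stage 1 hands each file to stage 2, which hands it to stage 3.
        submit(pool, phaser, new Runnable() {
            public void run() {
                for (int i = 0; i < files; i++) {
                    submit(pool, phaser, new Runnable() {
                        public void run() {
                            submit(pool, phaser, new Runnable() {
                                public void run() {
                                    processed.incrementAndGet();
                                }
                            });
                        }
                    });
                }
            }
        });

        // The phase only advances once every registered task has deregistered.
        phaser.arriveAndAwaitAdvance();
        pool.shutdown();
        return processed.get();
    }
}
```

The ordering matters in the same way as with a counter: a parent registers its child before it deregisters itself, so the phase cannot advance while any work is still queued or running.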

Paul Taylor
  • Are you sure your CPU usage theory is correct? AFAIK, Java does not directly support this. The JVM decides, depending on the native thread count, whether to use more than one CPU, even when more are available. – NiranjanBhat Oct 17 '12 at 10:47
  • Sorry, I don't understand your question. My understanding was that if my executor services are configured so that the total pool of threads is twenty and we only have 4 CPUs, that will perform worse than if the total pool were configured to be four threads? – Paul Taylor Oct 17 '12 at 10:58
  • It does not work that way :) Please refer to this beautiful Stack Overflow Q&A: http://stackoverflow.com/questions/1223072/how-do-i-optimize-for-multi-core-and-multi-cpu-computers-in-java. You can also use any Java profiler to check this. – NiranjanBhat Oct 17 '12 at 11:21
  • OK, that is interesting and I'll work through it. But if we simplify my example to a single-CPU machine, is it not true that a thread pool configured with twenty threads will perform slower than one configured with a lower number (e.g. 4), because the CPU has to timeslice between 20 threads rather than 4 threads all the time (assuming we have enough work to keep all 20 threads busy)? – Paul Taylor Oct 17 '12 at 11:30
  • Not necessarily... It depends on what kind of work your threads do. In this case your threads are doing IO, which is slow, so the more threads you have, the more your application benefits. If, say, your threads were doing only CPU-bound operations such as computation, having more threads would not benefit you much. – NiranjanBhat Oct 17 '12 at 12:07
  • Some tasks are very much CPU-based; I don't know why you think they are all I/O-based. But my question was not whether configuring more threads than CPUs would speed things up, but whether it would slow them down. – Paul Taylor Oct 17 '12 at 12:31
  • You will basically have to tune the system. There is obviously a breaking point beyond which increasing the number of threads will not improve the software's speed, or could even degrade it. This is what is done in software benchmarking, where your software is targeted at a specific CPU with a specific configuration such as cache size, thread pool size, etc. – NiranjanBhat Oct 17 '12 at 14:35
  • This brings me back to the original question. It seems to me that if everything is in one executor service, there is no advantage in configuring a larger thread pool than the number of CPUs available, so I can do that. But what is holding me back is that I still have no way of detecting when it has finished. – Paul Taylor Oct 17 '12 at 15:07
  • Seems that I'm right: http://stackoverflow.com/questions/12951112/what-is-optimum-thread-pool-size-for-simple-program-running-cpu-based-tasks-in-j – Paul Taylor Oct 18 '12 at 09:39
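The tuning advice in this exchange is often summarised with the sizing heuristic from Goetz et al., Java Concurrency in Practice: N_threads = N_cpu × U_cpu × (1 + W/C), where U_cpu is the target CPU utilisation between 0 and 1 and W/C is the ratio of time a task spends waiting (for example on disk I/O) to time spent computing. For purely CPU-bound work W/C is close to 0, which gives about one thread per CPU, in line with the conclusion reached in the comments; I/O-heavy stages justify more. The helper below is a hypothetical illustration, and the W/C values in it are made-up inputs rather than measurements of this program.

```java
// Hypothetical helper applying the N_cpu * U_cpu * (1 + W/C) sizing heuristic.
class PoolSizing {

    /**
     * @param cpus              number of processors, e.g. Runtime.getRuntime().availableProcessors()
     * @param targetUtilization desired CPU utilisation between 0 and 1
     * @param waitToCompute     estimated ratio of waiting (I/O) time to computing time per task
     */
    static int poolSize(int cpus, double targetUtilization, double waitToCompute) {
        double threads = cpus * targetUtilization * (1 + waitToCompute);
        return Math.max(1, (int) Math.round(threads));   // always at least one thread
    }

    public static void main(String[] args) {
        System.out.println(poolSize(4, 1.0, 0.0)); // CPU-bound on 4 CPUs: prints 4
        System.out.println(poolSize(4, 1.0, 1.0)); // waits as long as it computes: prints 8
    }
}
```

In practice W/C has to be estimated by profiling each stage, which is why the comments above end up recommending benchmarking.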

1 Answer


The core of your problem is: "[...] overloading the cpus." If that is the problem, just set your application's scheduling priority appropriately. By the way, you are more likely to increase IO load than CPU load; a lot of different threads is actually a good thing :-)
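If "priority" here means making the pipeline yield to other programs, one in-process option is a ThreadFactory that creates low-priority worker threads, passed to the executor in the same way the question passes SongKongThreadFactory. This is a hedged sketch: LowPriorityThreadFactory and newPool() are hypothetical names, and Java thread priorities are only hints to the operating system scheduler, so on some platforms (Linux with HotSpot's default settings is a commonly cited case) they may have little or no effect; lowering the priority of the whole process with an OS tool such as nice is the alternative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: names worker threads and lowers their priority so the
// OS scheduler, if it honours Java priorities, favours other programs.
class LowPriorityThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    LowPriorityThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
        t.setPriority(Thread.MIN_PRIORITY);   // a hint only; may be ignored by the OS
        return t;
    }

    // A CPU-sized pool whose worker threads all run at minimum priority.
    static ExecutorService newPool(String prefix) {
        return Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors(),
                new LowPriorityThreadFactory(prefix));
    }
}
```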

However, your question is: "But then how am I going to identify when the service has finished?" Very simple answer: use submit() instead of invokeAll() and check the isDone() method of the Future object you receive.

http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ExecutorService.html#submit(java.util.concurrent.Callable)
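The catch, which the asker's follow-up comment raises, is that a Future returned by submit() only tracks the one task it was returned for. If that task hands follow-on work to another executor and returns, isDone() is already true while the follow-on work is still queued or running. The sketch below reproduces this deterministically; FutureScopeDemo is a hypothetical name, and the latch exists only to hold the child task so the two isDone() values can be observed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: a root task's Future says nothing about work it spawned.
class FutureScopeDemo {

    // Returns {rootIsDone, childIsDoneAtThatMoment}.
    static boolean[] run() throws Exception {
        final ExecutorService stage1 = Executors.newSingleThreadExecutor();
        final ExecutorService stage2 = Executors.newSingleThreadExecutor();
        final CountDownLatch release = new CountDownLatch(1);
        final AtomicReference<Future<?>> child = new AtomicReference<Future<?>>();

        // The root task hands work to stage 2 and returns immediately.
        Future<?> root = stage1.submit(new Runnable() {
            public void run() {
                child.set(stage2.submit(new Runnable() {
                    public void run() {
                        try {
                            release.await();    // child is held here on purpose
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }));
            }
        });

        root.get();                             // waits for the root task only
        boolean[] result = { root.isDone(), child.get().isDone() };

        release.countDown();                    // now let the child finish
        child.get().get();
        stage1.shutdown();
        stage2.shutdown();
        return result;
    }
}
```

FutureScopeDemo.run() returns {true, false}: the root Future reports done while the child it submitted is still blocked, which is why per-task Futures alone cannot tell when a multi-stage pipeline has finished.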

parasietje
  • You're right, I think I/O load is the problem, as it is constantly switching tasks to allow everything queued to run; I'm just suggesting one executor service as a way to schedule things correctly. My prioritising is simply that we have a pipeline of tasks and I want to take advantage of all CPUs to process as quickly as I can, without grinding the machine to a halt or having the CPU used by my program vary too much. I already use submit, but the problem is that task1 submits tasks to task2 (and task3), task2 submits tasks to task3, and so on. But this is all hidden from the main controlling class. – Paul Taylor Oct 17 '12 at 10:51