3

I have a List<Object> objectsToProcess. Let's say it contains 1000000 items. Each item in the list is then processed like this:

for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process
    // save data
}

My question is: would multi-threading improve performance? I would have thought that multiple threads are allocated by default by the processor anyway.

Rory Lester
  • 1
    You mean partitioning 1000000 objects in list? – SMA Dec 20 '14 at 07:51
  • Yes, you have 1000000 objects in the list and you process each one individually; when the last one has been processed, the program ends. Would multi-threading speed it up, or would the speed stay the same? – Rory Lester Dec 20 '14 at 07:54
  • 1
    Yes, then try splitting the data into, say, 10 parts and creating 10 different DB connections. – SMA Dec 20 '14 at 07:55

2 Answers

10

In the described scenario, given that process is a time-consuming task, and given that the CPU has more than one core, multi-threading will indeed improve the performance.

The processor is not the one who allocates the threads. The processor is the one who provides the resources (virtual CPUs / virtual processors) that can be used by threads by providing more than one execution unit / execution context. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.

The two major reasons for multi-threading are:

  • Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
  • Making the program act and react on multiple things at the same time (i.e. Event Thread vs. Swing Worker).

There are programming languages and execution environments in which threads are created automatically in order to process problems that can be parallelized. Java is not (yet) one of them, but since Java 8 it has been moving in that direction, and Java 9 may bring even more.

Usually you do not want significantly more threads than the CPU has cores, for the simple reason that thread switching and thread synchronization are overhead that slows things down.
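As a rough illustration of that rule of thumb, the sketch below sizes a fixed pool from the number of logical processors the JVM reports. The class and method names (PoolSizing, cpuBoundPoolSize) are made up for this example, and the sizing only makes sense under the assumption that the work is CPU-bound.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {
    /** Returns the number of logical processors available to the JVM (always at least 1). */
    static int cpuBoundPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int threads = cpuBoundPoolSize();
        // One thread per logical processor: an assumption that suits CPU-bound work.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println("Pool sized for " + threads + " threads");
        pool.shutdown();
    }
}
```

If the tasks spend most of their time waiting on I/O, a different number is likely to work better, so treat this value as a starting point only.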

The package java.util.concurrent provides many classes that help with typical problems of multithreading. What you want is an ExecutorService to which you assign the tasks that should be run and completed in parallel. The class Executors provides factory methods for creating popular types of ExecutorService. If your problem just needs to be solved in parallel, you might want to go for Executors.newCachedThreadPool(). If your problem is urgent, you might want to go for Executors.newWorkStealingPool().

Your code thus could look like this:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
    service.submit(() -> {
        // go to the database and retrieve data
        // process
        // save data
    });
}

Please note that the sequence in which the objects would be processed is no longer guaranteed if you go for this approach of multithreading.
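A more complete version of the submit loop might look like the following sketch, which also shuts the pool down and waits for the submitted tasks to finish. The names ExecutorExample, handle and processAll are hypothetical, handle only counts items as a stand-in for the retrieve/process/save steps, and a fixed-size pool is used here instead of the work-stealing pool purely to make the thread count explicit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorExample {
    // Hypothetical stand-in for "retrieve data, process, save data".
    static void handle(Object item, AtomicInteger done) {
        done.incrementAndGet();
    }

    /** Submits one task per item, waits for all of them, and returns how many were handled. */
    static int processAll(List<?> objectsToProcess, int threads) {
        ExecutorService service = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (Object object : objectsToProcess) {
            service.submit(() -> handle(object, done));
        }
        service.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            if (!service.awaitTermination(1, TimeUnit.HOURS)) {
                service.shutdownNow(); // give up on whatever is still pending
            }
        } catch (InterruptedException e) {
            service.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a", "b", "c"), 2)); // prints 3
    }
}
```

Without the shutdown() call, the non-daemon threads of a fixed pool would keep the JVM alive after main returns.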

If your objectsToProcess are something which can provide a parallel stream, you could also do this:

objectsToProcess.parallelStream().forEach(object -> {
    // go to the database and retrieve data
    // process
    // save data
});

This will leave the decisions about how to handle the threads to the VM, which often will be better than implementing the multi-threading ourselves.
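If the results are needed afterwards, a variant along these lines may be useful. It is only a sketch: ParallelStreamExample and process are hypothetical names, and the doubling is a placeholder for the real work. Unlike forEach, collecting an ordered parallel stream into a list keeps the encounter order of the source, even though the items themselves are processed in no particular order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelStreamExample {
    // Hypothetical stand-in for the per-item work.
    static int process(int item) {
        return item * 2;
    }

    /** Processes the items in parallel and returns the results in the original list order. */
    static List<Integer> processAll(List<Integer> items) {
        return items.parallelStream()
                .map(ParallelStreamExample::process)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1, 2, 3))); // prints [2, 4, 6]
    }
}
```

Parallel streams run on the common ForkJoinPool, whose default size is derived from the number of cores, so long blocking calls such as database round trips can occupy its threads for the whole application.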


Christian Hujer
1

Depends on where the time is spent.

If you have a lot of calculations to do, then allocating work to more threads can help; as you say, each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says, you have to figure out how to split the work across the threads, and you are responsible for starting the threads, waiting for completion and aggregating the results.

If, as in your case, you are waiting for a database, then there can be additional value in using threads. A database can serve several requests in parallel (the database server itself is multi-threaded), so instead of coding

for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process
    // save data
}

Where you wait for each response before issuing the next, you want to have several worker threads each performing

 Go to database retrieve data.
 process
 save data

Then you get better throughput. The trick though is not to have too many worker threads. Several reasons for that:

  1. Each thread uses some resources: it has its own stack and its own connection to the database. You would not want 10,000 such threads.
  2. Each request uses resources on the server: each connection uses memory, and each database server will only serve so many requests in parallel. You gain nothing by submitting thousands of simultaneous requests if the server can only serve tens of them in parallel. Also, if the database is shared, you probably don't want to saturate it with your requests; you need to be a "good citizen".

Net: you will almost certainly get benefit by having a number of worker threads. The number of threads that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate. Start with say 5, then 10. Keep your eye on the load on the DB as you increase the number of threads.
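One way to run that experiment is sketched below. Everything in it is an assumption made for illustration: the class WorkerCountExperiment, the system property name "workers", and fakeDatabaseCall, which just sleeps to imitate the latency of a database round trip.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerCountExperiment {
    // Hypothetical stand-in for a database round trip: sleeps instead of doing I/O.
    static void fakeDatabaseCall(long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs `items` simulated requests on `workers` threads and returns how many completed. */
    static int run(int items, int workers, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < items; i++) {
            pool.submit(() -> {
                fakeDatabaseCall(latencyMillis);
                completed.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        // Worker count comes from -Dworkers=N (hypothetical property name), defaulting to 5.
        int workers = Integer.getInteger("workers", 5);
        long start = System.nanoTime();
        int done = run(100, workers, 10);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(done + " items, " + workers + " workers, " + elapsedMs + " ms");
    }
}
```

Running it with, for example, -Dworkers=5 and then -Dworkers=10 shows how the elapsed time changes with the pool size. Against a real database the curve will flatten or worsen once the server is saturated, which is the point at which to stop adding threads.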

djna