Java multithreading performance gets worse as thread pool size increases

Question

I have 40 million records in MongoDB. I am reading that data in parallel from one collection, processing it, and dumping it into another collection.

Sample code for job initialization.

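ExecutorService executor = Executors.newFixedThreadPool(10);
// total number of records in the reading collection (assumes MongoDB Java driver 3.8+ for countDocuments)
long count = mongoDatabase.getCollection("reading_collection").countDocuments(whereClause);
int pageSize = 5000;
// number of pages, rounded up
int counter = (int) ((count + pageSize - 1) / pageSize);
for (int i = 1; i <= counter; i++) {
    // each worker loads and processes page i
    Runnable worker = new FinalParallelDataProcessingStrategyOperator(
            mongoDatabase, vendor, version, importDate, vendorId, i, securitiesId);
    executor.execute(worker);
}
executor.shutdown(); // accept no new tasks; already-submitted pages still run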

Each worker does the following:

public void run() {
    try {
        List<SecurityTemp> temps = loadDataInBatch();
        populateToNewCollection(temps);
        populateToAnotherCollection(temps);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Data loading is paginated using the following query:

mongoDB.getCollection("reading_collection").find(whereClause)
        .skip(pageSize * (n - 1)).limit(pageSize).batchSize(1000).iterator();

pagination code reference

Machine configuration: 2 CPUs with 1 core each

The parallel implementation gives almost the same performance as the sequential one. Stats on a subset of the data (319,568 records):

No. of threads   Execution time (minutes)
1                16
3                15
8                17
10               17
15               16
20               12
50               30

How can I improve the performance of this application?

Raising the number of threads doesn't automatically increase performance, and too many threads can cause overhead issues. It's hard to say why you get the same performance for 1-10 threads; maybe your bottleneck is the DB? Is it a local DB? — JohnnyAW, Aug 12 '16 at 11:15
It could also be the JVM configuration: if it's running in an isolated environment that only has access to one core, then you're not going to see much improvement either. — Gimby, Aug 12 '16 at 11:16

score 5 · Accepted Answer · answered Aug 12 '16 at 11:19

Since you are reading your input data from a single source, that part is most likely I/O-bound (from the perspective of your application), so executing it in parallel will not gain you much. On the contrary, I think executing similar queries (just with different pagination) in parallel on multiple threads will have a negative performance impact: the same work has to be done multiple times on the DB, and the parallel queries might get in each other's way.
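To make the "same work multiple times" point concrete, here is a rough back-of-the-envelope model. It assumes, as the MongoDB documentation describes for `cursor.skip()`, that the server has to walk past all skipped documents before it returns a page, and it ignores index and caching effects. `SkipCost` and `docsWalkedWithSkip` are illustrative names, not part of any driver.

```java
public class SkipCost {

    // Rough model (assumption): answering page n with skip((n - 1) * pageSize).limit(pageSize)
    // makes the server walk past the skipped documents and then return its own page,
    // i.e. about min(n * pageSize, total) document visits. Index/cache effects are ignored.
    static long docsWalkedWithSkip(long total, int pageSize) {
        long pages = (total + pageSize - 1) / pageSize; // ceiling division, like the question's counter
        long walked = 0;
        for (long n = 1; n <= pages; n++) {
            walked += Math.min(n * pageSize, total);     // skipped docs + returned docs, capped at total
        }
        return walked;
    }

    public static void main(String[] args) {
        long total = 319_568; // subset size from the question
        int pageSize = 5_000;
        long paged = docsWalkedWithSkip(total, pageSize);
        System.out.println("single query : " + total);
        System.out.println("skip/limit   : " + paged);
        System.out.printf("ratio        : %.1fx%n", (double) paged / total);
    }
}
```

Under this model the 319,568-record subset costs 10,399,568 document visits with 5,000-record pages, about 32.5 times a single pass, and the total grows quadratically with the number of pages.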

Another question is whether the processing part takes up a significant amount of time compared with reading the input. If it doesn't, using parallel processing will not help much to speed things up. If it does, I suggest the following:
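One way to put numbers on "will not help much" is Amdahl's law (an addition for illustration, not part of the original answer). The sketch assumes the reading part is effectively serial because it comes from a single source; the 80/20 split in `main` is hypothetical and should be replaced by measured times.

```java
public class AmdahlEstimate {

    // Amdahl's law: upper bound on speedup when a fraction `serial` of the run time
    // cannot be parallelized and the rest is spread evenly over `threads` threads.
    static double maxSpeedup(double serial, int threads) {
        if (serial < 0 || serial > 1 || threads < 1) {
            throw new IllegalArgumentException("serial must be in [0, 1] and threads >= 1");
        }
        return 1.0 / (serial + (1.0 - serial) / threads);
    }

    public static void main(String[] args) {
        double serial = 0.8; // hypothetical: 80% of the time spent reading from the single source
        for (int threads : new int[] {1, 2, 10, 50}) {
            System.out.printf("%2d threads -> at most %.2fx%n", threads, maxSpeedup(serial, threads));
        }
    }
}
```

With 80% of the time spent reading, even unlimited threads cap the speedup at 1.25x, and on the question's 2-core machine the processing part cannot run more than two ways in parallel anyway, so about 1.11x is the realistic ceiling.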

1. Get your data from the DB using a single query.
2. Have multiple worker threads that take the data items from the result set or an intermediate queue and process them. There's no need for fixed batches: each worker just grabs the next available item once it has finished processing the previous one.
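A minimal sketch of the single-query / worker-thread idea, assuming the records can be handed out one at a time: one thread iterates the results once and feeds a bounded `BlockingQueue`, and the workers take items until they see an end marker. Integers stand in for the MongoDB documents, and `process` is a hypothetical placeholder for the real per-record work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SingleReaderPipeline {

    // Reads `source` once on the calling thread and hands each record to one of
    // `workers` threads through a bounded queue; returns the sum of process() results.
    static long run(Iterable<Integer> source, int workers) throws InterruptedException {
        // Bounded, so the reader blocks instead of buffering the whole collection.
        BlockingQueue<Optional<Integer>> queue = new ArrayBlockingQueue<>(1_000);
        AtomicLong result = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Optional<Integer> item = queue.take(); // waits for the next record
                        if (!item.isPresent()) {
                            return;                            // end marker: no more input
                        }
                        result.addAndGet(process(item.get()));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (Integer record : source) {      // one pass over one result set
            queue.put(Optional.of(record));  // blocks while the queue is full
        }
        for (int w = 0; w < workers; w++) {
            queue.put(Optional.empty());     // one end marker per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return result.get();
    }

    // Hypothetical stand-in for the real per-record processing.
    static long process(int record) {
        return 2L * record;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> records = new ArrayList<>();
        for (int i = 1; i <= 10_000; i++) {
            records.add(i);
        }
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(run(records, workers)); // 100010000
    }
}
```

The bounded queue keeps the reader from loading the whole collection into memory when the workers fall behind, and `Optional.empty()` works as the end marker because it cannot be confused with a real record.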

As for the number of threads: the "sweet spot" for minimum processing time depends on the kind of processing. For CPU-intensive tasks without much I/O, it will most likely be around the number of available cores, in your case 2.
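For workloads that do wait on I/O, a common rule of thumb is the pool-sizing formula from *Java Concurrency in Practice* (Goetz et al., section 8.2), added here as an illustration: threads = cores × target CPU utilization × (1 + wait time / compute time). The wait/compute ratios below are made up; they have to be measured for the actual job.

```java
public class PoolSizing {

    // Heuristic from "Java Concurrency in Practice", section 8.2:
    //   threads = cores * targetUtilization * (1 + waitTime / computeTime)
    // Rounded to the nearest whole thread, never less than 1.
    static int suggestedThreads(int cores, double targetUtilization, double waitToComputeRatio) {
        double n = cores * targetUtilization * (1 + waitToComputeRatio);
        return Math.max(1, (int) Math.round(n));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Hypothetical ratios: pure CPU work vs. tasks that wait three times as long as they compute.
        System.out.println("pure CPU work   : " + suggestedThreads(cores, 1.0, 0.0));
        System.out.println("wait 3x compute : " + suggestedThreads(cores, 1.0, 3.0));
    }
}
```

With no waiting, the formula reduces to the core count (2 on the question's machine). It only estimates how many threads keep the CPUs busy: if the database is the bottleneck, extra threads just queue up behind it.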

score 2 · Answer 2 · edited Jan 04 '22 at 09:01

Multithreading does not keep improving performance as you increase the number of threads.

I/O-bound applications won't gain much from multithreading.

It depends on a lot of factors. Refer to this related SE question:

Is multithreading faster than single thread?

Even for CPU-intensive applications that are less I/O-bound, don't configure a huge number of threads in the hope of improving performance.

You can change your code as follows:

ExecutorService executor = Executors.newFixedThreadPool(
Runtime.getRuntime().availableProcessors());

Or use a ForkJoinPool as below (available from the JDK 1.8 release onwards):

ExecutorService executor = Executors.newWorkStealingPool();

Executors API:

public static ExecutorService newWorkStealingPool()

Creates a work-stealing thread pool using all available processors as its target parallelism level.
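A small usage sketch for `newWorkStealingPool()`, assuming each page of work can be expressed as a `Callable` that returns a result. The page-summing task is a hypothetical stand-in for the real per-page processing. The pool's worker threads are daemon threads, so the caller has to wait for the results (here via `Future.get()`), or the JVM may exit before the work is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class WorkStealingDemo {

    // Submits one task per page to a work-stealing pool and adds up the per-page results.
    static long sumOfPages(int pages, int pageSize) throws Exception {
        ExecutorService executor = Executors.newWorkStealingPool(); // parallelism = available processors
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int page = 0; page < pages; page++) {
                final int start = page * pageSize;
                // Hypothetical per-page task: sums the "record ids" start .. start + pageSize - 1
                results.add(executor.submit(() -> {
                    long sum = 0;
                    for (int i = start; i < start + pageSize; i++) {
                        sum += i;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that page is done
            }
            return total;
        } finally {
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("target parallelism: " + Runtime.getRuntime().availableProcessors());
        System.out.println(sumOfPages(4, 1_000)); // 7998000
    }
}
```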
