Performance of Concurrent Program Degrading with Increase in Threads?

Question

I have been trying to implement the below code on quad core computer and average running times with No of threads in the Executor service over 100 iterations is as follows

1 thread = 78404.95

2 threads = 174995.14

4 thread = 144230.23

But according to what I have studied 2*(no of cores) of threads should give optimal result for the program which is clearly not the case in my program which bizarrely gives best time for single thread.

Code :

  import java.util.Collections;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestHashSet {

    public static void main(String argv[]){
        Set<Integer> S = Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());
        S.add(1);
        S.add(2);
        S.add(3);
        S.add(4);
        S.add(5);
        long  startTime = System.nanoTime();
        ExecutorService executor = Executors.newFixedThreadPool(8);
        int Nb = 0;
        for(int i = 0;i<10;i++){
            User runnable = new User(S);
            executor.execute(runnable);

            Nb = Thread.getAllStackTraces().keySet().size();
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        long endTime = System.nanoTime();
        System.out.println(0.001*(endTime-startTime)+" And "+Nb);
    }
}
class User implements Runnable{
    Set<Integer> S;
    User(Set<Integer> S){
        this.S = S;
    }
    @Override
    public void run() {
        // TODO Auto-generated method stub
        Set<Integer> t =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t.add(i+5);
        }
        S.retainAll(t);
        Set<Integer> t2 =Collections.newSetFromMap(new ConcurrentHashMap<Integer,Boolean>());;
        for(int i = 0;i<10;i++){
            t2.add(i);
        }
        S.addAll(t);
        /*
        ConcurrentHashSet<Integer> D = new ConcurrentHashSet<Integer>();
        for(int i=0;i<10;i++){
            D.add(i+3);
        }
        S.difference(D);
        */
    }
}

Update : If I increase no of queries per thread to 1000 , 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.Thanks

Threads also have a lot of overhead. I've heard of 2*NOfCores, but I've gotten optimum performance with like NOfCores + 1. — Carcigenicate, May 02 '15 at 23:04
@Carcigenicate But I got an average time around 1200 mS for single thread and 1700mS for 5 threads over about 100 Iterations.But 5 Threads Supposed to increase the performance..? — teddybear, May 02 '15 at 23:15
So you're not getting an increase at all with 2 threads? Sorry, I'm a little confused about your times, because above you have 1 thread taking 3569.707 (milliseconds I'm guessing). — Carcigenicate, May 02 '15 at 23:18
@Carcigenicate When I ran it for one as in the above code I get time for one thread around 4000 and 2 threads 4500 but when I run loop around the whole code for 100 times and see the average it is Like one thread-1200 and 2 threads - 1400. — teddybear, May 02 '15 at 23:24
That's weird, sorry. You should definitely get an increase by adding a second thread; I often get an almost 2x increase with the second thread, but obviously it's dependent on a lot of things. I don't really know, sorry. Last time I was in a situation like you, it turned out that the tasks weren't actually being parallelized. Read over the documentation to make sure you don't need to initialize something to have it run non-sequential. — Carcigenicate, May 02 '15 at 23:30
Update : If I increase no of queries per thread to 1000 , 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.Thanks — teddybear, May 03 '15 at 10:33
I had the same problem but with respect to IO; this post may help you: http://stackoverflow.com/questions/16571381/degrading-performance-when-increasing-number-of-cores — adhg, May 03 '15 at 16:38

Stephen C · Answer 1 · 2015-05-05T22:45:13.137

But 5 Threads Supposed to increase the performance..?

That's what >>you<< suppose. But in fact, there are no guarantees that adding threads will increase performance.

But according to what I have studied 2*(no of cores) of threads should give optimal result ...

If you read that somewhere, then you either misread it or it is plain wrong.

The reality is that the number of threads for optimal performance is highly dependent on the nature of your application, and also on the hardware you are running on.

Based on a cursory reading of your code, it appears that this is a benchmark to test how well Java deals with multi-threaded access and updates to a shared set (S). Each thread is doing some operations on a thread-confined set, then either adding or removing all entries in the thread-confined set to the shared set.

The problem is that the addAll and retainAll calls are likely to be concurrency bottlenecks. A set based on ConcurrentHashMap will give better concurrent performance for point access / update to the set than on based on HashMap. However, addAll and retainAll perform N such operations, on the same entries that the other threads are operating on. Given the nature of this pattern of operations, you are likely to get significant contention within the different regions of the ConcurrentHashMap. That is likely to lead to one thread blocking another ... and a slowdown.

Update : If I increase no of queries per thread 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.

I assume that you mean that you are increasing the number of hash map entries. This is likely to reduce the average contention, given the way that ConcurrentHashMap works. (The class divides the map into regions, and arranges that operations involving entries in different regions incur the minimum possible contention overheads. By increasing the number of distinct entries, you are reducing the probability that two simultaneous operations will lead to contention.)

So returning to the "2 x no of threads" factoid.

I suspect that the sources you have been reading don't actually say that that gives you optimal performance. I suspect that they really say that that:

"2 x no of threads" is a good starting point ... and you need to tune it for your application / problem / hardware, and/or
don't go above "2 x no of threads" for a compute intensive task ... because it is unlikely to help.

In your example, it is most likely that the main source of the contention is in the updates to the shared set / map ... and the overheads of ensuring that they happen atomically.

You can also get contention at a lower level; i.e. contention for memory bandwidth (RAM read/write) and memory cache contention. Whether that happens will depend on the specs of the hardware you are running on ...

The final thing to note is that your benchmark is flawed in that it does not allow for various VM warmup effects ... such as JIT compilation. The fact that your 2 thread times are more than double the 1 thread times points to that issue.

There are other questionable aspects about your benchmarking:

The amount of work done by the run() method is too small.
This benchmark does not appear to be representative of a real-world use-case. Measuring speed-up in a totally fictitious (nonsense) algorithm is not going to give you any clues about how a real algorithm is likely to perform when you scale the thread count.
Running the tests on a 4 core machine means that you probably wouldn't have enough data points to draw scientifically meaningful conclusions ... assuming that the benchmark was sound.

Having said that, the 2 to 4 thread slowdown that you seem to be seeing is not unexpected ... to me.

But if I have 4 cores in my computer. When I increase no of threads from 1 to 2 I get a increase in running time .Are bottlenecks caused by addAll and retainAll responsible for this drastic degradation of performance.cause Doubling no of cores should increase performance by 2 times in absence of any bottleneck. — teddybear, May 03 '15 at 00:12
*"Are bottlenecks caused by addAll and retainAll responsible for this drastic degradation of performance."* - That is what I am proposing as the explanation. Remove those calls and see what happens ... — Stephen C, May 03 '15 at 01:15
Even if I comment out retainAll and addAll results are same but if I add Thread.sleep(10) to run method 2 threads running in about half the time of single thread. Does that mean over head is greater than running time even for update/remove operations? — teddybear, May 03 '15 at 06:55
Update : If I increase no of queries per thread 4-threaded is performing better than Single threaded .I think overhead has been higher than run-time when I used only about 4 queries per thread and as no of queries increased Runtime is now greater than Overhead.Thanks — teddybear, May 03 '15 at 10:32

Performance of Concurrent Program Degrading with Increase in Threads?

1 Answers1

Linked