Increasing Number Of Treads Cause The Longer Loop Duration

Question

I have a method with some logic that take ~40ms

When I am call this method more times (parallel)- each loop duration takes more time (a lot).

For example:

1 call - ~40ms (avg)

5 calls - ~63ms (avg)

10 calls - ~120ms (avg)

100 calls - ~1200ms (avg)

and so on...

Code:

int numberOfExecution = 100;
ExecutorService executor = Executors.newFixedThreadPool(numberOfExecution);
for (int i = 0; i < numberOfExecution; i++) {
    executor.execute(new Runnable() {
        @Override
        public void run() {
            MyObject obj1 = readBytesFromFile(1);
            MyObject obj2 = readBytesFromFile(2);
            MyObject obj3 = readBytesFromFile(3);
            MyObject obj4 = readBytesFromFile(4);


            final Map<Integer, MyObject> objects = new TreeMap<>();
            objects.put(0, obj1);
            objects.put(1, obj2);
            objects.put(2, obj3);
            objects.put(3, obj4);

            Foo foo = new Foo(objects);

            long startTime = System.currentTimeMillis();
            Thread.sleep(40); // sleep for 40ms to demonstrate my 40ms logic
            long endTime = System.currentTimeMillis();
            float durationMs = endTime - startTime;

            System.out.println("Foo.bar() duration: " + durationMs);
        }
    });
}

executor.shutdown();
while (!executor.isTerminated()) { }

Console:

Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 43.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 41.0
Foo.bar() duration: 48.0
Foo.bar() duration: 62.0
Foo.bar() duration: 48.0
Foo.bar() duration: 48.0
Foo.bar() duration: 48.0
Foo.bar() duration: 48.0
Foo.bar() duration: 49.0
Foo.bar() duration: 48.0
Foo.bar() duration: 49.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 41.0
Foo.bar() duration: 42.0
Foo.bar() duration: 49.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 49.0
Foo.bar() duration: 74.0
Foo.bar() duration: 40.0
Foo.bar() duration: 70.0
Foo.bar() duration: 40.0
Foo.bar() duration: 50.0
Foo.bar() duration: 43.0
Foo.bar() duration: 40.0
Foo.bar() duration: 43.0
Foo.bar() duration: 56.0
Foo.bar() duration: 63.0
Foo.bar() duration: 45.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 43.0
Foo.bar() duration: 43.0
Foo.bar() duration: 48.0
Foo.bar() duration: 41.0
Foo.bar() duration: 58.0
Foo.bar() duration: 59.0
Foo.bar() duration: 55.0
Foo.bar() duration: 42.0
Foo.bar() duration: 40.0
Foo.bar() duration: 42.0
Foo.bar() duration: 70.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 75.0
Foo.bar() duration: 40.0
Foo.bar() duration: 43.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 67.0
Foo.bar() duration: 59.0
Foo.bar() duration: 40.0
Foo.bar() duration: 57.0
Foo.bar() duration: 42.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 43.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 41.0
Foo.bar() duration: 42.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 42.0
Foo.bar() duration: 42.0
Foo.bar() duration: 41.0
Foo.bar() duration: 45.0
Foo.bar() duration: 44.0
Foo.bar() duration: 40.0
Foo.bar() duration: 42.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 40.0
Foo.bar() duration: 45.0
Foo.bar() duration: 41.0
Foo.bar() duration: 40.0
Foo.bar() duration: 44.0
Foo.bar() duration: 41.0
Foo.bar() duration: 42.0
Foo.bar() duration: 40.0
Foo.bar() duration: 42.0
Foo.bar() duration: 41.0
Foo.bar() duration: 42.0
Foo.bar() duration: 42.0
Foo.bar() duration: 43.0
Foo.bar() duration: 43.0

Run on MacBook Pro i9

What am I missing?

Not all problems (or implementations) benefit from parallelization. C.f. [Amdahl's Law](https://en.wikipedia.org/wiki/Amdahl%27s_law) — markspace, Dec 30 '19 at 16:06
Also you've removed a hell of a lot of code, so there's no way anyone could help you. Show your code. — markspace, Dec 30 '19 at 16:08
Do you mean to "bar()" code? I can't. This is internal company logic. Anyway it is a lot of code. I am trying to understand if I am missing a basic knowledge — Matan Marciano, Dec 30 '19 at 16:11
Just curious if using Executors.newCachedThreadPool() would produce any different result? — stackFan, Dec 30 '19 at 16:16
OK, I have just change foo.bar() line by Thread.sleep(40); and the results is better but I still measure some iteration with more than 40ms (I am updating the post now) — Matan Marciano, Dec 30 '19 at 16:30

score 2 · Answer 1 · answered Dec 30 '19 at 16:11

2

The number of cores in a machine is fixed. As we increase the number of threads that needs to be executed parallelly there will be context switching between Threads.

Please check https://www.javamex.com/tutorials/threads/context_switch.shtml .

Also, there can be resource contention between Threads. The maximum number of threads which can run in parallel will depend on the Machine Cores. In short, just increasing the thread count will not increase the code execution performance.

answered Dec 30 '19 at 16:11

dassum

4,727
2
25
38

Yes I know that, but the context switching may not affect my measurement because the measurement is only on foo.bar() call. – Matan Marciano Dec 30 '19 at 16:16
Then there can be resource contention. Please share the code for foo.bar(). Else cann't help. – dassum Dec 30 '19 at 16:20
I can't, This is internal company logic. So you think the only issue can be resource contention? You have more idea's? – Matan Marciano Dec 30 '19 at 16:24
OK, I have just change foo.bar() line by Thread.sleep(40); and the results is better but I still measure some iteration with more than 40ms (I am updating the post now) – Matan Marciano Dec 30 '19 at 16:30
Thread.sleep(40) will lead to add Context Switch time. Instead, put some hello world code inside foo.bar() like some print statement and check the timing. – dassum Dec 30 '19 at 16:32
1

Maybe gc time? [Can this helps?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Renato Dec 30 '19 at 17:05
1

This happens due to context switches. When you submit all 100 tasks thread scheduler now needs to manage 100 threads on a machine with lesser number of cores. So scheduler stops threads at some random place (might be as well between lines of code where you measure time) and gives CPU time to other threads switching execution contexts of those threads. – Ivan Dec 30 '19 at 20:22

Increasing Number Of Treads Cause The Longer Loop Duration

1 Answers1