
I am learning multithreading and found that Object.hashCode slows down in a multithreaded environment: computing the default hash code takes over twice as long with 4 threads as with 1 thread, with each thread processing the same number of objects.

As I understand it, doing this work in parallel should take a similar amount of time.

You can change the number of threads. Each thread has the same amount of work to do, so you would hope that running 4 threads on my quad-core machine takes about the same time as running a single thread.

I'm seeing ~2.3 seconds for 4 threads but ~0.9 seconds for 1 thread.

Is there a gap in my understanding? Please help me understand this behaviour.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 4;
    private static final int ITERATIONS = 20000000;

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        // Prints the elapsed wall-clock time in milliseconds.
        System.err.println(System.currentTimeMillis() - start);
    }

    // Fixed pool of daemon threads, so the JVM can exit once main() returns.
    private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        // Each task allocates ITERATIONS objects and calls hashCode() once on each.
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS; i++) {
                    Object object = new Object();
                    object.hashCode();
                }
                return null;
            }
        };
        // Submit the same task THREAD_COUNT times and wait for all of them.
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[THREAD_COUNT];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}

With a thread count of 4 the output is

~2.3 seconds

With a thread count of 1 the output is

~0.9 seconds
T-Bag
  • Please share the changes you made between 1 and 4 threads – Jan Dec 16 '15 at 13:54
  • The time measurement does not necessarily tell you much here. See http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Marco13 Dec 16 '15 at 13:55
  • You're probably not measuring the right thing: GC, creation of the executor and of its threads, thread coordination, object instantiations, memory allocations, etc. Anyway, the benchmark is pretty useless, since you won't be able to change anything in Object's hashCode() implementation anyway. – JB Nizet Dec 16 '15 at 13:56
  • You're not measuring hashCode(), you're measuring the instantiation of 20 million Objects when single threaded, and 80 million Objects when running 4 threads. Move the new Object() logic out of the for loop in your Callable, then you will be measuring hashCode() – Palamino Dec 16 '15 at 13:59
  • Besides, hashCode for Object is actually implemented with a native platform-specific call, so you likely won't find any performance issues there. – Davio Dec 16 '15 at 14:01

3 Answers


I've created a simple JMH benchmark to test the various cases:

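import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(1)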
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 10)
@Warmup(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
public class HashCodeBenchmark {
    private final Object object = new Object();

    @Benchmark
    @Threads(1)
    public void singleThread(Blackhole blackhole){
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(2)
    public void twoThreads(Blackhole blackhole){
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(4)
    public void fourThreads(Blackhole blackhole){
        blackhole.consume(object.hashCode());
    }

    @Benchmark
    @Threads(8)
    public void eightThreads(Blackhole blackhole){
        blackhole.consume(object.hashCode());
    }
}

And the results are as follows:

Benchmark                       Mode  Cnt  Score   Error  Units
HashCodeBenchmark.eightThreads  avgt   10  5.710 ± 0.087  ns/op
HashCodeBenchmark.fourThreads   avgt   10  3.603 ± 0.169  ns/op
HashCodeBenchmark.singleThread  avgt   10  3.063 ± 0.011  ns/op
HashCodeBenchmark.twoThreads    avgt   10  3.067 ± 0.034  ns/op

So as long as there are no more threads than cores, the time per hashCode() call stays roughly the same.
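
One thing worth keeping in mind when reading these numbers: the benchmark calls hashCode() repeatedly on a single shared object, whereas the question's loop calls it once on each freshly allocated object. The following is only a small sketch of the documented contract (the class and method names are made up for illustration): repeated calls on the same object return the same value, and for a plain Object that value is the identity hash. Whether and how a JVM stores the value after the first call is an implementation detail that this sketch does not show.

```java
final class IdentityHashDemo {

    // True when two consecutive hashCode() calls on the same object agree.
    static boolean stableAcrossCalls(Object object) {
        return object.hashCode() == object.hashCode();
    }

    // True when hashCode() matches System.identityHashCode(); this holds for
    // objects whose class does not override hashCode(), such as new Object().
    static boolean matchesIdentityHash(Object object) {
        return object.hashCode() == System.identityHashCode(object);
    }

    public static void main(String[] args) {
        Object object = new Object();
        System.out.println(stableAcrossCalls(object));
        System.out.println(matchesIdentityHash(object));
    }
}
```

Both checks are expected to hold for a plain Object, so the benchmark above and the question's loop are exercising hashCode() under different conditions.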

PS: As @Tom Cools commented, your test measures the allocation speed and not the hashCode() speed.

Svetlin Zarev

See Palamino's comment:

You're not measuring hashCode(), you're measuring the instantiation of 20 million Objects when single threaded, and 80 million Objects when running 4 threads. Move the new Object() logic out of the for loop in your Callable, then you will be measuring hashCode() – Palamino
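
A minimal sketch of what the comment suggests, assuming one pre-allocated object per task. The class and method names (HashCodeWithoutAllocation, hashLoop, timeNanos) are hypothetical, and wall-clock timing like this is still only a rough indication compared to a proper JMH benchmark:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class HashCodeWithoutAllocation {

    // Calls hashCode() on one existing object `iterations` times. The sum is
    // returned so that the calls have an observable result.
    static long hashLoop(Object object, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += object.hashCode();
        }
        return sum;
    }

    // Runs `threads` tasks in parallel and returns the elapsed wall-clock
    // time in nanoseconds. No objects are allocated inside the timed loops.
    static long timeNanos(int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // One object per task, allocated once, before timing starts.
                Object object = new Object();
                tasks.add(() -> hashLoop(object, iterations));
            }
            long start = System.nanoTime();
            for (Future<Long> future : pool.invokeAll(tasks)) {
                future.get(); // wait for every task to finish
            }
            return System.nanoTime() - start;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 4}) {
            long nanos = timeNanos(threads, 20_000_000);
            System.out.println(threads + " thread(s): " + nanos / 1_000_000 + " ms");
        }
    }
}
```

The first measurement in a run also includes JIT warm-up, so the two printed numbers are not directly comparable; treat them as a sanity check rather than a result.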

Tom Cools

Two issues I see with the code:

  1. The size of the allWork[] array should be equal to ITERATIONS.
  2. While iterating in the call() method, make sure that each thread gets its share of the load: ITERATIONS/THREAD_COUNT.

Below is the modified version you can try:

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class ObjectHashCodePerformance {

    private static final int THREAD_COUNT = 1;
    private static final int ITERATIONS = 20000;
    private final Object object = new Object();

    public static void main(final String[] args) throws Exception {
        long start = System.currentTimeMillis();
        new ObjectHashCodePerformance().run();
        System.err.println(System.currentTimeMillis() - start);
    }

    private final ExecutorService _sevice = Executors.newFixedThreadPool(THREAD_COUNT,
            new ThreadFactory() {
                private final ThreadFactory _delegate = Executors.defaultThreadFactory();

                @Override
                public Thread newThread(final Runnable r) {
                    Thread thread = _delegate.newThread(r);
                    thread.setDaemon(true);
                    return thread;
                }
            });

    private void run() throws Exception {
        Callable<Void> work = new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                for (int i = 0; i < ITERATIONS / THREAD_COUNT; i++) {
                    object.hashCode();
                }
                return null;
            }
        };
        @SuppressWarnings("unchecked")
        Callable<Void>[] allWork = new Callable[ITERATIONS];
        Arrays.fill(allWork, work);
        List<Future<Void>> futures = _sevice.invokeAll(Arrays.asList(allWork));
        System.out.println("Futures size : " + futures.size());
        for (Future<Void> future : futures) {
            future.get();
        }
    }
}
  • In the `run()/call()` method you are still allocating objects, so you are measuring the hashcode plus the allocation speed. Your answer is flawed. – Svetlin Zarev Dec 16 '15 at 14:44