0

I was going through Java 8 HashMap improvements which shows search performance improvement by around 20 % using Binary tree.

At that point there was a thought how is the insert performance impacted. So I started with inserting few millions of records. Below is the code snippet and different set of results.

import java.util.HashMap;
import java.util.Map;

public class MapWriter {

public static final int MAX_KEY = 1_000_000;

private Map<Double, Double> map = new HashMap<>(MAX_KEY);


public static void main(String[] args) {
    long startTime = System.currentTimeMillis();

    MapWriter writer = new MapWriter();

    for (int i = 0; i < MAX_KEY; i++) {
        double random = Math.random();
        writer.map.put(random, random);
    }

    long timeTaken = System.currentTimeMillis() - startTime;

    System.out.println("Total Time Taken = " + timeTaken);
    System.out.println("Map Size = " + writer.map.size());
}

}

Here are different results:

  1. 10 Million Insertions

    • Java 7

      • Total Time Taken = 23145
    • Java 8

      • Total Time Taken = 64964
  2. 2 Million Insertions

    • Java 7
      • Total Time Taken = 6628
    • Java 8
      • Total Time Taken = 8312
  3. 1 Million Insertions

    • Java 7
      • Total Time Taken = 3577
    • Java 8
      • Total Time Taken = 1212

The result shows that up to 1 million insertions Java 8 performs better. But the result shows opposite behavior as you move upwards.

How do I explain this behavior??????

Update: Thank you guys for your precious feedback. I need to learn more on benchmarking. I pre-initialized Math.random part and the results were same for both java 7 and java 8. Here is the modified code. Please let me know if the code still looks smelly from benchmarking perspective.

public class MapWriter {

public static int MAX_KEY = 1_000_000;


private Map<Double, Double> map = new HashMap<>(MAX_KEY);


public static void main(String[] args) {
    MAX_KEY = Integer.parseInt(args[0]);

    Double[] keys = new Double[MAX_KEY];

    for(int i = 0; i < MAX_KEY; i++) {
        keys[i] = Math.random();
    }

    MapWriter writer = new MapWriter();
    for (int i = 0; i < 100000; i++) {

        writer.map.put(keys[i], keys[i]);
    }

    writer.map = new HashMap<>(MAX_KEY);
    long startTime = System.nanoTime();


    for (int i = 0; i < MAX_KEY; i++) {
        double random = Math.random();
        writer.map.put(random, random);
    }

    long timeTaken = System.nanoTime() - startTime;

    System.out.println("Total Time Taken = " + timeTaken / 1000000);
    System.out.println("Map Size = " + writer.map.size());
}

}
Prafull
  • 73
  • 4
  • 3
    Start here: http://stackoverflow.com/a/513259/367273 – NPE Sep 05 '14 at 05:44
  • 5
    I think this benchmarking approach is too naive. Way too many factors that you probably didn't consider. – Jan Groth Sep 05 '14 at 05:47
  • Is there some evidence that Java 8's `HashMap` uses a binary tree? It doesn't sound right. – user207421 Sep 05 '14 at 05:51
  • 2
    @EJP Yes, see [HashMap.java](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/HashMap.java) particularly the implementation note starting at line 143. Briefly, if hash collisions result in overpopulation in any bin, that bin is converted from a linked list to a binary tree. – Stuart Marks Sep 05 '14 at 06:06

2 Answers2

4

If you write your own micro-benchmark instead of using a dedicated tool like JMH you should consider some important points:

  • Use System.nanoTime(). for measuring elapsed time to be immune against time changes triggered outside the JVM.
  • Don’t use Math.random() (unless you want to benchmark Math.random()). Using a global resource like the global random generator adds thread synchronization that might affect your code.
  • Use a Random instance as a source for your random inputs, but initialize it with a constant seed to ensure that the runs you want to compare really do the same
  • run the code you want to benchmark multiple times within the JVM to ensure you don’t measure initialization overhead like class loading, first time memory allocations, interpreted code, etc.

  • Generally, it’s a good idea to run a profiler during test runs to verify that the time is really spent within the code you want to benchmark

Using the last bullet above, the main problem with your benchmark is revealed very fast. You didn’t give the JVM enough initial memory (and maybe even the maximum memory is too limited) and ran the code only one time. So you are measuring mainly impacts of the memory management here.

Note that the Java 8 HashMap requires slightly more memory and when measuring the initialization cost only, even slightly more memory requirements may have a big impact .

Giving both JVMs at least 1GB initial memory for 10 Million Insertions resulted in taking about 5 seconds on Java 8 and 6 seconds on Java 7 on my machine, even without warmup. Far away from your 20 seconds not to speak of more than a minute.

The bottom line is that you need far more runs with different environmental parameters before even formulating an assumption about the reason for the different results. When you ran the test on a 32 Bit JVM, -server and -client, a 64Bit JVM, all of them with different memory settings and get consistent results for all of them showing a particular version being faster or slower, then you might suggest that it is the version. But there might still be other reasons…

Holger
  • 285,553
  • 42
  • 434
  • 765
  • Were the changes introduced in Java 8 only meant for querying performance or all around? – Sotirios Delimanolis Sep 05 '14 at 14:37
  • @Sotirios Delimanolis: You mean the binary search tree for hash collisions? It’s for improving the lookup performance in the first place. However it made certain other attempts of the past obsolete, which may improve the overall performance. E.g. earlier versions applied an improvement algorithm on every hashcode which now has been simplified. – Holger Sep 05 '14 at 14:59
  • Yes, that's what I meant. Thank you. – Sotirios Delimanolis Sep 05 '14 at 15:04
3

I think you're also testing Math.random() and auto-boxing while you run the benchmark. You have to remove them outside the benchmark scope. Furthermore I think you have to re-run your test several times and take an average instead of a single run.

Sezin Karli
  • 2,517
  • 19
  • 24
  • 2
    I'd even exclude the results of the first run from the average, if one is to allow the runtime a grace period in which it is allowed to "wake up" properly. – Gimby Sep 05 '14 at 06:50
  • 1
    It’s rather unlikely that Java 8 `Math.random()` and auto-boxing has become that slower… – Holger Sep 05 '14 at 08:57
  • 1
    I'd also say that it is unlikely that HashMap has taken a performance hit, yet here we are. The main point to take away from it is that when you do performance tests, you need to do it correctly or not at all. That also means removing as many influencing factors as you possible can. – Gimby Sep 05 '14 at 09:05