Java Map got worst performance when value use hard-coded String

Question

When I try to do some performance test for Java List/Map, I find a strange performance result.

My test logic split into data prepare and map put operation, the separated data prepare process is object to calculate map operation cost correct.

I can understand the different String creation methods result into different performance result, but the strange thing was: use of one specific hard-coded String results in the worst performance.

Why this result is so.

Here is the test code, performance result is included in test case method comment

package cn.codeworks.test;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import org.junit.Test;

public class TreeMapPerformanceTest {

  private static final int dataSize = 1000 * 1000;

  static class Pair {

    private final Integer key;
    private final String value;

    public Pair(Integer key, String value) {
      this.key = key;
      this.value = value;
    }

    public Integer getKey() {
      return key;
    }

    public String getValue() {
      return value;
    }

  }

  /**
   * time cost (3 times) = 196, 178, 186
   */
  @Test
  public void testPutPerformance_string_intern() {
    testPutPerformance((loc) -> new String("abc").intern());
  }

  /**
   * time cost (3 times) = 275, 317, 331
   */
  @Test
  public void testPutPerformance_string_new() {
    testPutPerformance((loc) -> new String("abc"));
  }

  /**
   * this case got bad performance
   * time cost (3 times) = 591, 565, 570
   */
  @Test
  public void testPutPerformance_string_hardcoded() {
    testPutPerformance((loc) -> "abc");
  }


  private void testPutPerformance(Function<Integer, String> stringCreateMethod) {
    // prepare data
    List<Pair> data = new ArrayList(dataSize);
    for (int i = 0; i < dataSize; i++) {
      Pair pair = new Pair(i, stringCreateMethod.apply(i));
      data.add(pair);
    }
    int size = data.size();

    // map operation
    Map<Integer, String> map = new TreeMap<>();

    long startTimeMillis = System.currentTimeMillis();

    for (int i = 0; i < size; i++) {
      Pair pair = data.get(i);
      map.put(pair.getKey(), pair.getValue());
    }

    long endTimeMillis = System.currentTimeMillis();
    System.out.println("time cost = " + (endTimeMillis - startTimeMillis));
  }

}

The benchmark is built wrong, so any results you get are completely unreliable. See https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java — Kayaman, Aug 11 '17 at 08:52
I bet the runtime of the third (slow) test is dominated by collecting the garbage from the first and second tests. As the tests differ only in the VALUES they store into the map, not the KEYS, there's no reason at all for performance differences in the main code. Map.put() doesn't even look into the value. Short strings, long strings, interned or not, different or all the same - doesn't matter. — Ralf Kleberhoff, Aug 11 '17 at 09:09
these 3 test cases should have no affect each other, either run 3 cases in one process at same time, nor run one case in one process. the third case always result into worst performance. — chad chen, Aug 11 '17 at 09:35

chad chen · Answer 1 · 2017-08-11T09:27:43.807

It is a bad benchmark as I did not follow a correct process.

after add JVM options -XX:+PrintCompilation -verbose:gc, run the test again, and here is the result for 3 test cases:

[Full GC (Ergonomics)  114779K->32543K(270848K), 0.2512151 secs]
[Full GC (Ergonomics)  141433K->35541K(340992K), 0.3268599 secs]
[Full GC (Ergonomics)  59544K->55008K(227328K), 0.9451788 secs]

there happened one time Full GC for each case, and the most slowest case cost most long GC time.

and after add JVM options -Xms512m -Xmx512m, expect to avoid Full GC during test case running, the time cost difference between cased disappeared.

Java Map got worst performance when value use hard-coded String

1 Answers1