39

While working on a memory benchmark of some high-throughput data structures, I realized I could use an ImmutableMap with only a little refactoring.

Thinking this would be an improvement, I threw it into the mix and was surprised to discover that not only was it slower than HashMap, in a single-threaded environment it appears to be consistently slower even than ConcurrentHashMap!

You can see the full benchmark

The meat of the test is pretty simple, time how long it takes to get a large number of random strings that may exist in the map.

public static void timeAccess(Map<String,String> map) {
    Random rnd = new Random(seed);
    int foundCount = 0;

    long start = System.nanoTime();

    for(int i = 0; i < loop; i++) {
        String s = map.get(RndString.build(rnd));
        if(s != null)
            foundCount++;
    }

    long stop = System.nanoTime() - start;

    System.out.println("Found "+foundCount+" strings out of "+loop+" attempts - "+
        String.format("%.2f",100.0*foundCount/loop)+" success rate.");
    System.out.println(map.getClass().getSimpleName()+" took "+
        String.format("%.4f", stop/1_000_000_000.0)+" seconds.");
    System.out.println();
}

And running this against a HashMap, a ConcurrentHashMap, and an ImmutableMap, all containing the same values, consistently showed a dramatic slowdown when using ImmutableMap - often upwards of 15% slower. The more sparse the map (i.e. the more often map.get() returned null) the greater the disparity. Here's the result of a sample run:

Found 35312152 strings out of 100000000 attempts - 35.31 success rate.
HashMap took 29.4538 seconds.

Found 35312152 strings out of 100000000 attempts - 35.31 success rate.
ConcurrentHashMap took 32.1465 seconds.

Found 35312152 strings out of 100000000 attempts - 35.31 success rate.
RegularImmutableMap took 37.9709 seconds.

Is this a documented / expected issue? The Guava Docs indicate Immutable*** is more memory efficient, but says nothing about speed. For slowdowns of this magnitude, I'm inclined to deal with the memory costs and avoid Immutable*** when speed is an issue (and when isn't it?!). Am I missing something?

See also: https://groups.google.com/forum/?fromgroups=#!topic/guava-discuss/I7yPpa5Hlpg

dimo414
  • 47,227
  • 18
  • 148
  • 244
  • 11
    The issues mentioned in that mailing list thread definitely still apply to your benchmarks. Additionally, see the `ImmutableMap` Javadoc: "unlike HashMap, ImmutableMap is not optimized for element types that have slow Object.equals(java.lang.Object) or Object.hashCode() implementations. You can get better performance by having your element type cache its own hash codes, and by making use of the cached values to short-circuit a slow equals algorithm." That is certainly an issue with `String`. Finally, the `ImmutableMap` implementation is largely the same as `HashMap`'s. – Louis Wasserman Apr 02 '13 at 06:40
  • If you like, you can experiment with some of Guava's benchmarks: https://code.google.com/p/guava-libraries/source/browse/guava-tests/benchmark/com/google/common/collect/MapBenchmark.java – Louis Wasserman Apr 02 '13 at 06:43
  • Also, "dealing with the memory costs" may result in significantly increased GC load, which can slow down your program just as much as a slower but more compact implementation. There's really no substitute for profiling with your specific real-world application. – Louis Wasserman Apr 02 '13 at 06:45
  • @LouisWasserman #1 - `String` has inefficient `equals()` and `hashCode()` methods? That seems like a huge issue - it's the primary type used as a map key... My impression looking at the code was that `ImmutableMap` is largely similar to `HashMap` like you said, but the access times I'm seeing don't agree. #2 thanks, I'll take a look at that. – dimo414 Apr 02 '13 at 12:48
  • @LouisWasserman #3 - could you elaborate on that? In my test (and my use case) once the maps are constructed, they're not re-sized or mucked with at all, why would a ~constant additional memory overhead (assuming enough heap headroom that the single map isn't the primary occupant of the heap) cause GC issues? I generally construct the actual application then try to whittle down interesting behavior into a repeatable benchmark, which is what I did here. I noticed an unexpected amount of slowdown when using `ImmutableMap` in a real application, and made the above test to confirm. – dimo414 Apr 02 '13 at 12:52
  • 3
    #1: `String` has a linear-time `equals` method -- I wouldn't call that a "huge issue," it's just the laws of algorithms -- and it doesn't use its `hashCode` cache to short-circuit, which is...not ideal, but arguable either way. #3: granted, that's true for your use case, but ImmutableMap is optimized to be a generalist, to be good-if-not-perfect at many different use cases. (For example, Android applications benefit significantly from ImmutableMap's compact design.) – Louis Wasserman Apr 02 '13 at 15:21
  • If you would like to improve String.equals() performance intern all the Strings returned by RndString.build(rnd) with .intern(). All of the Maps should perform better after this operation. – Nándor Krácser Apr 10 '13 at 13:04
  • Since the operator == shortcut in String.equals() works only for interned Strings and String returned by your builder aren't interned. – Nándor Krácser Apr 10 '13 at 13:47
  • That wouldn't have any bearing on this benchmark, however, since both `HashMap` and `ImmutableMap` have to deal with the same `!=` strings. – dimo414 Apr 10 '13 at 15:08
  • I see some issues with your benchmarking code: * Random should never be used in benchmarking code (nor unit testing) * Don't shuffle the order. Use something like caliper to know that the order doesn't matter. * Don't micro-benchmark by using the very naive System.nanotime() – jontejj Apr 26 '13 at 13:21
  • I'll grant the benchmark's not perfect (I see now that I didn't seed my `Random` instances the way I'd intended, will fix) but this difference is measurable in production code, and seems significant enough in this benchmark to not be caused merely by bad state. – dimo414 Apr 26 '13 at 14:11
  • Looking again, I'm fairly comfortable with how the `Random` instances are implemented for this test - to more explicitly seed `Random` would run the risk that the arbitrary seed was impacting the test. Given that it's repeatable over multiple runs with different seeds, I think that's more insightful than avoiding random behavior. – dimo414 Apr 26 '13 at 14:45
  • @LouisWasserman Could talk more about how Android benefits from ImmutableMap? – hakunami Sep 05 '15 at 14:36
  • Contrary to above, I measured performance as just as fast as HashMap **when I only search for keys that exist in the map.** His example searches for existing keys only 35% of the time, and with a memory-efficient packed hash map, it's going to have to test many subsequent entries until it comes to a NULL to know for sure that the sought key is not present. That's why he's seeing slower performance. What's your situation? – George Forman Mar 04 '22 at 01:51

2 Answers2

21

As Louis Wasserman said, ImmutableMap is not optimized for objects with slow equals method. I think the main difference is here:

HashMap:

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
    return e.value;

ImmtubleMap:

if (key.equals(candidateKey)) {
    return entry.getValue();

As you can see, to check for collisions, HashMap first check the hashes. This allows to reject values with different hashes fast. Since String doesn't make this check in its equals method, this makes HashMap faster. ImmutableMap doesn't use this optimization because it would make the test slower when equals is already optimized.

WilQu
  • 7,131
  • 6
  • 30
  • 38
  • Ah, that makes more sense, I understand what he was saying. Fascinating that String caches the hash value in `String.hashCode()` but doesn't take advantage of that data in `String.equals()`... – dimo414 Apr 30 '13 at 16:55
  • 2
    This definitely seems to be a large part of the issue, adding a `Holder` wrapper to the key that caches the object's hash code dropped the ImmutableMap's time from ~`35` seconds to ~`32`. Yet `HashMap` still performed measurably faster at ~`28` seconds. It seems Guava's assumption that keys will/should be well-behaved is a pretty poor one if `String`, vastly the most common map key, doesn't fit that assumption. – dimo414 Apr 30 '13 at 18:59
0

Some possible reasons:

  1. Could that depend on your implementation of RndString.build()?

  2. And have a look at the get() implementation of both maps: com.google.common.collect.RegularImmutableMap.get(Object) java.util.HashMap.getEntry(Object) java.util.HashMap tries to compare with "==" first. RegularImmutableMap doesn't. That may speed up

  3. Could a different load factor be responsible for that? Perhaps the RegularImmutableMap needs more iterations to find the correct entry.

roehrijn
  • 1,387
  • 1
  • 11
  • 20
  • 1. There's a link to the full code in my question. `RndString.build()` should use minimal memory and always uses the same seed for map access. It certainly could cause some bias, but I'm not aware of how. – dimo414 Apr 07 '13 at 02:31
  • 2. Good point, but `RegularImmutableMap.get()` says the following: "If we did [an == check], it would just make things worse for the most performance-conscious [equals methods]". Since `String.equals()` first does an `==` check, it (supposedly) should be faster than `== && .equals()` – dimo414 Apr 07 '13 at 02:37
  • 3. It's certainly possible (`HashMap` defaults to `.75` where `ImmutableMap` dynamically picks a value no larger than `1.2`) however even changing my `buildMap()` method to use `new HashMap<>(16,1.2f)` which should be *worse* than `ImmutableMap`'s implementation, `HashMap` was still notably faster, if slightly slower than the original benchmark. Also worth noting the comment in `RegularImmutableMap` that says "Closed addressing tends to perform well even with high load factors.... [T]he table is still likely to be relatively sparse (hence it misses fast) while saving space." – dimo414 Apr 07 '13 at 02:51