82

I recently raised a question in stackoverflow, then found the answer. The initial question was What mechanisms other than mutexs or garbage collection can slow my multi-threaded java program?

I discovered to my horror that HashMap has been modifed between JDK1.6 and JDK1.7. It now has a block of code that causes all threads creating HashMaps to synchronize.

The line of code in JDK1.7.0_10 is

 /**A randomizing value associated with this instance that is applied to hash code of  keys to make hash collisions harder to find.     */
transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);

Which ends up calling

 protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
 }    

Looking in other JDKs, I find this isn't present in JDK1.5.0_22, or JDK1.6.0_26.

The impact on my code is huge. It makes it so that when I run on 64 threads, I get less performance than when I run on 1 thread. A JStack shows that most of the threads are spending most of their time spinning in that loop in Random.

So I seem to have some options:

  • Rewrite my code so that I don't use HashMap, but use something similar
  • Somehow mess around with the rt.jar, and replace the hashmap inside it
  • Mess with the class path somehow, so each thread gets its own version of HashMap

Before I start down any of these paths (all look very time consuming and potentially high impact), I wondered if I have missed an obvious trick. Can any of you stack overflow people suggest which is the better path, or perhaps identify a new idea.

Thanks for the help

Community
  • 1
  • 1
Stave Escura
  • 2,058
  • 1
  • 19
  • 24
  • 2
    What requires you to create that many hashmaps? What are you trying to do? – fge Dec 23 '12 at 12:30
  • the application is doing a lot of searching. The hashmaps hold the data structure that is being searched, the results of the search, and quite a few other things. It is also about 100 man-years worth of development...so it might be hard to change to something else. The data that is being searched is schemaless so very hard to model in POJOs – Stave Escura Dec 23 '12 at 12:38
  • 3
    2 comments: 1. ConcurrentHashMap does not seem to use that - could it be an alternative? 2. This piece of code is only called on map creation. That implies that you are creating millions of hashmaps under high contention - Does that really reflect a realistic production load? – assylias Dec 23 '12 at 12:41
  • 1
    Actually ConcurrentHashMap does use that method too (in oracle jdk 1.7_10) - but apparently [openJDK 7 does not](http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/classes/java/util/concurrent/ConcurrentHashMap.java). – assylias Dec 23 '12 at 12:51
  • Does the code actually reference HashMap per se or just Map when it initializes? If the latter, would a custom Map implementation be an option? – fge Dec 23 '12 at 13:12
  • 1
    @assylias You should check the [latest version over here](http://hg.openjdk.java.net/jdk7u/jdk7u/jdk/file/cb81ee79a72d/src/share/classes/java/util/concurrent/ConcurrentHashMap.java). This one does sport such a line of code. – Marko Topolnik Dec 23 '12 at 13:17
  • 3
    @StaveEscura `AtomicLong` bets on low write-contention to work well. You have high write-contention, so you need regular exclusive locking. Write a synchronized `HashMap` factory and you'll probably see an improvement, unless **all you ever do** in these threads is map instantiation. – Marko Topolnik Dec 23 '12 at 13:22
  • 1
    What I want to accomplish is no contention among anythreads. I hear your argument about the synchronized hashmap: the mutex will stop the looping, and almost certainly be quicker. Injecting a hashmap factory into all the code would be quite time consuming. Perhaps some sort of aspect orientation could help: a cut before and a cut after. – Stave Escura Dec 23 '12 at 13:49
  • I'm pretty sure you can use AOP to place `synchronized` around the `HashMap` constructor. – Marko Topolnik Dec 23 '12 at 14:04
  • I think I can do something. I think I would need to insert a cut before in which I claim a mutex, and a cut after in which I return the mutex. I use AOP a lot in debugging, but I find it awkward to deploy. Many (well in practice only one so far) clients don't like the impact on their security model. Whether that is a real concern or not I don't know, but it stopped the sale... – Stave Escura Dec 23 '12 at 17:22
  • An upgrade to JDK 1.6 is not an option? – mcv Jan 08 '13 at 12:21
  • Would have put +100 to this question for great research if I could :) – Yuriy Nakonechnyy Apr 30 '13 at 14:09
  • Not an answer to your question, but maybe a better fix for your problem: if the maps you're using aren't large (say, rarely greater than 10-20 or so), you're probably better off not using a `HashMap` at all and instead using a `TreeMap` or a custom array-backed map that just does a linear scan over key-value pairs. `HashMap` is only faster for fairly large maps due to relatively high constant factors; believe it or not, for small maps, linear search is usually the winner. – jacobm Aug 27 '13 at 01:46

4 Answers4

56

I am the original author of the patch which appeared in 7u6, CR#7118743 : Alternative Hashing for String with Hash-based Maps‌​.

I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?

Regardless, we will probably investigate switching to ThreadLocalRandom rather than Random and possibly some variant of lazy initialization as suggested by cambecc.

EDIT 3

A fix for the bottleneck was been pushed to the JDK7 update mercurial repo:

http://hg.openjdk.java.net/jdk7u/jdk7u-dev/jdk/rev/b03bbdef3a88

The fix will be part of the upcoming 7u40 release and is already available in IcedTea 2.4 releases.

Near final test builds of 7u40 are available here:

https://jdk7.java.net/download.html

Feedback is still welcomed. Send it to http://mail.openjdk.java.net/mailman/listinfo/core-libs-dev to be sure it gets seen by the openJDK devs.

Mike Duigou
  • 871
  • 10
  • 18
  • 1
    Thanks for looking into this. Yes there really is a need to make that many maps: the application is actually quite simple, but 100,000's of people can hit it a second, and that means millions of maps can be created very quickly. I can of course rewrite it to not use maps, but that is at a very high development cost. For now the plan of using reflection to hack the Random field looks good – Stave Escura Jan 03 '13 at 15:18
  • 2
    Mike, a suggestion for a near-term fix: apart from ThreadLocalRandom (which will have its own problems with applications that mess with thread-local storage) would it not be much easier & cheaper (in terms of time, risk & testing) to stripe Hashing.Holder.SEED_MAKER into an array of (say) Random instances and use the calling thread's id to %-index into it? This should instantly relieve (though not eliminate) per-thread contention without any noticeable side effects. – Holger Hoffstätte Jan 04 '13 at 16:00
  • 10
    @mduigou Web applications that have a high request rate and use JSON are going to create a large number of HashMaps per second, since most if not all JSON libraries use HashMaps or LinkedHashMaps to deserialize JSON objects. Web applications that use JSON are widespread, and creation of HashMaps may not be controlled by the application (but by a library applications use), so I'd say that there are valid reasons to not have a bottleneck when creating HashMaps. – sbordet Jan 06 '13 at 12:03
  • 3
    @mduigou perhaps a simple alleviation is simply to check if the oldSeed is the same before calling the CAS on it. This optimisation (known as test-test and set or TTAS) may seem redundant, but can have an important performance impact under contention as the CAS is not attempted if it already knows it will fail. Failed CAS has the unfortunate side-effect of setting the MESI status of the cache line to Invalid – requiring all parties to re-retrieve the value from memory. Of course, Holger's striping of the seeds is an excellent long term fix, but even then the TTAS optimisation should be used. – Jed Wesley-Smith Jan 08 '13 at 07:38
  • 5
    Do you mean "hundreds of thousands" instead of "hundreds or thousands"? - BIG difference – Michael Neale Jan 08 '13 at 08:48
  • @sbordet While true for certain usages, it should not affect pure data-binding (JSON <-> POJO), which is what frameworks like JAX-RS do. So while relevant, it can often be avoided. – StaxMan Jan 08 '13 at 15:56
30

This looks like a "bug" you can work around. There is a property that disables the new "alternative hashing" feature:

jdk.map.althashing.threshold = -1

However, disabling alternative hashing is not sufficient because it does not turn off the generation of a random hash seed (though it really should). So even if you turn off alt hashing, you still have thread contention during hash map instantiation.

One particularly nasty way of working around this is to forcefully replace the instance of Random used for hash seed generation with your own non-synchronized version:

// Create an instance of "Random" having no thread synchronization.
Random alwaysOne = new Random() {
    @Override
    protected int next(int bits) {
        return 1;
    }
};

// Get a handle to the static final field sun.misc.Hashing.Holder.SEED_MAKER
Class<?> clazz = Class.forName("sun.misc.Hashing$Holder");
Field field = clazz.getDeclaredField("SEED_MAKER");
field.setAccessible(true);

// Convince Java the field is not final.
Field modifiers = Field.class.getDeclaredField("modifiers");
modifiers.setAccessible(true);
modifiers.setInt(field, field.getModifiers() & ~Modifier.FINAL);

// Set our custom instance of Random into the field.
field.set(null, alwaysOne);

Why is it (probably) safe to do this? Because alt hashing has been disabled, causing the random hash seeds to be ignored. So it doesn't matter that our instance of Random isn't in fact random. As always with nasty hacks like this, please use with caution.

(Thanks to https://stackoverflow.com/a/3301720/1899721 for the code that sets static final fields).

--- Edit ---

FWIW, the following change to HashMap would eliminate the thread contention when alt hashing is disabled:

-   transient final int hashSeed = sun.misc.Hashing.randomHashSeed(this);
+   transient final int hashSeed;

...

         useAltHashing = sun.misc.VM.isBooted() &&
                 (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
+        hashSeed = useAltHashing ? sun.misc.Hashing.randomHashSeed(this) : 0;
         init();

A similar approach can be used for ConcurrentHashMap, etc.

Community
  • 1
  • 1
cambecc
  • 4,083
  • 1
  • 23
  • 24
  • 1
    Thank you. This is indeed a hack, but it solves the problem temporarily. It is certainly a better solution than any in the list I identified above. Long term I am going to have to do something with a faster HashMap anyway. This reminds me of the solution to the old ResourceBundle cache not being clearable. The code is almost identical! – Stave Escura Dec 23 '12 at 17:19
  • 1
    FYI, this alt hashing feature is described here: [Review Request CR#7118743 : Alternative Hashing for String with Hash-based Maps](http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html). It is an implementation of the murmur3 hash function. – cambecc Dec 24 '12 at 13:15
2

Assuming your usage pattern is reasonable, you'll want to use your own version of Hashmap.

That piece of code is there to make hash collisions lot harder to cause, preventing attackers to create performance problems (details) - assuming this problem is already dealt with in some other way, I don't think you'd need synchronization at all. However, irrelevant of if you use synchronization or not, it seems you would want to use your own version of Hashmap so you wouldn't depent that much on what JDK happens to provide.

So either you just normally write something similar and point to that, or override a class in JDK. To do the latter, you can override bootstrap classpath with -Xbootclasspath/p: parameter. Doing so will however "contravene the Java 2 Runtime Environment binary code license" (source).

eis
  • 51,991
  • 13
  • 150
  • 199
  • Aha. I hadn't realised that was the point of the optimisation. Very clever. My threat model for attackers doesn't have them messing with hashmaps this way, but I will remember this for the future. I agree with your point about replacing the HashMap eventually. I will probably have to thread a factory object or maybe an IOC container into every class that makes them. I think the answer given by Cambecc will get me out of the hole, while I work on a longer term solution – Stave Escura Dec 23 '12 at 17:38
2

There are lots of apps out there that create a transient HashMap per record in big data applications. This parsers and serializers, for example. Putting any synchronization into unsynchronized collections classes is a real gotcha. In my opinion, this is unacceptable and needs to be fixed ASAP. The change that was apparently introduced in 7u6, CR#7118743 should be reverted or fixed without requiring any synchronization or atomic operation.

Somehow this is reminds me of the colossal mistake of making StringBuffer and Vector and HashTable synchronized in JDK 1.1/1.2. People paid dearly for years for that mistake. No need to repeat that experience.