It seems that Java's implementation of HashMap always places keys in the same buckets (at least I saw that with Integer keys). That is, the hashing is deterministic and produces the same value on every run.
I have heard that some languages randomize the hashing so that the bucket in which a key will be stored is unpredictable, for security reasons.
Why does Java always place the keys in the same buckets?
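For example, here is a minimal sketch of what I observe (the class name is mine, just for illustration):

```java
public class DeterministicHashDemo {
    public static void main(String[] args) {
        // Integer.hashCode() simply returns the wrapped int value,
        // so it is identical on every run of the JVM.
        Integer key = 42;
        System.out.println(key.hashCode()); // always prints 42

        // HashMap derives the bucket index from this hash code,
        // so for a given table capacity the key always lands
        // in the same bucket.
    }
}
```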

-
Presumably because HashMap is not optimised for security? (FWIW, I'd be interested in how bucket randomisation improves security.) – Oliver Charlesworth May 13 '15 at 22:12
-
@OliverCharlesworth: I am not aware of any other hash map implementation in Java being used by web applications (except something different used purely for performance reasons) – Jim May 13 '15 at 22:14
-
@OliverCharlesworth: So shouldn't all hashmaps be randomized? – Jim May 13 '15 at 22:14
-
@OliverCharlesworth it prevents certain classes of DoS attacks; if an attacker can give you keys that all hash to the same bucket, then they can make your code run slowly. – user253751 May 13 '15 at 22:15
-
I wouldn't use `HashMap` for this; it is not created for this purpose... why don't you create the keys randomly? The hash will be OK for your security reasons... – Jordi Castilla May 13 '15 at 22:16
-
@immibis: Yup, that makes sense. – Oliver Charlesworth May 13 '15 at 22:16
-
@immibis: exactly, that is what I heard. But I have not seen Java web applications using/suggesting some other alternative to avoid that. Why? – Jim May 13 '15 at 22:17
-
It could be. They decided not to. It could change in the future. – Louis Wasserman May 13 '15 at 22:21
-
I don't get the point... It would just reduce performance unless you actually need security. – Bubletan May 13 '15 at 22:22
-
@Bubletan: http://stackoverflow.com/questions/8669946/application-vulnerability-due-to-non-random-hash-functions – Jim May 13 '15 at 22:30
-
@Jim Yeah, so unless you need security it'd be just a disadvantage. There should be a separate `SecureHashMap` or something for that. – Bubletan May 13 '15 at 22:37
-
@Bubletan: Well IMHO any decent application that makes money should care about security – Jim May 13 '15 at 22:39
-
@Jim: *"any decent application... should care about security"* - it's not so simple - hash maps that aren't keyed on untrusted user-provided input aren't vulnerable, e.g. if your keys are the memory addresses of your objects, or indices in one of your data structure, or well-defined names of objects your factory's prepared to create, or user ids you've generated yourself etc.. So, having a HashMap that doesn't do extra work to randomise positions is fine for lots of things, but it's nice if you can customise the hashing when needed. (C++ takes exactly that approach, FWIW). – Tony Delroy May 14 '15 at 09:28
4 Answers
The attack of interest here is Denial of Service (DoS). An adversary chooses a set of keys that hit the same bucket. This transforms the performance of map operations from O(1) to O(n). Do this n times (to construct the map say), and we go from O(n) to O(n^2). There's also the possibility of timing attacks, but I'll conveniently ignore that.
In general, most library code will assume no action is necessary to avoid DoS. However, some Java implementations have used MurmurHash to randomise the hash function for `String` and avoid certain attacks. MurmurHash mixes a per-process random number into the generation of the hash code, so the function is stable within a process but difficult (though not necessarily impossible) to figure out from the outside. More recently this has been replaced by falling back to a tree structure when there are excessive collisions and the key implements `Comparable` appropriately.
If you're worried about such attacks in the situations your code finds itself in, you can use other `Map` implementations such as `java.util.TreeMap`.
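As a rough sketch of the attack and the `TreeMap` alternative (the `CollidingKey` class and the timing harness below are mine, not from the answer):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CollisionSketch {

    // Hypothetical key whose hashCode() deliberately collides; real attacks
    // craft many distinct inputs (e.g. Strings) with equal hash codes.
    static final class CollidingKey {
        final int id;
        CollidingKey(int id) { this.id = id; }

        @Override public int hashCode() { return 1; } // every key -> same bucket

        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        int n = 20_000;

        // All keys land in the same bucket, so operations degrade from O(1)
        // towards O(n) and building the map towards O(n^2). (Java 8+ softens
        // this by treeifying large buckets, most effectively for Comparable keys.)
        Map<CollidingKey, Integer> attacked = new HashMap<>();
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            attacked.put(new CollidingKey(i), i);
        }
        System.out.printf("HashMap, all-colliding keys: %d ms%n",
                (System.nanoTime() - t0) / 1_000_000);

        // TreeMap ignores hashCode() entirely; every operation is O(log n)
        // no matter which keys an attacker chooses.
        Map<Integer, Integer> tree = new TreeMap<>();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            tree.put(i, i);
        }
        System.out.printf("TreeMap: %d ms%n",
                (System.nanoTime() - t1) / 1_000_000);
    }
}
```

On a typical JVM the first loop should be visibly slower than the second, which is the degradation described above.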

-
I don't understand how `TreeMap` is better for this case. Also, is this kind of vulnerability trivial for a commercial-level application? – Jim May 13 '15 at 22:32
-
@Jim: TreeMap has guaranteed O(log(n)) performance, no matter what input is provided, so it can't be forced into O(n) performance just by providing a specific set of worst-case values. – StriplingWarrior May 13 '15 at 22:34
-
A `TreeMap` will always perform common `Map` operations in `O(log n)` time, as it will rebalance if the key selection attempts to force a wonky structure. In normal operation that isn't as fast, but in the worst case it is much better. – Tom Hawtin - tackline May 13 '15 at 22:34
-
That said, as of current versions of Java, `HashMap` will fall back to a balanced binary search tree if there are too many collisions in a single bucket (and if the keys are comparable). So using `TreeMap` may not be necessary these days. – Louis Wasserman May 13 '15 at 22:39
This is not true for Java 7, which added a unique hash seed to each HashMap instance. There's more information on the Collections Framework Enhancements in Java SE 7 page.
This mechanism was removed in Java 8 for performance reasons and replaced by an alternative that converts buckets of comparable keys (such as String) into balanced trees to avoid the DoS problem. There's more information on the Collections Framework Enhancements in Java SE 8 page.
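To illustrate the Java 8 behaviour described above (the `BadKey` class is hypothetical, and the exact thresholds are internal to `HashMap`): when colliding keys implement `Comparable`, an overfull bucket is converted into a balanced tree, so lookups stay close to O(log n) instead of degenerating into a linear scan.

```java
import java.util.HashMap;
import java.util.Map;

public class TreeBinSketch {

    // Hypothetical key: every instance has the same hash code,
    // but instances can still be ordered because they are Comparable.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 50_000; i++) {
            map.put(new BadKey(i), i);
        }
        // On Java 8+ the single overfull bucket is stored as a red-black tree
        // ordered by compareTo, so this lookup does not scan 50,000 entries.
        System.out.println(map.get(new BadKey(49_999))); // prints 49999
    }
}
```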

In Java, the individual class is responsible for implementing a default hashCode() method (or inheriting one from the `Object` class), and complicated security scenarios like defeating advanced DoS attacks are not the responsibility of classes like `Object`, `Integer`, etc.
So most classes use a very simple, fast implementation that tries to ensure a fairly even distribution in common cases.
If you have a case where you feel it's important to implement a custom hashing strategy, whether because you want to avoid hacking or because you know your particular usage is likely to cause a lot of collisions with the default method, you can use a collection like GNU Trove's `THashMap`, which allows you to provide a custom hashing strategy specific to the collection instance.
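For reference, a minimal sketch of the conventional per-class approach (the `UserId` class is hypothetical): `hashCode()` and `equals()` are overridden together, and the result is simple, fast and deterministic rather than attacker-resistant; anything stronger has to come from the map implementation or a library-level hashing strategy like the one mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CustomHashCodeSketch {

    // Hypothetical value class with the conventional hashCode/equals pair.
    static final class UserId {
        final String realm;
        final long id;

        UserId(String realm, long id) {
            this.realm = realm;
            this.id = id;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof UserId)) return false;
            UserId other = (UserId) o;
            return id == other.id && realm.equals(other.realm);
        }

        @Override public int hashCode() {
            // Simple and deterministic: good distribution in common cases,
            // but no attempt to be unpredictable to an attacker.
            return Objects.hash(realm, id);
        }
    }

    public static void main(String[] args) {
        Map<UserId, String> names = new HashMap<>();
        names.put(new UserId("example.org", 7L), "Alice");
        System.out.println(names.get(new UserId("example.org", 7L))); // Alice
    }
}
```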

-
The hashCode always returns the same number for a given object; it has to. How the implementation uses that hashCode, so that the key will or will not fall into the same bucket, is an issue of the hash table implementation and not of the object – Jim May 13 '15 at 22:25
-
@Jim: Agreed. hashCode should also make a reasonable effort to avoid having different values also produce the same hash value, or even different hash values that are likely to fall into the same buckets. But at some point, we need to recognize that each individual class can't be responsible for producing an infallible, unguessable hash function. – StriplingWarrior May 13 '15 at 22:33
-
Trove is surely not the only project that should be mentioned for "custom data structures for Java" now. See http://java-performance.info/hashmap-overview-jdk-fastutil-goldman-sachs-hppc-koloboke-trove-january-2015/. All projects mentioned in that article are better than Trove from any point of view. (And they all allow you to provide a custom hashing strategy.) – leventov May 14 '15 at 12:08
-
Also, the Trove project name leads to the misconception that it is a GNU project. In fact it just uses an (L?)GPL license; that is the only thing connecting the project and the GNU organization. – leventov May 14 '15 at 13:03
-
@leventov: Thanks for the feedback. I've never had a reason to use any of these. Trove was just the first one to come up in my Google search. – StriplingWarrior May 14 '15 at 15:10
To complete the other answers, I would mention that when `HashMap`'s keys use the built-in `Object.hashCode`, i.e. don't override the method (which is a pretty common case), the hash code is computed using a random number generator, which makes the whole system's behaviour less deterministic. See this question: Java Object.hashCode() - address or random()? for more details.
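A quick sketch of that difference (the `Token` class is just for illustration; the exact identity-hash behaviour depends on the JVM and its settings):

```java
public class IdentityHashDemo {

    // Does NOT override hashCode(), so it inherits Object's identity hash.
    static final class Token {}

    public static void main(String[] args) {
        // Typically prints a different value on each JVM run:
        // HotSpot generates the identity hash rather than reusing an address.
        System.out.println(new Token().hashCode());

        // Integer overrides hashCode() to return its own value,
        // so this prints 42 on every run.
        System.out.println(Integer.valueOf(42).hashCode());
    }
}
```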