4

Very often the performance bottleneck when using hash maps is the equals method. equals can be very expensive for deep data structures.

Note, that the following is about immutable hash maps. Thus, at least you will never remove a key. I think adding keys should be ok.

Unsafe get

Suppose you query a hash map, being certain it contains the queried key. Then if there is no collision for the given key, the found single entry can be returned just based on the hash hit, because it has to be the queried object.

This can avoid calling equals in get in most cases (when there is no collision).

Questions

  1. How is this concept called?
  2. Are there any hash map implementations available for Java or Scala, that support such an unsafe get operation?

Btw, I'm open for suggestions of a better subject line.

Community
  • 1
  • 1
ziggystar
  • 28,410
  • 9
  • 72
  • 124
  • And if there is a collision, what do you expect the `get` operation return? An exception maybe? – xiaofeng.li Nov 19 '12 at 10:29
  • 4
    With keys under your control you can achieve this with a regular `HashMap` and a custom key class whose `equals` just returns `true`. – Marko Topolnik Nov 19 '12 at 10:29
  • @infgeoax I think OP is looking for a garbage in, garbage out solution. – Marko Topolnik Nov 19 '12 at 10:30
  • @infgeoax Then you resolve the collision by calling `equals`. Where's the problem? The thing is, when you're not certain that the key is inside the map, then you have to check whether the single hit really is the queried object by calling `equals`. Thus normally every query induces at least one call to `equals`. – ziggystar Nov 19 '12 at 10:31
  • Heh, I thought you don't want `equals` called inside `get`, like a special version of `get`. – xiaofeng.li Nov 19 '12 at 10:34
  • But what happens when you put in several keys then you delete some of them...Assuming of course, they have the same hash code. – xiaofeng.li Nov 19 '12 at 10:35
  • @infgeoax Uh, should add that I'm thinking about immutable collections. Thank you. – ziggystar Nov 19 '12 at 10:36
  • Are you looking for something like this http://en.wikipedia.org/wiki/Perfect_hash_function – clstrfsck Nov 19 '12 at 10:39
  • @msandiford No, the hash function does not have to be perfect. I'm looking for a somehow unsafe implementation of get, which works only under the assumption that the queried key is contained. – ziggystar Nov 19 '12 at 10:40
  • See also my answer to http://stackoverflow.com/questions/7547741/scala-contains-in-mutable-and-immutable-sets/7547903#7547903 which covers the Scala side of your question. – Matthew Farwell Nov 19 '12 at 10:52

4 Answers4

7

How is this concept called?

Identity map. It won't call equals() when looking for elements and just use identity (i.e. ==).

The correct solution isn't to "fix" the map but to use only keys which use the default Object.equals():

public boolean equals( Object other ) { return this == other; }

The problem is that finding elements in this map can be problematic unless all keys are singletons. So you can't use String, for example, because Java doesn't guarantee that all string instances are interned. The same is true for Integer: Instances < -128 and > 127 will be different.

But if you use your own optimized implementation for keys, you can solve this.

Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • No, this question is not about interning. The approach described in the OP works without interning. With the approach described in this answer you'll never have to call (structural) equals, because object identity is the same as structural identity. – ziggystar Nov 19 '12 at 10:39
  • Note that the HashMap class (is not required to, but) does that already. `equals` is only called if the stored `key` is `!=` to the requested key. Of course, changing the `equals` method is a to-be-sure way to force that, but that might already be done by the HashMap implementation. – Stephan Nov 19 '12 at 10:44
4

Such a data structure doesn't exist to my knowledge exactly because it's unsafe - there is no way to reliably encode your particular condition.

But, alas, the Scala Collection Library allows one to implement new rich data structures very quickly with a small amount of fresh code. Here's an implementation of your request:

class ParticularHashMap[A, +B] private(buckets: Vector[List[(A, B)]]) extends Map[A, B]{

    def this() = this(Vector.fill(ParticularHashMap.BucketsNo)(List.empty))

    def get(key: A) = {
        val bucket = buckets(bucketIndex(key))

        bucket match {
            case List((_, v)) => Some(v) //YOUR SPECIAL OPTIMIZATION !
            case list => list.find(key == _._1).map(_._2)
        }
    }

    def iterator = buckets.flatten.iterator

    def -(key: A) = mkWithUpdatedBucket(key, _ filterNot (_._1 == key))

    def +[B1 >: B](kv: (A, B1)) = mkWithUpdatedBucket(kv._1, insert(kv._1, kv._2.asInstanceOf[B], _))

    //if you're wondering why it's Bucket[A, Any] and not Bucket[A, B] it's because of the covariance of B
    private def mkWithUpdatedBucket(key: A, f: Bucket[A, Any] => Bucket[A, Any]) = {
        val index = bucketIndex(key)
        val newBucket = f(buckets(index))
        (new ParticularHashMap[A, Any](buckets.updated(index, newBucket))).asInstanceOf[ParticularHashMap[A, B]]
    }

    private def insert(k: A, v: Any, bucket: List[(A, Any)]): Bucket[A, Any] = bucket match {
        case List() => List((k, v))
        case (k1, v1) :: kvs if k == k1 => (k, v) :: kvs
        case (k1, v1) :: kvs => (k1, v1) :: insert(k, v, kvs)
    }

    private def bucketIndex(key: A) = Math.abs(key.hashCode()) % ParticularHashMap.BucketsNo
}

object ParticularHashMap {
    private val BucketsNo = 256

    private type Bucket[+A, +B] = List[(A, B)]

    def apply[A, B](kvs: (A, B)*) = new ParticularHashMap[A, B] ++ kvs

}

Please note that this is a naive implementation, but you can improve it too suit your needs.

It will work as expected:

val myMap = ParticularHashMap("hello" -> 2, "world" -> 4)
println(myMap.get("hello")) >> Some(2)
println(myMap.get("world")) >> Some(4)
println(myMap.get("world1")) >> None

But, of course, beware that if you don't respect your contract, bad things will happen. Below is an especially crafted example to show the downside:

val evilMap = ParticularHashMap(1 -> 2)
println(evilMap.get(1)) >> Some(2)
println(evilMap.get(257)) >> Some(2) //BUT 257 IS NOT IN THE MAP !!!
Marius Danila
  • 10,311
  • 2
  • 34
  • 37
  • Thanks for the example. Interesting for a Scala solution would be a way to ensure the required condition at compile-time with no runtime overhead. – ziggystar Nov 20 '12 at 10:25
  • @ziggystar yes, it would be interesting. But I think that, except for simple types like enums and such it would be pretty difficult to prove this constraint – Marius Danila Nov 20 '12 at 10:28
  • You could use dependent types. The question is whether the conditions under which this approach is applicable allow for interesting use cases. – ziggystar Nov 20 '12 at 11:03
0

1) How is this concept called?

AFAIK, it doesn't have a name. People generally only give names to solutions that work. A solution that doesn't work (or that only works in very limited situations) isn't generally worthy of a name.

2) Are there any hash map implementations available for Java or Scala, that support such an unsafe get operation?

Not that I'm aware of. Once again, people tend not to write and publish libraries that only work in limited situations or that are problematic in their behaviour.


Where's the flaw in it? Why is it not working?

First, a Map.get(...) method that returned the wrong value if you gave it a key that was not recognized is fundamentally breaking the Java Map.get contract. And there could well be other broken methods; e.g. Map.containsKey, Map.put, Map.putAll and/or Map.remove.

Second, a data structure that gives the wrong answer if your assumptions are incorrect is a bad idea ... if you can avoid it.

Third, modulo that we don't know your actual use-case, there are almost certainly better ways to solve the problem. The obvious ones are:

  • change equals the key class to make it less expensive,
  • if you can't change equals, define and use a proxy key class, or
  • use a Map<Integer, List<Pair<Key,ValueClass>>> where the integers used as keys are the hashcodes for the actual keys.

(The last approach requires the dubious "always contains the key" assumption, but at least you have not broken the Map abstraction, and can therefore do proper checks when required.)

Finally, even if you do decide that your "dirty" optimization is necessary for your application, and decide to implement it yourself. I claim that my answers still stand as correct explanations as to:

  1. why there isn't a commonly accepted term for this "hack", and
  2. why there isn't an off-the-shelf implementation in the 3 main Java collection libraries (or any other that I'm currently aware of.)

You might not like it, but there isn't a better explanation.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • -1 This answer contains no valuable information, besides showing that you don't like the concept. Where's the flaw in it? Why is it not working? It's an optimization, and those are "dirty" most of the time. Using a language with a more expressive type system than Java (e.g. Scala) it might even be possible to ensure the requested condition at compile time. – ziggystar Nov 19 '12 at 12:36
  • For an example of a library providing unsafe operations: [some unsafe method in Colt matrix library](http://acs.lbl.gov/software/colt/api/cern/colt/matrix/DoubleMatrix3D.html#getQuick(int,%20int,%20int)). Providing faster more unsafe methods can be a very reasonable thing to do. – ziggystar Nov 19 '12 at 12:40
  • @ziggystar - OK, so some people are prepared to ship broken-by-design libraries. But libraries like the Sun Java libs, Apache commons, Guava collections, etc don't. Have you ever asked yourself why? – Stephen C Nov 19 '12 at 13:24
0

In this case, the only thing you need is an array, isn't it?

If you know there is no collision - this means no two keys has the same hashcode - you can just use the hashcode as the address to index the array. For example, if one of your key-object's hashcode is 1003, then the corresponding value object can be put into yourArray[1003].

If you cannot afford huge arrays, given you are okay to perform "unsafe" get operation, you can calculate the address as key.hashcode() % yourArray.length. Whenever key1 and key2 are mapped to the same slot because of the modulo operation, just overwrite the slot.

class SimpleMap<K, V> {
    private V [] data;

    public SimpleMap(int size){
        data = (V[])new Object[size];
    }

    public void put (K key, V value){
        data[key.hashCode() % data.length] = value;
    }

    public V get(K key) {
        return data[key.hashCode() % data.length];
    }
}
Xinchao
  • 2,929
  • 1
  • 24
  • 39