5

Referencing a previous answer to a question on SO, there is a method used called TestForNull. This was my original code before I was told I could make it more efficient:

My original code:

for (int i = 0; i < temp.length; i++) {
            if (map.containsKey(temp[i]))
                map.put(temp[i], map.get(temp[i]) + 1);
            else
                map.put(temp[i], 1);

In this snippet, I'm doing three look-ups to the map. I was told that this could be accomplished in just one lookup, so I ended up looking for an answer on SO and found the linked answer, and modified my code to look like:

My modified code:

for (int i = 0; i < temp.length; i++) {
            Integer value = map.get(temp[i]); 
            if (value != null)
                map.put(temp[i], value + 1);
            else
                map.put(temp[i], 1);
        }

Even though it seems better, it looks like two look-ups to me and not one. I was wondering if there was an implementation of this that only uses one, and if it can be done without the use of third-party libraries. If it helps I'm using a HashMap for my program.

Community
  • 1
  • 1
Dumpcats
  • 453
  • 5
  • 21
  • Hmm, maybe they meant only `get()` was a lookup? I don't see any way to shorten this further. – markspace Apr 30 '15 at 23:30
  • There is no difference between your two codes, really; or rather, you cannot tell. Maybe `.containsKey()` _for this `Map` implementation_ does a full retrieval, maybe it doesn't. After that, it's only a matter of choice. But personally I'd go with the second solution, ie your modified code. – fge Apr 30 '15 at 23:31
  • @fge Even if I reduced the amount of look-ups? If it helps the answer at all, I'm using a HashMap. – Dumpcats Apr 30 '15 at 23:38
  • What are you worried that takes a long time? Calculating the hash code or finding an empty spot once the hash code has been calculated? If the first then you can override the hashcode function to cache its results. If the second you can increase the size of the underlying array. You can also do both. – Benjy Kessler Apr 30 '15 at 23:50
  • 1
    @BenjyKessler Calling `get` and then `put` performs two separate key lookups. For a hash map this is two O(1) operations but it still involves calling `hashCode()` twice and potentially searching the matching bucket twice. For a tree map this is two O(log n) operations. The OP is correct to think that this could be improved. In practice this rarely matters, but to me this kind of question shows good instincts. Why work around a problem if you can avoid it? – John Kugelman May 01 '15 at 00:10
  • I agree, all I'm saying is that you can override your key's hashcode function to cache it's results so that it would only calculate it once regardless of the number of actual lookups performed. – Benjy Kessler May 01 '15 at 00:12
  • That assumes that the class is under your control. And if it is, that it's worth the tradeoff of adding an extra field to your class. Plus one must deal with the multithreading headache of having a mutable field in an immutable class; unless you compute the hash at construction time, in which case now you are adding overhead to every constructor call. – John Kugelman May 01 '15 at 00:18

2 Answers2

8

Java 8 has added a number of default methods to the Map interface that could help, including merge:

map.merge(temp[i], 1, v -> v + 1);

And compute:

map.compute(temp[i], (k, v) -> v == null ? 1 : v + 1);

HashMap's implementations of these methods are appropriately optimized to effectively only perform a single key lookup. (Curiously, the same cannot be said for TreeMap.)

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
  • The default implementation does the same thing, but I'd expect most `Map` implementations, specifically including `HashMap` and `TreeMap`, to have smarter implementations. – Louis Wasserman Apr 30 '15 at 23:48
  • Yes, of course, you're right. `Map`'s implementation can't avoid it two lookups, but `HashMap` is indeed much smarter. – John Kugelman Apr 30 '15 at 23:54
  • @LouisWasserman is correct. Both `HashMap` and `TreeMap` provide optimized implementations of compute. In the case of `TreeMap`, it inherits an optimized implementation from `AbstractMap`, which is still quite a bit better than 2 calls. http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/HashMap.java?av=f#1160 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/Map.java#1088 – Brett Okken Apr 30 '15 at 23:55
  • I apparently got confused following the links in grepcode. You are correct, there is not an optimized implementation for `TreeMap`. Which is strange, because it would be remarkably simple to do. – Brett Okken May 01 '15 at 00:06
  • I'll have to look more into lambda expressions and how they work. Is the answer you provided me one look-up or an improved version of two? I saw a previous version of this answer that suggested it wasn't too different from what I was previously doing. I think it's okay to use Java 8 given what I need to do (I appreciate it!). – Dumpcats May 01 '15 at 00:29
  • The default implementation in the `Map` interface is two lookups. `HashMap` overrides it with what I presume is a one lookup implementation. The code is linked in the answer (and in Brett's comment). – John Kugelman May 01 '15 at 04:19
0

@John Kugelman's answer is the best (as long as you can use java 8).

The first example has a worst case of 3 map calls (in the case of a value already present):

  1. containsKey
  2. get
  3. put

The second example always has exactly 2 calls (and a null check):

  1. get
  2. put

So you are basically trading containsKey for a null check.

In a HashMap, these operations are roughly constant time, assuming good hash code distribution (and that the distribution works well with the size of the HashMap). Other Map implementations (such as TreeMap) have log(n) execution time. Even in the case of HashMap, a null check will be faster than containsKey, making the second option the winner. However, you are unlikely to have a measurable difference unless you have poorly distributed hash codes (or this is the only thing your application is doing) or poor performing equals checks.

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
Brett Okken
  • 6,210
  • 1
  • 19
  • 25