
Some time ago, I blogged about a Java 8 functional way of calculating Fibonacci numbers recursively, with a ConcurrentHashMap cache and the new, useful computeIfAbsent() method:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Test {
    static Map<Integer, Integer> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        System.out.println(
            "f(" + 8 + ") = " + fibonacci(8));
    }

    static int fibonacci(int i) {
        if (i == 0)
            return i;

        if (i == 1)
            return 1;

        return cache.computeIfAbsent(i, (key) -> {
            System.out.println(
                "Slow calculation of " + key);

            return fibonacci(i - 2) + fibonacci(i - 1);
        });
    }
}

I chose ConcurrentHashMap because I was thinking of making this example even more sophisticated by introducing parallelism (which I didn't in the end).

Now, let's increase the number from 8 to 25 and observe what happens:

        System.out.println(
            "f(" + 25 + ") = " + fibonacci(25));

The program never halts. Inside ConcurrentHashMap.computeIfAbsent(), there's a loop that just runs forever:

for (Node<K,V>[] tab = table;;) {
    // ...
}

I'm using:

C:\Users\Lukas>java -version
java version "1.8.0_40-ea"
Java(TM) SE Runtime Environment (build 1.8.0_40-ea-b23)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

Matthias, a reader of that blog post, also confirmed the issue (he actually found it).

This is weird. I would have expected one of the following two outcomes:

  • It works
  • It throws a ConcurrentModificationException

But just never halting? That seems dangerous. Is it a bug? Or did I misunderstand some contract?

Lukas Eder

3 Answers


This is of course a "feature". The ConcurrentHashMap.computeIfAbsent() Javadoc reads:

If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

The "must not" wording is a clear contract, which my algorithm violated, although not for the same concurrency reasons.

What's still interesting is that there is no ConcurrentModificationException. Instead, the program just never halts, which in my opinion is still rather dangerous behaviour (i.e. infinite loops; or: anything that can possibly go wrong, does).

Note:

The HashMap.computeIfAbsent() and Map.computeIfAbsent() Javadocs don't forbid such recursive computation, which is of course ridiculous, as the cache is declared as Map<Integer, Integer>, not ConcurrentHashMap<Integer, Integer>. It is very dangerous for subtypes to drastically redefine super type contracts (Set vs. SortedSet comes to mind). It should thus be forbidden in the super types as well to perform such recursion.
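For illustration, here is a minimal sketch of one way to stay within that contract: keep the recursion entirely outside of computeIfAbsent(), look up the cache manually, and publish results with putIfAbsent(). (The class name is made up for the example; the original print statements are omitted.)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NonRecursiveCacheTest {
    static Map<Integer, Integer> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        System.out.println("f(25) = " + fibonacci(25));
    }

    static int fibonacci(int i) {
        if (i <= 1)
            return i;

        // Check the cache manually instead of computing inside the map
        Integer cached = cache.get(i);
        if (cached != null)
            return cached;

        // The recursion happens outside of any map operation
        int result = fibonacci(i - 2) + fibonacci(i - 1);
        cache.putIfAbsent(i, result);
        return result;
    }
}

With this variant, no function passed to the map ever tries to update the map it is running in, so the "must not attempt to update any other mappings" clause is trivially satisfied.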

Lukas Eder
  • Good find. I'd suggest a bug report/RFE against JDK. – Axel Mar 03 '15 at 19:18
  • Done, let's see if it is accepted... I'll update with a link if it is. – Lukas Eder Mar 03 '15 at 19:30
  • It seems likely to me that this type of recursive modification of other mappings is not allowed in `ConcurrentHashMap` because of the other important part of its contract: "**The entire method invocation is performed atomically, so the function is applied at most once per key.**" It is likely that your program, by violating the "no recursive modification" contract, is attempting to acquire a lock it already holds, and is deadlocking with itself, not running in an infinite loop. – murgatroid99 Mar 04 '15 at 01:05
  • From the JavaDoc: `IllegalStateException - if the computation detectably attempts a recursive update to this map that would otherwise never complete` – Ben Manes Mar 04 '15 at 04:14
  • @murgatroid99: That's rather implicit. The method invocation could still be performed atomically *and* update several keys. Although that probably wouldn't be a good idea in a concurrent context, because of deadlock potential. – Lukas Eder Mar 04 '15 at 06:47
  • @Axel: The report was accepted as [JDK-8074374](https://bugs.openjdk.java.net/browse/JDK-8074374). It is probably a duplicate of [JDK-8062841](https://bugs.openjdk.java.net/browse/JDK-8062841), as documented by [Ben Manes](http://stackoverflow.com/a/28845674/521799) – Lukas Eder Mar 04 '15 at 06:54
  • @Lukas: it seems to me that if the invocation is to stay atomic and still be able to update several keys, then probably all segments of the map would need to be locked – user62058 Mar 05 '15 at 12:03
  • @user62058: In my opinion, that should be fine. Of course, it might lead to a dead-lock, which would be just as bad as a non-terminating loop – Lukas Eder Mar 05 '15 at 12:54
  • @Lukas you suggest to use HashMap, but is the put method on HashMap thread-safe? That is, is it guaranteed not to corrupt the map state when invoked simultaneously by multiple threads? By "corrupt" I mean that you have a "lost update" or an exception on put. It's OK if two threads compute the same value and only one put operation of those threads really updates the map (for example, last one wins) – leozilla Oct 06 '15 at 18:58
  • @leozilla: See the original question. In this case, the algorithm is single-threaded. – Lukas Eder Oct 06 '15 at 20:05
  • @LukasEder even with `HashMap` it's not OK. https://bugs.openjdk.java.net/browse/JDK-8172951 – Piotr Findeisen May 22 '17 at 07:24
  • @LukasEder that's good, but I think your final "note" might still be understood as "... but for `HashMap` it works". Let's rephrase it: "even though `Map` javadocs do not explicitly forbid that, the function should not modify the map in any way" – Piotr Findeisen May 26 '17 at 13:39
  • @PiotrFindeisen: Do you want me to add some interrobang after the final sentence in bold? :) – Lukas Eder May 26 '17 at 19:07

This is fixed in [JDK-8062841](https://bugs.openjdk.java.net/browse/JDK-8062841).

In the 2011 proposal, I identified this issue during the code review. The JavaDoc was updated and a temporary fix was added. It was removed in a further rewrite due to performance issues.

In the 2014 discussion, we explored ways to better detect the problem and fail fast. Note that some of that discussion was taken offline to private email in order to consider the low-level changes. While not every case can be covered, the common cases will not livelock. These fixes are in Doug's repository but have not made it into a JDK release.
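As a rough sketch of what that detection looks like (the class name is illustrative, and exactly which JDK builds contain the check is an assumption), a detected recursive update should fail fast with an IllegalStateException instead of livelocking:

import java.util.concurrent.ConcurrentHashMap;

public class RecursiveUpdateDemo {
    static ConcurrentHashMap<Integer, Integer> cache = new ConcurrentHashMap<>();

    static int fibonacci(int i) {
        if (i <= 1)
            return i;
        // Same recursive update pattern as in the question
        return cache.computeIfAbsent(i,
            key -> fibonacci(key - 2) + fibonacci(key - 1));
    }

    public static void main(String[] args) {
        try {
            System.out.println("f(25) = " + fibonacci(25));
        } catch (IllegalStateException e) {
            // On builds containing the detection, this is expected to
            // report something like "Recursive update"; on older builds
            // the call may still never return
            System.out.println("Detected: " + e.getMessage());
        }
    }
}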

Ben Manes
  • Very interesting, thanks for sharing! My report was also accepted as [JDK-8074374](https://bugs.openjdk.java.net/browse/JDK-8074374) – Lukas Eder Mar 04 '15 at 06:50
  • @Ben sorry for the offtopic, but I'm really surprised at how low your SO rating is, despite such a significant contribution to the knowledge base on something as complicated as lock-free (and close to lock-free) concurrency. – Alex Salauyou Mar 03 '16 at 12:38
  • @SashaSalauyou Thanks. I often provide a quick answer in comments instead of spending the time on a longer, full answer. Comment scores don't increase reputation, though, and I probably have a less than average interest in chasing rep anyway. – Ben Manes Mar 04 '16 at 00:36
  • @Ben not quite fixed: [boom](https://dl.dropbox.com/s/vr6q209dd6ifcwo/ConcurrentHashMapTest.java) – Scott Oct 03 '18 at 23:03
  • @ScottMcKinney nice catch! It looks like the same assertion could be applied to `transfer` and it was a missed case. Can you email `concurrency-interest@cs.oswego.edu` to get Doug's attention? – Ben Manes Oct 03 '18 at 23:08
  • @Ben sure, no prob. I found this after integrating Caffeine into my cache backend, where the cache loader can be recursive. My first attempt at fixing this was to set the cache's initial capacity to the cache size. This fixes most of the problems, including the test I linked to above; however, I'm afraid there are still more missed cases beyond that one. I will try to isolate them and provide narrow tests. – Scott Oct 03 '18 at 23:37
  • @ScottMcKinney An ugly but [possible workaround](https://github.com/ben-manes/caffeine/wiki/Faq#recursive-computations) in Caffeine is to use `AsyncLoadingCache` so that the computations are performed outside of the map operation (on the future), optionally with the `synchronous()` view. – Ben Manes Oct 03 '18 at 23:44
  • @Ben thanks for the suggestion. I've resorted to using a Cache instead of a LoadingCache, where my facade class wraps the Cache and the CacheLoader, its get() method delegates to Cache#getIfPresent() / put(), calling the CacheLoader when needed. _shrug_ – Scott Oct 04 '18 at 00:04

This is very similar to that bug, because if you create your cache with an initial capacity of 32, your program will work until 49. And it is interesting that the parameter sizeCtl = 32 + (32 >>> 1) + 1 = 49! Maybe the reason lies in the resizing?
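A minimal sketch of that observation (the capacity and the arithmetic come from the description above; the exact point at which the program stops may differ between JDK builds):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CapacityObservation {
    // Cache created with an explicit initial capacity of 32
    static Map<Integer, Integer> cache = new ConcurrentHashMap<>(32);

    static int fibonacci(int i) {
        if (i <= 1)
            return i;
        return cache.computeIfAbsent(i,
            key -> fibonacci(key - 2) + fibonacci(key - 1));
    }

    public static void main(String[] args) {
        // The figure mentioned above: 32 + (32 >>> 1) + 1 = 49
        System.out.println(32 + (32 >>> 1) + 1);

        // With this capacity the recursion reportedly keeps working for
        // inputs up to around 49 and hangs beyond that, presumably once
        // the underlying table has to be resized
        System.out.println("f(40) = " + fibonacci(40));
    }
}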

  • I don't have time to dig through the source code, but I think that you are right. We are hitting this in production, constantly. As soon as we increased the `CHM` initial capacity to a very high figure, this went away. Until we can refactor our code, this is what we will do... – Eugene Feb 09 '19 at 17:39