4

Using Java since 5, today I encountered an interesting piece of code in a GitHub PR: Someone used List#forEach with an AtomicInteger in Java 8.

Now I wondered whether this could possibly be necessary - and lo and behold, consulting the Java 8 Documentation (where Iterable#forEach was introduced, as well as the lookalike stream().parallel().forEach, the latter of which obviously might be parallel) I read:

Unless otherwise specified by the implementing class, actions are performed in the order of iteration (if an iteration order is specified).

(emphasis mine)

... So, given that Stream makes a point of not implementing Iterable (as of Java 18) - should I really have to presume that an unknown implementor of Iterable might do forEach in parallel?

(If the expectation would have change based on Java version, I'd like an answer to deal with each differing version starting at 8.)

cigien
  • 57,834
  • 11
  • 73
  • 112
Zsar
  • 443
  • 3
  • 14
  • 1
    See [this answer](https://stackoverflow.com/a/34866143/5133585) and the question that it answers. `AtomicXXX` is a common technique for dealing with "Variable used in lambda expression should be final or effectively final", not just concurrency issues. Though you're right that the documentation does not disallow an implementation from calling the consumer from different threads. – Sweeper May 06 '22 at 14:08
  • Interesting observation. Today I learned something. However, looking at the PR you've quoted, it seems to be doing unnecessary processing. If it fails to find a match in the first word, it will continue to look for a match in the remaining N-1 words and return false at the end. It would be much better to use `words.stream.anyMatch` and look for a word that is not contained. It will then stop as soon as it has a result. And that way, you won't have to use `AtomicInteger`. – k314159 May 06 '22 at 14:24
  • 3
    I can’t decide whether “iteresting” is a misspelling or a clever pun. – Basil Bourque May 06 '22 at 18:04
  • @Sweeper: Pray tell, given that "Atomic[bla]" should generate different code, all the way down to machine instructions, that sounds like a horrible way to workaround the `final`. Surely an `int[1]` is the cleaner way to do it!? – Zsar May 09 '22 at 07:39
  • In the context of the PR: I'd use `words.stream().filter(...).count()`. – Johannes Kuhn May 09 '22 at 08:11
  • 1
    Please note that I provided the PR only as a showcase for some/any code that uses the idiom. I am well aware that the whole situation can be avoided there. What I am interested in - as per the `language-lawyer` tag - is whether I _may_ expect an `Iterable#forEach` to be sequential or whether I _must_ expect it to be parallel, as a _general_ rule when the Implementor of `Iterable` is not under my control. – Zsar May 09 '22 at 09:36
  • 1
    @Zsar of course, it is a horrible workaround and you’re right, `int[1]` would be cheaper, though still not cleaner. But you may look at Stackoverflow alone, how often answerers come up with it. The linked code raises a few questions anyway, e.g. why has the pattern an unnecessary capturing group? Why the `MULTILINE` flag that makes no sense at all for words? Why does it perform `Pattern.compile("\\W")` again for each word? Why not count into a local `int` variable and add the final number to the `AtomicInteger` in one single update? So the use of the atomic is not the worst thing here… – Holger May 12 '22 at 09:28

1 Answers1

3

Note that the phrase “Unless otherwise specified by the implementing class” has been removed from recent versions of the specification:

Performs the given action for each element of the Iterable until all elements have been processed or the action throws an exception. Actions are performed in the order of iteration, if that order is specified. Exceptions thrown by the action are relayed to the caller.

Since a List has a defined encounter order and performing an action in a defined order precludes concurrent execution, we can clearly say that concurrent execution is precluded explicitly in the specific case mentioned in your question.

The broader question whether Iterable’s forEach precludes concurrent execution of the specified action in general, when there’s no defined encounter order, can only be answered by resorting to the Principle of least astonishment, concluding that the specification should mention explicitly if concurrent execution was allowed under certain circumstances. As far as I can see, Java’s entire API specification adheres to this principle, so there’s no reason to assume that it doesn’t apply to this specific method.

Most notably, everyone is aware that the Stream API allows parallel processing, because it has been prominently documented. Likewise, Arrays.sort(…) does not permit a surprising parallel evaluation of the Comparator, without mentioning it explicitely, but rather, an explicit use of Arrays.parallelSort(…) is required to enable it.

The same applies even for the actual concurrent collections. E.g. when you call keySet().forEach(…) on a ConcurrentHashMap, it will not fail when there are concurrent updates, as the general weakly consistent iteration policy specifies, but also run sequentially on the caller’s thread, as every method does when not specified otherwise. You’d need a dedicated forEach method for parallel processing.


So the use of AtomicInteger is not justified. It’s a widespread bad programming style, pretending functional programming while still being stuck with thinking in loops, using forEach and modifying a variable outside the function. Since this is not possible with local variables, AtomicInteger is only used as container for an int heap variable here. Using a single element int[] array would do as well, but still isn’t recommended. Instead

  1. stay with a plain loop, or

  2. use a real functional approach, e.g.

    int result = words.stream()
        .mapToInt(word -> {
            // count matches locally and return the int
        })
        .sum();
    

    or

    long result = words.stream()
        .flatMap(word -> {
            // return a stream of matches
        })
        .count();
    
Holger
  • 285,553
  • 42
  • 434
  • 765
  • The argumentation mirrors my own understanding. I had not caught that the fatal clause was removed - this is marvellous news indeed and I think an alternative reading is now no longer sensible or can be argued "in good faith". (Going to wait for the customary 24 hours before accepting.) – Zsar May 24 '22 at 12:19
  • FWIW: Maybe the part after the `hr` should be dropped (but for the first sentence, which probably belongs above it anyway, being the conclusion of all preceeding analysis)? It is entirely correct, but dare I say, but a distraction from the actual topic. Especially considering that the example PR was since completely restructured (from my own feedback preceeding both this answer and indeed the question itself). ... I probably should have taken the time to properly extract a MVE when posting. Had hoped for a shortcut, alas, "they never are". – Zsar May 24 '22 at 12:27
  • 2
    Actually, I think this case is so common that it is useful to clear the confusion here. – Holger May 24 '22 at 13:13
  • Fair enough. That @Sweeper immediately recognised the pattern as common shocked me quite a bit. Answer is accepted now. – Zsar May 25 '22 at 09:54