The Set interface makes no promises on whether implementations permit null elements. Each implementation is supposed to declare this in its documentation.

Collectors.toSet() promises to return an implementation of Set but explicitly makes “no guarantees on the type, mutability, serializability, or thread-safety of the Set returned”. Null-safety is not mentioned.

The current implementation of Collectors.toSet() in OpenJDK always uses HashSet, which permits null elements, but this could change in the future and other implementations may do differently.

If a Set implementation prohibits null elements, it throws NullPointerException at various times, in particular during an attempt to add(null). It would seem that if Collectors.toSet() decided to use a null-intolerant Set implementation, calling stream.collect(Collectors.toSet()) on a Stream that contains null would throw. The specification of collect does not list any exceptions, nor does the specification of any of the Collector methods. This could suggest that the collect call permits nulls within the stream, but it is not clear that this means much, since NullPointerException is an unchecked exception and doesn't strictly have to be listed.
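To make the hypothetical concrete, the sketch below swaps in Collectors.toCollection(TreeSet::new) as a stand-in for a null-intolerant Set (a TreeSet using natural ordering rejects null elements). This is not what toSet() does today; it only shows what the collect call would do if the Set it collected into rejected nulls. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullIntolerantCollect {
    // Same input as the question's example, but collected into a TreeSet,
    // which (with natural ordering) throws NullPointerException on add(null).
    static Set<String> collectIntoTreeSet() {
        return Stream.of("A", "list", "of", null, "strings")
                     .collect(Collectors.toCollection(TreeSet::new));
    }

    // Returns true if the collect call above fails with NullPointerException.
    static boolean throwsNpe() {
        try {
            collectIntoTreeSet();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe()); // prints "true"
    }
}
```

The exception surfaces from the collect call itself, even though neither collect nor the Collector methods declare it.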

Is this specified more clearly anywhere else? In particular, is the following code guaranteed not to throw? Is it guaranteed to return true?

import java.util.stream.*;

class Test {
    public static boolean setContainsNull() {
        return Stream.of("A", "list", "of", null, "strings")
                     .collect(Collectors.toSet())
                     .contains(null);
    }
}

If not, then I assume we should always ensure a stream contains no nulls before using Collectors.toSet() or be ready to handle NullPointerException. (Is this exception alone enough though?) Alternatively, when this is unacceptable or hard, we can request a specific set implementation using code like Collectors.toCollection(HashSet::new).
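Both workarounds can be sketched as follows, assuming the same input as the example above (class and method names are hypothetical). The first asks for a HashSet explicitly, whose documentation says it permits the null element; the second filters nulls out before toSet() ever sees them.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NullSafeCollect {
    // Option 1: request a HashSet explicitly; HashSet is documented to permit null.
    static Set<String> keepNulls() {
        return Stream.of("A", "list", "of", null, "strings")
                     .collect(Collectors.toCollection(HashSet::new));
    }

    // Option 2: drop nulls before collecting, so whatever Set
    // implementation toSet() chooses never receives a null element.
    static Set<String> dropNulls() {
        return Stream.of("A", "list", "of", null, "strings")
                     .filter(Objects::nonNull)
                     .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(keepNulls().contains(null)); // true
        System.out.println(keepNulls().size());         // 5
        System.out.println(dropNulls().size());         // 4
    }
}
```

The filtered set is checked with size() rather than contains(null), because a null-intolerant Set is also allowed to throw NullPointerException from contains(null).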

Edit: there is an existing question that sounds superficially similar, and this question got closed as a supposed duplicate of that. However, the linked question does not address Collectors.toSet() at all. Moreover, the answers to that question form the underlying assumptions of my question. That question asks: are nulls allowed in streams? Yes. But what happens when a (perfectly allowed) stream that contains nulls gets collected via a standard collector?

Chortos-2
  • It's not mentioned, but I think it would be terribly bad form if it didn't allow nulls, and didn't say anything about that. – Andy Turner Nov 09 '17 at 22:54
  • @SleimanJneidi The question you linked has no relation to the standard collectors, which is what I’m asking about. It asks whether streams can contain nulls (they can, and my question takes this for granted) and whether there is any magic that lets explicitly null-intolerant code applied to a stream that contains nulls avoid an NPE (there isn’t). `collect(Collectors.toSet())` isn’t explicitly null-intolerant code; or rather, whether it is, is exactly what I’m asking. – Chortos-2 Nov 09 '17 at 23:24
  • Since it doesn’t specify, I would not assume either way. If I knew null values were a possibility, I would always use `toCollection(HashSet::new)`, to be completely safe. – VGR Nov 09 '17 at 23:42

There is a difference between deliberately unspecified behaviors, like “type, mutability, serializability, or thread-safety”, and underspecified behavior, like the null support.

Whenever a behavior is underspecified, the actual behavior of the reference implementation tends to become a de facto standard which, due to compatibility constraints, can’t be changed later (or at least not without a strong reason), even if it runs counter to the original intention.

Note that while the reserved right to return a truly immutable or non-serializable Set was not exercised, simply because no such type existed at the time of the Java 8 release, rejecting nulls would have been possible even without a suitable null-intolerant hash map type, just as groupingBy forbids null keys, even though that is underspecified as well.
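A minimal sketch of the groupingBy behavior mentioned above, using a hypothetical classifier that maps the empty string to null. On OpenJDK, groupingBy throws NullPointerException for the null key, while collecting the very same keys with toSet() succeeds.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupingByNullKey {
    // Hypothetical classifier: maps the empty string to a null key.
    static Integer lengthOrNull(String s) {
        return s.isEmpty() ? null : s.length();
    }

    // groupingBy checks each key and throws NullPointerException for null.
    static boolean rejectsNullKey() {
        try {
            Map<Integer, List<String>> groups = Stream.of("a", "bb", "")
                    .collect(Collectors.groupingBy(GroupingByNullKey::lengthOrNull));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // By contrast, collecting the same keys with toSet() succeeds on OpenJDK,
    // whose toSet() currently collects into a HashSet.
    static Set<Integer> keysViaToSet() {
        return Stream.of("a", "bb", "")
                     .map(GroupingByNullKey::lengthOrNull)
                     .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(rejectsNullKey());               // true
        System.out.println(keysViaToSet().contains(null));  // true
    }
}
```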

Note further that while the groupingBy collector deliberately rejects null keys in its implementation code, toMap is a good example of how actual behavior becomes part of the contract. In Java 8, toMap allows null keys but rejects null values, simply because it invokes Map.merge, which has that behavior. It seems this wasn’t intended in the first place. Now, in Java 9, the toMap collector without a merge function doesn’t use Map.merge anymore (JDK-8040892, see also this answer), but deliberately rejects null values in the collector code to stay behaviorally compatible with the previous version, simply because it was never said that the null behavior is intentionally unspecified.
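The toMap asymmetry can be observed with a small sketch (class and method names are hypothetical). It reflects OpenJDK's actual behavior as described above rather than anything the specification spells out: a null key is accepted, a null value is rejected.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToMapNulls {
    // A null key is accepted: the map toMap builds on OpenJDK permits it.
    static Map<String, Integer> nullKey() {
        return Stream.of("a", null)
                     .collect(Collectors.toMap(Function.identity(),
                                               s -> s == null ? 0 : s.length()));
    }

    // A null value is rejected with NullPointerException: in Java 8 because
    // Map.merge rejects it, in Java 9+ by an explicit check in the collector.
    static boolean nullValueThrows() {
        try {
            Stream.of("a", "b")
                  .collect(Collectors.toMap(Function.identity(), s -> (Integer) null));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullKey().containsKey(null)); // true
        System.out.println(nullValueThrows());           // true
    }
}
```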

So, Collectors.toSet() (and likewise Collectors.toList()) allow null values for two major Java versions now and there’s no specification saying that you must not take this for granted, so you can be quite sure that this won’t change in the future.

Jens Bannmann
Holger
  • +1 Btw, the new Java 10 `toUnmodifiable` collectors won't allow null values. See https://bugs.openjdk.java.net/browse/JDK-8184690 – Stefan Zobel Nov 10 '17 at 16:16
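To illustrate the comment above, a minimal sketch assuming Java 10 or later: unlike toSet(), toUnmodifiableSet() is documented to throw NullPointerException when it is presented with a null element. The class and method names are hypothetical.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnmodifiableSetNulls {
    // Requires Java 10+: toUnmodifiableSet() explicitly disallows null elements.
    static boolean rejectsNull() {
        try {
            Stream.of("A", null).collect(Collectors.toUnmodifiableSet());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNull()); // true
    }
}
```

Here the null restriction is part of the documented contract, in contrast to the underspecified null handling of toSet().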