11

The Javadoc says:

Returns a Collector that accumulates the input elements into a new Set. There are no guarantees on the type, mutability, serializability, or thread-safety of the Set returned; if more control over the returned Set is required, use toCollection(java.util.function.Supplier).

So Collectors.toCollection(HashSet::new) seems like a good idea to avoid problems here (SO question).

My problem is that, as hard as I've tried, I cannot get toSet() to return anything other than a HashSet.

Here is the code I used:

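import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

public class ToSetTypeCheck {
    public static void main(String[] args) {
        List<Integer> l = Arrays.asList(1,2,3);

        // Collect a million times and report any result that is not a HashSet.
        for (int i = 0 ; i++<1_000_000;){
            Class<?> clazz = l.stream().collect(Collectors.toSet()).getClass();

            if (!clazz.equals(HashSet.class)) {
                System.out.println("Not a HashSet");
            }
        }
    }
}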

Why, then, does the Javadoc state that there is no guarantee when, in fact, there seems to be one?

Yassin Hajaj
  • 11
    `HashSet::new` is currently hard-coded as the supplier for the returned `Set` (and it explains your result). But there is no guarantee that this won't change in a future version. – Tunaki May 05 '16 at 16:39
  • 3
    Why would they guarantee it? Not defining the behavior gives them the freedom to change it if they decide something else works better. – resueman May 05 '16 at 16:39
  • "not guaranteed" doesn't mean "randomly chosen"... – Alex Salauyou May 05 '16 at 16:43
  • 1
    @SashaSalauyou No. They might return different set implementations. The OP never claimed that it is random. – Suresh Atta May 05 '16 at 16:44
  • @Tunaki OK, I get it. Indeed, I went to check the source and it is hard-coded. This is certainly down to my understanding of English nuances. – Yassin Hajaj May 05 '16 at 16:47
  • 2
    @SashaSalauyou The question is not about the code but about the Javadoc. My question was more "why did they write that?". But as I said, it is my understanding of English that is to blame here. – Yassin Hajaj May 05 '16 at 16:48
  • 2
    @YassinHajaj Side note: `i++<1_000_000;` is a little weird. Sure, it's clever, but it's not really saving any more space than `i < 1_000_000; i++`, which is the form that most people tend to expect. If you're coding in a project with other people, I suggest you try to keep your loops simple and as easy to understand as possible. If you're coding for yourself, go wild. – Jeffrey May 05 '16 at 16:53
  • 1
    @Jeffrey Hi Jeffrey, thanks. Yes I was just testing the theory and wanted to write faster and make sure I was getting enough loops lol. I know it isn't recommended :). – Yassin Hajaj May 05 '16 at 16:56
  • 6
    The JavaDoc is primarily about _specification_; it says what the class or method _must_ do, and what the user can count on. A clear specification -- separate from the implementation -- benefits readers, by telling them what they can count on, and maintainers, by telling them what promises have been made to users (and therefore what flexibility they have to evolve or recreate the implementation.) Any given implementation, almost by definition, will be more specific, which is why we write separate specifications -- so you don't rely on accidental characteristics of the implementation. – Brian Goetz May 05 '16 at 18:49
  • Possible duplicate of [What does it mean to program to an interface?](http://stackoverflow.com/questions/1413543/what-does-it-mean-to-program-to-a-interface) – Raedwald May 09 '16 at 06:53
  • 2
    @Jeffrey: actually, it’s not the same, as `for(…; i++<1_000_000;)` will process the value `1_000_000` while `for(…; i < 1_000_000; i++)` will not. That’s why such unintuitive constructs should be avoided, using `for(…; i <= 1_000_000; i++)` is easier to recognize… – Holger May 09 '16 at 10:35

5 Answers

15

The JavaDoc states there is no guarantee, but that doesn't prevent any specific implementation from always returning a specific type of set. This is just the designers saying they don't want to limit what a future implementation can do. It says nothing about what the current implementation actually does.

In other words, you have discovered implementation-defined behavior (always returning a HashSet), but if you count on it, you may have problems in the future.

Jim Garrison
5

The current OpenJDK implementation (and, AFAIK, Oracle's too) indeed always returns a HashSet, but there's no guarantee of that. A future release of the JDK may very well change this behavior and break your code if you somehow assume that Collectors.toSet() will return a HashSet (e.g., by explicitly down-casting the result).
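To make the risk concrete, here is a minimal sketch contrasting the fragile down-cast with two safer alternatives. The class and method names (`ToSetUsage`, `distinct`, `distinctHashSet`) are made up for illustration; only `Collectors.toSet()` and `Collectors.toCollection(...)` come from the JDK.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ToSetUsage {
    // Relies only on what toSet() documents: the result is some Set.
    static Set<Integer> distinct(List<Integer> values) {
        return values.stream().collect(Collectors.toSet());
    }

    // Requests the concrete type explicitly, so a HashSet is guaranteed.
    static HashSet<Integer> distinctHashSet(List<Integer> values) {
        return values.stream().collect(Collectors.toCollection(HashSet::new));
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 3);

        // Fragile: compiles today, but depends on an unspecified detail.
        // HashSet<Integer> risky = (HashSet<Integer>) distinct(values);

        System.out.println(distinct(values).size());        // prints 3
        System.out.println(distinctHashSet(values).size()); // prints 3
    }
}
```

If the code only needs Set operations, the first form is enough; the second is for cases where a HashSet specifically is required.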

Mureinik
4

The type of Set returned by Collectors::toSet is an implementation detail. You should not rely on implementation details to remain the same in future versions. Right now, they use a HashSet, but in the future they might want to use a different kind of set.

Jeffrey
4

Future Java versions might, for example, return specialized immutable set implementations that are more efficient for reading and consume less memory than the current HashSet implementation, which is really just a wrapper around HashMap. Project Valhalla may eventually result in such optimizations.

They might even choose to return different set types based on the amount of data, e.g. an empty or singleton set if the implementation knows in advance that only zero or one elements will be returned.

So by giving fewer guarantees than possible based on the current implementation they keep the door open for future improvements.
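As a rough illustration of that idea, the sketch below builds a hypothetical collector, `toCompactSet()`, that picks the returned Set type based on how many elements were collected. Nothing here reflects what the JDK actually does or plans to do; the class and method names are invented, and the point is only that callers written against the `Set` interface would keep working while a cast to `HashSet` would not.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SizeAwareSets {
    // Hypothetical helper: swaps in a smaller immutable Set when possible.
    static <T> Set<T> compact(Set<T> set) {
        if (set.isEmpty()) {
            return Collections.emptySet();
        } else if (set.size() == 1) {
            return Collections.singleton(set.iterator().next());
        }
        return Collections.unmodifiableSet(set);
    }

    // Hypothetical collector (not part of the JDK) built from toSet() plus a finisher.
    static <T> Collector<T, ?, Set<T>> toCompactSet() {
        return Collectors.collectingAndThen(Collectors.<T>toSet(), SizeAwareSets::compact);
    }

    public static void main(String[] args) {
        List<List<Integer>> inputs = Arrays.asList(
                Collections.<Integer>emptyList(),
                Arrays.asList(7),
                Arrays.asList(1, 2, 3));

        for (List<Integer> input : inputs) {
            Set<Integer> result = input.stream().collect(toCompactSet());
            // The concrete class differs per input size; the Set contract does not.
            System.out.println(result.size() + " element(s): " + result.getClass().getName());
        }
    }
}
```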

the8472
1

I think what you're looking for is this: Collectors.toCollection(LinkedHashSet::new)
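A minimal sketch of that suggestion, assuming the goal is a Set whose concrete type, and therefore iteration order, is chosen by the caller rather than left to toSet(). The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrderedSetExample {
    // Removes duplicates while keeping first-seen order, because the
    // supplier explicitly asks for a LinkedHashSet.
    static Set<String> distinctInOrder(List<String> items) {
        return items.stream().collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("c", "a", "b", "a");
        System.out.println(distinctInOrder(items)); // prints [c, a, b]
    }
}
```

Any other Set implementation can be requested the same way, for example `HashSet::new` or `TreeSet::new`.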

Pierre-Luc