
This is a question about API design. When extension methods were added to C#, IEnumerable gained all the methods that let you use lambda expressions directly on any collection.

With the advent of lambdas and default methods in Java, I would have expected Collection to implement Stream and provide default implementations for all its methods. This way, we would not need to call stream() in order to leverage the power it provides.

What is the reason the library architects opted for the less convenient approach?
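To make the "less convenient approach" concrete, here is a minimal Java 8 sketch of what the question describes (the class and method names are made up for illustration): `filter` and `map` only become available after an explicit `stream()` call, and a `List` has to be collected back at the end. Calling `words.filter(...)` directly would not compile, because `Collection` declares no such method.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ExplicitStreamCall {
    // Keeps the words longer than three characters, upper-cased.
    // Note the explicit stream() call: Collection itself has no filter or map.
    static List<String> longWordsUpper(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "java", "c#", "stream");
        System.out.println(longWordsUpper(words)); // [LAMBDA, JAVA, STREAM]
    }
}
```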

Vitaliy
  • is a collection a stream? – UmNyobe Feb 11 '15 at 16:28
  • Yes it is. But not vice versa. And even if this spawns a philosophical controversy, it makes a lot of practical sense. As I mentioned, it worked wonderfully in C#. – Vitaliy Feb 11 '15 at 16:35
  • In my opinion, the primary difference is this: a stream is supposed to be useful for operations on the elements, whereas a collection is supposed to store elements. – Tobias Feb 11 '15 at 16:41
  • @still_learning I see it that way too. – UmNyobe Feb 11 '15 at 16:43
  • [Answer from one of the library architects](http://stackoverflow.com/a/24472635/3179759) – Alex - GlassEditor.com Feb 11 '15 at 16:47
  • But a collection is also used to retrieve elements. And the retrieval itself can be tightly coupled to the internal implementation details of the collection. – Vitaliy Feb 11 '15 at 21:15

4 Answers


From Maurice Naftalin's Lambda FAQ:

Why are Stream operations not defined directly on Collection?

Early drafts of the API exposed methods like filter, map, and reduce on Collection or Iterable. However, user experience with this design led to a more formal separation of the “stream” methods into their own abstraction. Reasons included:

  • Methods on Collection such as removeAll make in-place modifications, in contrast to the new methods which are more functional in nature. Mixing two different kinds of methods on the same abstraction forces the user to keep track of which are which. For example, given the declaration

    Collection<String> strings;
    

    the two very similar-looking method calls

    strings.removeIf(s -> s.length() == 0);
    strings.filter(s -> s.length() == 0);          // not supported in the current API
    

    would have surprisingly different results; the first would remove all empty String objects from the collection, whereas the second would return a stream containing only the empty Strings, while having no effect on the collection.

    Instead, the current design ensures that only an explicitly-obtained stream can be filtered:

    strings.stream().filter(s -> s.length() == 0)...;
    

    where the ellipsis represents further stream operations, ending with a terminal operation. This gives the reader a much clearer intuition about the action of filter;

  • With lazy methods added to Collection, users were confused by a perceived—but erroneous—need to reason about whether the collection was in “lazy mode” or “eager mode”. Rather than burdening Collection with new and different functionality, it is cleaner to provide a Stream view with the new functionality;

  • The more methods added to Collection, the greater the chance of name collisions with existing third-party implementations. By only adding a few methods (stream, parallelStream) the chance for conflict is greatly reduced;

  • A view transformation is still needed to access a parallel view; the asymmetry between the sequential and the parallel stream views was unnatural. Compare, for example

    coll.filter(...).map(...).reduce(...);
    

    with

    coll.parallel().filter(...).map(...).reduce(...);
    

    This asymmetry would be particularly obvious in the API documentation, where Collection would have many new methods to produce sequential streams, but only one to produce parallel streams, which would then have all the same methods as Collection. Factoring these into a separate interface, StreamOps say, would not help; that would still, counterintuitively, need to be implemented by both Stream and Collection;

  • A uniform treatment of views also leaves room for other additional views in the future.
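A small runnable sketch of the first, second and fourth points, written against the final Java 8 API rather than the early drafts the FAQ describes (the predicate-taking removal method shipped as `removeIf`, and the parallel view as `parallelStream()`). The class and method names are hypothetical. It shows that `removeIf` mutates the source, that `stream().filter(...)` leaves the source untouched and keeps the elements matching the predicate, and that no predicate runs until a terminal operation starts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InPlaceVersusView {
    // In-place: removeIf (the Java 8 Collection method that takes a predicate)
    // mutates the collection it is called on.
    static Collection<String> removeEmpty(Collection<String> strings) {
        strings.removeIf(s -> s.length() == 0);
        return strings;
    }

    // Functional: filter on an explicitly obtained stream leaves the source
    // untouched and keeps the elements that match, i.e. the empty strings.
    static List<String> filterEmpty(Collection<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 0)
                .collect(Collectors.toList());
    }

    // Lazy: building the pipeline runs no predicate calls; they happen only
    // when the terminal operation (collect) pulls elements through.
    static int[] predicateCallsBeforeAndAfterTerminal(Collection<String> strings) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pipeline = strings.stream().filter(s -> {
            calls.incrementAndGet();
            return s.isEmpty();
        });
        int before = calls.get();
        pipeline.collect(Collectors.toList());
        int after = calls.get();
        return new int[] {before, after};
    }

    // Parallel view: same operations, obtained via parallelStream() instead of stream().
    static long countEmptyInParallel(Collection<String> strings) {
        return strings.parallelStream().filter(String::isEmpty).count();
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "", "b", ""));

        System.out.println(filterEmpty(source).size() + " " + source); // 2 [a, , b, ]
        System.out.println(Arrays.toString(predicateCallsBeforeAndAfterTerminal(source))); // [0, 4]
        System.out.println(countEmptyInParallel(source)); // 2

        removeEmpty(source);
        System.out.println(source); // [a, b]
    }
}
```

The explicit `stream()` / `parallelStream()` call is the only thing that tells the reader which kind of operation follows.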

John Kugelman
  • It seems a bit strange to me that this confused users. After all, C# developers are not taken aback by it (C#'s List supports both lazy and eager methods). If the Java guys needed a human experiment for this, I think there is no better proof that this concept works (and it's not that C# guys are smarter...). About collections, OK. Regarding parallel APIs, I don't see anything counterintuitive here. Again, and sorry for being like a broken record, it is exactly what was done in C#. – Vitaliy Feb 11 '15 at 21:21
  • For what it's worth, I agree with you. I understand the rationale behind the `stream()` method, but I don't necessarily agree with it. The Stream API is a bit clumsier than it needs to be, with both the `stream()` method and with the whole `Collector` mechanism. (It'd be nice to have a `Stream.toList()` method.) However, I do find comfort knowing that this was a deliberate decision, at least, and not an oversight. – John Kugelman Feb 11 '15 at 21:35
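For reference, the `Collector` mechanism mentioned above looks like this in Java 8: the terminal `collect` operation takes a `Collector`, typically obtained from a factory method in `Collectors`. The class and method names below are illustrative only. (Later JDKs did add the wished-for method: since Java 16, `Stream` has a `toList()` method that returns an unmodifiable list.)

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectorExamples {
    // Java 8 has no Stream.toList(); a List is produced by passing the Collector
    // returned by Collectors.toList() to the terminal collect operation.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }

    // The same collect mechanism covers richer reductions, e.g. grouping by a key.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "reduce", "sum");
        System.out.println(lengths(words)); // [3, 6, 6, 3]
        System.out.println(byLength(words));
    }
}
```

The indirection is what lets one `collect` method cover lists, sets, maps and grouped results, at the cost of extra ceremony for the common "give me a List" case.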
  1. A Collection is an object model
  2. A Stream is a subject model

Collection definition in the docs:

A collection represents a group of objects, known as its elements.

Stream definition in the docs:

A sequence of elements supporting sequential and parallel aggregate operations

Seen this way, a stream is a specific kind of collection, not the other way around. Thus Collection should not implement Stream, regardless of backward compatibility.

So why doesn't Stream<T> implement Collection<T>? Because it is another way of looking at a bunch of objects: not as a group of elements, but in terms of the operations you can perform on them. This is why I say a Collection is an object model while a Stream is a subject model.
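One practical consequence of the two models can be sketched in a few lines (the names are hypothetical): a collection can be traversed as often as you like, whereas a `Stream` is meant to be consumed once. The `Stream` documentation says an implementation *may* throw `IllegalStateException` when it detects reuse; the OpenJDK implementation does, which is what this sketch assumes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class CollectionVersusStreamUse {
    // A collection holds its elements and can be traversed any number of times.
    static int traverseTwice(List<String> list) {
        int visited = 0;
        for (String ignored : list) visited++;
        for (String ignored : list) visited++;
        return visited;
    }

    // A stream describes a single pass of operations; after a terminal operation
    // has run, using the same Stream object again throws IllegalStateException.
    static boolean secondTerminalOperationFails(List<String> list) {
        Stream<String> stream = list.stream();
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("x", "y", "z");
        System.out.println(traverseTwice(list));                // 6
        System.out.println(secondTerminalOperationFails(list)); // true
    }
}
```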

UmNyobe

First, from the documentation of Stream:

Collections and streams, while bearing some superficial similarities, have different goals. Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations which will be performed in aggregate on that source.

So you want to keep the concepts of stream and collection apart. If Collection implemented Stream, every collection would be a stream, which conceptually it is not. The way it is done now, every collection can give you a stream that operates on that collection, which is something different if you think about it.
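The idea that a collection "gives you a stream which works on that collection" can be seen in the following sketch, adapted from the non-interference example in the `java.util.stream` package documentation (the class and method names are illustrative). The stream holds no elements of its own: `ArrayList`'s spliterator binds to the list when the terminal operation begins, so an element added after `stream()` but before `collect(...)` still shows up. Modifying the list *during* the terminal operation is not allowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamOverCollection {
    // The stream is a view over the list, not a copy of it. ArrayList's
    // spliterator is late-binding, so it sees the list as it is when
    // collect() starts, including the element added after stream().
    static String joinAfterLateAdd() {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = list.stream();
        list.add("three"); // modification before the terminal operation
        return stream.collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinAfterLateAdd()); // one two three
    }
}
```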

Another factor that comes to mind is cohesion/coupling as well as encapsulation. If every class that implements Collection also had to implement the operations of Stream, it would serve two (somewhat) different purposes and might grow too large.

André Stannek

My guess would be that it was made that way to avoid breakage with existing code that implements Collection. It would be hard to provide a default implementation that worked correctly with all existing implementations.
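For comparison, Java 8 did add one default method to `Collection`: `stream()`, built on the default `spliterator()`, which wraps `iterator()`. The sketch below assumes a hypothetical `Range` collection written in pre-Java 8 style, implementing only `iterator()` and `size()`; it still gets a working `stream()` without any changes. This shows how small the default surface is under the chosen design, not what a full set of default `Stream` methods on `Collection` would have required.

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.stream.Collectors;

// Hypothetical collection written in pre-Java 8 style: it only implements
// iterator() and size(), and knows nothing about streams.
class Range extends AbstractCollection<Integer> {
    private final int start;
    private final int end; // exclusive

    Range(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int next = start;

            @Override
            public boolean hasNext() {
                return next < end;
            }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return next++;
            }
        };
    }

    @Override
    public int size() {
        return Math.max(0, end - start);
    }
}

class DefaultStreamDemo {
    // Range inherits stream() from Collection; the default spliterator()
    // behind it is built from Range's own iterator() and size().
    static String evenSquares(Range range) {
        return range.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> String.valueOf(i * i))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(new Range(0, 7))); // 0,4,16,36
    }
}
```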

John R