The author of Java 8 in Action writes this class:

class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        return (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH, Characteristics.CONCURRENT));
    }
}

Then he explains what the different values of the Characteristics enum mean, and why the collector he wrote is IDENTITY_FINISH and CONCURRENT but not UNORDERED:

The ToListCollector developed so far is IDENTITY_FINISH, because the List used to accumulate the elements in the stream is already the expected final result and doesn’t need any further transformation, but it isn’t UNORDERED because if you apply it to an ordered stream you want this ordering to be preserved in the resulting List. Finally, it’s CONCURRENT, but following what we just said, the stream will be processed in parallel only if its underlying data source is unordered.

Why will the stream be processed in parallel only if the underlying source is unordered? I think it will still be processed in parallel, but combiner() will have to preserve the order. Is this an error in the book?

I think Brian Goetz talks quite clearly about parallel processing of ordered streams in the last paragraph of this post.

The pages in the book are 192 - 193.

Holger
Coder-Man
  • 2
    There is a difference between *parallel* collect and *concurrent* collect. [This answer](https://stackoverflow.com/a/41045442/2711488) explains the difference. The *concurrent* collect operation can’t maintain the order, hence can only be used if *at least one*, the stream or the collector, is *unordered*. – Holger May 31 '18 at 14:16
  • @Holger so Characteristics.CONCURRENT can't be used without Characteristics.UNORDERED? Or are you saying that Characteristics.CONCURRENT can also mean parallel collect? – Coder-Man May 31 '18 at 14:28
  • @Holger In Brian Goetz's post that I linked to, will the collector in his second code snippet be _concurrent_ but not _unordered_ meaning that it collects in parallel but not concurrently? And what is parallel collect anyway? Say we process an ArrayList in parallel and its contents are broken up into 12 chunks. Does it mean that if you parallel collect, and have 4 threads, then the first thread will combine chunks (1,2,3), second (4,5,6), third (7,8,9), fourth (10,11,12)? – Coder-Man May 31 '18 at 14:50
  • @POrekhov To understand these things, you need to try them out a bit. A *concurrent* collect is when the supplier is called only once and all threads update the single container it produces. A *parallel* collect is when each thread calls the supplier once and works inside its own container; later the containers of two threads are merged via the combiner. More on the subject: https://stackoverflow.com/questions/40888262/what-is-the-difference-between-collectors-toconcurrentmap-and-converting-a-map-t/40888456#40888456 – Eugene May 31 '18 at 19:30
  • @Eugene thanks, I get it now – Coder-Man Jun 01 '18 at 07:14
  • 1
    @POrekhov A collector with `Characteristics.CONCURRENT` can still be used for a *parallel* `collect`; that's why it must provide a valid combiner function, even when it wouldn't be used in a *concurrent* `collect`. The Stream decides which strategy to use. As said in [another comment](https://stackoverflow.com/questions/50625544/?noredirect=1#comment88263029_50625610), if the collector is not unordered, the stream or a combined collector may be, which would also enable using the declared `CONCURRENT` characteristic. – Holger Jun 01 '18 at 08:32
  • @Holger is it the collect function that decides according to what kind of collector was passed in whether this collect operation is sequential, concurrent or parallel? Like how does that work? It seems to me that if the collect function sees UNORDERED+CONCURRENT flags, collect is concurrent, if CONCURRENT, then it's parallel, If there's no CONCURRENT, it's sequential. Is this how it works? – Coder-Man Jun 01 '18 at 09:17
  • Also where do you guys learn streams from? The java language spec? – Coder-Man Jun 01 '18 at 09:25
  • 2
    It's the implementation of the `collect` method that makes the decision, but not the way you described. Instead: if the stream is sequential, the collect is sequential. If the stream is parallel, it is either a parallel or a concurrent collect. If at least one of the two, the stream or the collector, is unordered and the collector has the `CONCURRENT` characteristic, it is concurrent; otherwise it is parallel. Note that every collector can be used in parallel; that's a fundamental property of the design. Because distinct local containers are used, they don't need to be thread safe. – Holger Jun 01 '18 at 09:25
  • @Holger that makes sense, thanks! – Coder-Man Jun 01 '18 at 09:30
  • 3
    Starting points are the [package documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#package.description), the [official tutorial](https://docs.oracle.com/javase/tutorial/collections/streams/index.html), and Stack Overflow. The rest is dealing with it for five years now... – Holger Jun 01 '18 at 09:35
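
The distinction drawn in the comments above can be observed, roughly, by counting how often the supplier runs. The following is a minimal sketch and is not taken from the book or the comments: `CollectModes` and its methods are hypothetical names, and the single supplier call in the concurrent case reflects how OpenJDK implements `collect`, which is an assumption about the implementation rather than a documented guarantee. A synchronized list is used so that the container is safe in either mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class, for illustration only.
class CollectModes {

    // Builds a to-list collector that counts supplier invocations.
    // The container is a synchronized list, so it tolerates a concurrent collect.
    static Collector<Integer, List<Integer>, List<Integer>> countingCollector(
            AtomicInteger supplierCalls, Collector.Characteristics... characteristics) {
        return Collector.of(
                () -> {
                    supplierCalls.incrementAndGet();
                    return Collections.synchronizedList(new ArrayList<>());
                },
                List::add,
                (left, right) -> {
                    left.addAll(right);
                    return left;
                },
                characteristics);
    }

    // Collects 0..9999 from an ordered source and returns the number of supplier calls.
    static int supplierCalls(boolean parallel, Collector.Characteristics... characteristics) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> stream = IntStream.range(0, 10_000).boxed();
        if (parallel) {
            stream = stream.parallel();
        }
        List<Integer> result = stream.collect(countingCollector(calls, characteristics));
        if (result.size() != 10_000) {
            throw new IllegalStateException("lost elements: " + result.size());
        }
        return calls.get();
    }
}
```

With `CONCURRENT` and `UNORDERED` both declared, `CollectModes.supplierCalls(true, ...)` should report a single supplier call. With only `CONCURRENT` declared, the ordered source makes the stream fall back to a parallel collect, which typically reports more than one call.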

1 Answer

That is simply wrong. Even declaring the CONCURRENT characteristic here is wrong, because the Supplier would then have to return a thread-safe data structure, and ArrayList is not one.
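
One way to act on this, as a sketch rather than the book's or the answerer's own code, is to keep the collector as written but declare only IDENTITY_FINISH. The class name `OrderedToListCollector` and the helper `collectInParallel` are hypothetical. Without CONCURRENT, a parallel stream gives each subtask its own ArrayList and merges them through the combiner, so the encounter order of an ordered stream is preserved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;

// Hypothetical name; same shape as the book's ToListCollector, minus CONCURRENT.
class OrderedToListCollector<T> implements Collector<T, List<T>, List<T>> {

    @Override
    public Supplier<List<T>> supplier() {
        // A plain ArrayList is fine: each container is confined to one subtask.
        return ArrayList::new;
    }

    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    @Override
    public BinaryOperator<List<T>> combiner() {
        // Appends the right partial result to the left one, keeping encounter order.
        return (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }

    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        // Only IDENTITY_FINISH: neither CONCURRENT nor UNORDERED is declared.
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.IDENTITY_FINISH));
    }

    // Collects 0..n-1 from a parallel, ordered stream.
    static List<Integer> collectInParallel(int n) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .collect(new OrderedToListCollector<Integer>());
    }
}
```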

Eugene
  • I thought that as well. ArrayList isn't even synchronized. So frustrating, the book is riddled with errors. – Coder-Man May 31 '18 at 13:34
  • 2
    Note that this can break in practice, even without an *unordered* collector characteristic: either when the stream is unordered, or when you use it as the downstream collector of `groupingByConcurrent`. In these cases, the stream will try to utilize the declared *concurrent* nature of this collector. – Holger May 31 '18 at 14:19
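
For contrast with the `groupingByConcurrent` case mentioned above, here is a small sketch that uses a downstream collector which does not declare `CONCURRENT`, namely `Collectors.toList()`. The class and method names are hypothetical. The assumption, based on how the one-argument `groupingByConcurrent` is documented to delegate to `toList()`, is that this combination is safe on a parallel stream, although the order of elements within each group is not guaranteed.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, for illustration only.
class GroupingByConcurrentSketch {

    // Groups 0..n-1 by parity on a parallel stream.
    // Key true holds the even numbers, key false the odd ones.
    static ConcurrentMap<Boolean, List<Integer>> groupByParity(int n) {
        return IntStream.range(0, n)
                .boxed()
                .parallel()
                .collect(Collectors.groupingByConcurrent(
                        i -> i % 2 == 0,
                        Collectors.toList()));
    }
}
```

A collector that wrongly declares `CONCURRENT` over an `ArrayList`, like the one in the question, gives up that safety when it is passed as the downstream collector here.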