I'm trying to use a Java 8 stream to turn a parsed CSV file into a list of 23,000 elements.

csvAsList.stream().map(element -> transform(element)).collect(toList())

I looked at the source code of toList():

public static <T> Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>(
        (Supplier<List<T>>) ArrayList::new, List::add, (left, right) -> { left.addAll(right); return left; },
        CH_ID);
}

ArrayList::new uses the default initial capacity.

However, the application will do lots of transformations like this, so I think it would be better to create the ArrayList with a larger initial capacity. That could save time by avoiding repeated copying of the backing array as the list grows.

Is this doable, or is it just not worth doing?

Tunaki
San Lee
  • You're looking for `Collectors.toCollection(() -> new ArrayList<>(...))`. – Tunaki Apr 26 '16 at 20:18
  • You probably want to avoid a sized constructor for parallel streams. – shmosel Apr 26 '16 at 20:37
  • Don't worry about it until you have performance problems and have figured out that this is the cause. I suspect that will never happen. – Fabian Barney Apr 26 '16 at 21:50
  • Replace `List l=yourlongstreamopchain.collect(toList());` with `List l=Arrays.asList(yourlongstreamopchain.toArray(X[]::new));`. That will be more efficient for large streams, *especially* for parallel streams… [Brought up here](http://stackoverflow.com/questions/28782165/why-didnt-stream-have-a-tolist-method#comment45848538_28782165) – Holger Apr 27 '16 at 09:02
  • I think there is a problem with my idea: creating a large ArrayList every time may not be a good choice, since it could introduce a memory issue. – San Lee Apr 27 '16 at 15:31
  • @Holger I like this one. This solution creates the array at the very beginning, and that array is shared by the parallel stream's threads. Does this mean the threads will contend for slots in the array? That could cost some time. – San Lee Apr 27 '16 at 15:40
  • It depends on the stream source and the chained operations whether the target locations are predictable. In the best case, each thread works on its own area, otherwise some threads may buffer, not worse than with `collect(toList())`. When you do `parallelStream().map(…).toArray(…)` on an `ArrayList`, everything is perfect. – Holger Apr 27 '16 at 17:10
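
For reference, here is a minimal sketch of the two suggestions from the comments above, side by side. The class name `PresizedCollect`, the `String` element type, and the body of `transform` are assumptions made up for illustration; only `csvAsList` and the call shapes come from the question and comments. Treat it as a sketch under those assumptions, not a benchmark or a recommendation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PresizedCollect {

    // Hypothetical stand-in for the question's transform(element).
    static String transform(String element) {
        return element.trim().toUpperCase();
    }

    // Tunaki's suggestion: supply a presized ArrayList via Collectors.toCollection.
    // On a sequential stream the supplier runs once, so the list starts with
    // room for every element. On a parallel stream the supplier can run once
    // per chunk, so each partial list would reserve the full capacity, which
    // is why a sized constructor is best kept to sequential streams.
    static List<String> viaToCollection(List<String> csvAsList) {
        return csvAsList.stream()
                .map(PresizedCollect::transform)
                .collect(Collectors.toCollection(() -> new ArrayList<>(csvAsList.size())));
    }

    // Holger's suggestion: collect into an array and wrap it.
    // Arrays.asList returns a fixed-size view of the array, so the result
    // supports set() but not add() or remove().
    static List<String> viaToArray(List<String> csvAsList) {
        return Arrays.asList(csvAsList.parallelStream()
                .map(PresizedCollect::transform)
                .toArray(String[]::new));
    }

    public static void main(String[] args) {
        List<String> csvAsList = Arrays.asList(" a ", "b", " c");
        System.out.println(viaToCollection(csvAsList)); // prints [A, B, C]
        System.out.println(viaToArray(csvAsList));      // prints [A, B, C]
    }
}
```

Both variants preserve encounter order. If the result must be resizable, the `toArray` version would need an extra copy such as `new ArrayList<>(Arrays.asList(...))`, which gives up part of the saving.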
