0

I want to process a list and then count how many items were processed. My code is something like:

List<String> names = ...;
names.stream().filter(...).map(...) /* a list of time-consuming filters and maps */
        .forEach(...);
// Then count them:
long count = names.stream().filter(...).map(...).count();

Even if I simplify my code using a Supplier, as in Copy a stream to avoid "stream has already been operated upon or closed", I still need to run the time-consuming list of filters and maps twice.

So I have to do it in an ugly way, like:

List<String> filteredNames = names.stream().filter(...).map(...).collect(Collectors.toList());
filteredNames.forEach(...);
int count = filteredNames.size();

This may create another huge list on the Java heap.

An uglier but more efficient way is:

int[] count = {0}; // or use AtomicInteger(0);
names.stream().filter(...).map(...).forEach(s -> {
    // process
    count[0] ++;
});

Is there an elegant way to do this?

auntyellow
  • I would just precede it with a `peek` – Felix Jun 24 '20 at 07:22
  • But I found this: https://stackoverflow.com/questions/33635717/in-java-streams-is-peek-really-only-for-debugging – auntyellow Jun 24 '20 at 07:25
  • there's this: https://stackoverflow.com/questions/43653761/java-8-streams-count-all-elements-which-enter-the-terminal-operation but it suggests the approach you've already taken.. maybe an improvement is doing it inside a peek() – AminM Jun 24 '20 at 07:36
  • According to this: https://stackoverflow.com/a/33636377/4260959 , `stream().peek(...).count()` is not recommended. – auntyellow Jun 24 '20 at 08:19

3 Answers

3

Iterating over a List is not expensive per se. The cost of repeated Stream operations depends on the actual intermediate operations being repeated. Due to the optimizing environment the code is executed in, two single-purpose loops may turn out to be more efficient than putting two concerns into one loop.

Note that the count does not depend on the result of the map operation, hence, you can omit it for the count operation:

List<String> names = ...;
names.stream().filter(...).map(...).forEach(...);
// Then count them:
long count = names.stream().filter(...).count();

This is the variant to try and measure, for comparison with Stream approaches doing both actions at once.
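A runnable sketch of this two-pass variant, where the predicate, mapper, and data are hypothetical stand-ins for the question's `...`:

```java
import java.util.List;

public class TwoPassDemo {
    // hypothetical stand-ins for the question's filter(...) and map(...)
    static boolean longerThanThree(String s) { return s.length() > 3; }
    static String shout(String s) { return s.toUpperCase(); }

    static long process(List<String> names) {
        // first pass: perform the action on the mapped elements
        names.stream().filter(TwoPassDemo::longerThanThree)
                      .map(TwoPassDemo::shout)
                      .forEach(System.out::println);
        // second pass: the count only needs the filter, not the map
        return names.stream().filter(TwoPassDemo::longerThanThree).count();
    }

    public static void main(String[] args) {
        long count = process(List.of("Anna", "Bob", "Charlie"));
        System.out.println("count = " + count); // 2 of the 3 names pass the filter
    }
}
```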

Note that forEach does not guarantee that your action is performed in order, and in the case of a parallel stream, it could even be evaluated concurrently. The same applies to peek, as well as to any attempt to put the actual action into a map function.

An attempt to use something like

long count = names.stream().filter(...).map(...)
    .map(x -> {                                   // don't do this
        // your action
        return x;
    }).count();

bears the additional problem that an implementation might apply the same reasoning as above: the map function’s result is irrelevant to the count, so it could be skipped, even if the current implementation does not do so.

Likewise, using

long count = names.stream().filter(...).map(...)
    .peek(x -> {                                  // don't do this
        // your action
    }).count();

bears some risk. The last map step is irrelevant to the final count, so it could be skipped, in which case the peek action would have to be skipped too, as its input would not exist. There is no guarantee that otherwise-obsolete intermediate operations are kept solely for the sake of peek operations. We can only speculate how radically implementers will exploit this freedom. The only practical example in the current reference implementation is the skipping of the entire pipeline when the count is predictable beforehand, which also skips all peek operations, but it suggests that we should never rely on the execution of peek actions.
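The count-predictable case can be observed directly. A small demo (whether the peek action runs depends on the JDK version; the count itself is always correct):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class PeekCountDemo {
    static long countWithPeek(AtomicInteger peeks) {
        // No filter here, so the stream size is known up front; since Java 9,
        // count() may return that size directly without running the pipeline,
        // in which case the peek action is never executed.
        return Stream.of("a", "b", "c")
                .peek(x -> peeks.incrementAndGet())
                .count();
    }

    public static void main(String[] args) {
        AtomicInteger peeks = new AtomicInteger();
        long count = countWithPeek(peeks);
        System.out.println("count = " + count); // always 3
        // 0 on implementations that skip the pipeline, 3 on those that don't
        System.out.println("peek ran " + peeks.get() + " times");
    }
}
```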

But when you use

int count = names.stream().filter(...).map(...)
    .mapToInt(x -> {                   // don't assume sequential in-order execution
        // your action
        return 1;
    }).sum();

the end result depends on the function containing the action, so it will always be evaluated. But unlike forEach, which can be replaced with forEachOrdered to get guaranteed non-concurrent, in-order execution, there is no such option for the mapping function.
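A self-contained sketch of the mapToInt/sum pattern, with hypothetical placeholders for the question's filter, map, and action:

```java
import java.util.ArrayList;
import java.util.List;

public class MapToIntCountDemo {
    static int processAndCount(List<String> names, List<String> sink) {
        return names.stream()
                .filter(s -> s.length() > 3)   // hypothetical filter
                .map(String::toUpperCase)      // hypothetical map
                .mapToInt(s -> {
                    sink.add(s);               // the "action"; not order-guaranteed on parallel streams
                    return 1;                  // each processed element contributes 1 to the sum
                })
                .sum();
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        int count = processAndCount(List.of("Anna", "Bob", "Charlie"), processed);
        System.out.println(count + " processed: " + processed);
    }
}
```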

Holger
2

This is not particularly more elegant or efficient, but it's an alternative you didn't list (a mutable reduction):

AtomicInteger count = names.stream().collect(
        AtomicInteger::new,
        (sum, name) -> {
            //forEach action here
            sum.incrementAndGet();
        }, 
        (n1, n2) -> {
            n1.addAndGet(n2.get());
        });
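A runnable sketch of this collect-based counting; the filter, map, and printing action are hypothetical stand-ins for the question's `...`:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CollectCountDemo {
    static int processAndCount(List<String> names) {
        AtomicInteger count = names.stream()
                .filter(s -> s.length() > 3)      // hypothetical filter
                .map(String::toUpperCase)         // hypothetical map
                .collect(
                        AtomicInteger::new,       // supplier: one counter per thread
                        (sum, name) -> {
                            System.out.println(name); // the forEach action
                            sum.incrementAndGet();
                        },
                        (n1, n2) -> n1.addAndGet(n2.get())); // combiner for parallel streams
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println("count = " + processAndCount(List.of("Anna", "Bob", "Charlie")));
    }
}
```

Since each container in a mutable reduction is confined to a single thread, a plain `int[]` holder would also work here; `AtomicInteger` just mirrors the shape of the answer above.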
ernest_k
0

If you have to process all the elements of the Stream anyway (and it seems like you do, since you are using forEach), why not collect them into a List before doing the processing? Then you can easily get the size of the List.

List<String> names = ...;
List<Something> result = names.stream()
                              .filter(...)
                              .map(...) 
                              .collect(Collectors.toList());
result.forEach(...);
// Then count them:
int count = result.size();
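A concrete version of this single-pass approach, with hypothetical stand-ins for the question's filter and map:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CollectThenCountDemo {
    static List<String> filtered(List<String> names) {
        return names.stream()
                .filter(s -> s.length() > 3)   // hypothetical filter
                .map(String::toUpperCase)      // hypothetical map
                .collect(Collectors.toList()); // single pass; keeps the mapped list on the heap
    }

    public static void main(String[] args) {
        List<String> result = filtered(List.of("Anna", "Bob", "Charlie"));
        result.forEach(System.out::println);
        System.out.println("count = " + result.size());
    }
}
```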
Eran
  • As I mentioned in the question, this list `result` may be too huge to store in the Java heap. – auntyellow Jun 24 '20 at 07:23
  • @auntyellow You do realize that all the instances created by `map` will still be on the heap (regardless of whether or not you collect them to a `List`), right? (they'll just be eligible for garbage collection a bit earlier) So the `List` will just require the additional references to each of these instances (which may be negligible compared to the size of the instances created by `map()` - depending on what these instances are). – Eran Jun 24 '20 at 07:30
  • I don't realize that. This code runs OOM with -Xmx256m: ` String prefix = "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"; List list = IntStream.range(0, 1_000_000).mapToObj(i -> "" + i).collect(Collectors.toList()); List list2 = list.stream().map(s -> prefix + s).collect(Collectors.toList()); System.out.println(list2.stream().mapToInt(String::length).sum()); ` Just because huge `list2`. – auntyellow Jun 24 '20 at 08:06
  • @Eran sure, they will all be in memory at some point. But not necessarily at the same time. Or, to be accurate, they may actually all be in memory at the same time, but they don't have to be. – Felix Jun 24 '20 at 14:07