While side effects in behavioral parameters are discouraged, they are not forbidden as long as there is no interference, so the simplest, though not the cleanest, solution is to count right in the filter:
AtomicInteger rejected = new AtomicInteger();
Files.lines(path)
    .filter(line -> {
        boolean accepted = isTypeA(line);
        if (!accepted) rejected.incrementAndGet();
        return accepted;
    })
    // chain processing of matched lines
As long as you are processing all items, the result will be consistent. The count only becomes unpredictable if you use a short-circuiting terminal operation (in a parallel stream), because the filter may then be evaluated for an unspecified number of lines.
Updating an atomic variable may not be the most efficient solution, but in the context of processing lines from a file, the overhead will likely be negligible.
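To make the counting filter easy to try out, here is a minimal, self-contained sketch. It assumes an in-memory list instead of Files.lines(path), and isTypeA is a hypothetical stand-in (it simply checks for a leading "A"), since the real check is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class FilterCountDemo {
    // Hypothetical stand-in for the real isTypeA check (assumption).
    static boolean isTypeA(String line) {
        return line.startsWith("A");
    }

    // Returns the accepted lines; the number of rejected lines is added to 'rejected'.
    static List<String> acceptedLines(List<String> lines, AtomicInteger rejected) {
        return lines.parallelStream()
            .filter(line -> {
                boolean accepted = isTypeA(line);
                if (!accepted) rejected.incrementAndGet();
                return accepted;
            })
            // collect is not short-circuiting, so every line passes the filter once
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("A1", "B1", "A2", "C1", "B2");
        AtomicInteger rejected = new AtomicInteger();
        List<String> accepted = acceptedLines(lines, rejected);
        // prints [A1, A2], rejected=3
        System.out.println(accepted + ", rejected=" + rejected.get());
    }
}
```

The counter is only meaningful after the terminal operation has completed; reading it earlier would observe a partial count.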
If you want a clean, parallel-friendly solution, one general approach is to implement a Collector which can combine the processing of two collect operations based on a condition. This requires that you are able to express the downstream operation as a collector, but most stream operations can be expressed as a collector (and the trend is toward making all operations expressible that way, i.e. Java 9 will add the currently missing filtering and flatMapping).
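As a rough illustration of what "expressing a stream operation as a collector" means, the following sketch writes the same filter-then-map pipeline twice: once with stream operations and once nested as collectors. The word list and the "starts with a" condition are made up for the example, and Collectors.filtering assumes Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class OpsAsCollectorsDemo {
    // Ordinary pipeline: filter, then map, then collect.
    static List<Integer> viaStreamOps(List<String> words) {
        return words.stream()
            .filter(w -> w.startsWith("a"))
            .map(String::length)
            .collect(Collectors.toList());
    }

    // The same steps nested as collectors (Collectors.filtering needs Java 9+).
    static List<Integer> viaCollectors(List<String> words) {
        return words.stream()
            .collect(Collectors.filtering(w -> w.startsWith("a"),
                Collectors.mapping(String::length, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "avocado");
        System.out.println(viaStreamOps(words));  // prints [5, 7]
        System.out.println(viaCollectors(words)); // prints [5, 7]
    }
}
```

The collector form is more verbose, but it can be passed as a downstream argument to another collector, which is what the combining approach relies on.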
You’ll need a pair type to hold two results, so assuming a sketch like
class Pair<A,B> {
    final A a;
    final B b;
    Pair(A a, B b) {
        this.a = a;
        this.b = b;
    }
}
the combining collector implementation will look like
// needs imports from java.util.function (Predicate, Supplier, BiConsumer,
// BinaryOperator, Function) and java.util.stream.Collector
public static <T, A1, A2, R1, R2> Collector<T, ?, Pair<R1,R2>> conditional(
        Predicate<? super T> predicate,
        Collector<T, A1, R1> whenTrue, Collector<T, A2, R2> whenFalse) {
    Supplier<A1> s1 = whenTrue.supplier();
    Supplier<A2> s2 = whenFalse.supplier();
    BiConsumer<A1, T> a1 = whenTrue.accumulator();
    BiConsumer<A2, T> a2 = whenFalse.accumulator();
    BinaryOperator<A1> c1 = whenTrue.combiner();
    BinaryOperator<A2> c2 = whenFalse.combiner();
    Function<A1,R1> f1 = whenTrue.finisher();
    Function<A2,R2> f2 = whenFalse.finisher();
    return Collector.of(
        // one container per delegate collector
        () -> new Pair<>(s1.get(), s2.get()),
        // route each element to exactly one delegate
        (p, t) -> {
            if (predicate.test(t)) a1.accept(p.a, t); else a2.accept(p.b, t);
        },
        // merge partial results of parallel evaluation pairwise
        (p1, p2) -> new Pair<>(c1.apply(p1.a, p2.a), c2.apply(p1.b, p2.b)),
        // finish both delegates
        p -> new Pair<>(f1.apply(p.a), f2.apply(p.b)));
}
and can be used, for example, to collect the matching items into a list and count the non-matching ones, like this:
Pair<List<String>, Long> p = Files.lines(path)
    .collect(conditional(line -> isTypeA(line), Collectors.toList(), Collectors.counting()));
List<String> matching = p.a;
long nonMatching = p.b;
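For experimenting without a file, the pieces above can be put together into one compilable sketch. It repeats the Pair class and the conditional collector so that it stands alone; the in-memory list, the split helper, and the isTypeA check (a leading "A") are assumptions made only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ConditionalCollectorDemo {
    static final class Pair<A, B> {
        final A a;
        final B b;
        Pair(A a, B b) { this.a = a; this.b = b; }
    }

    // Same collector as in the answer, repeated so this sketch compiles on its own.
    static <T, A1, A2, R1, R2> Collector<T, ?, Pair<R1, R2>> conditional(
            Predicate<? super T> predicate,
            Collector<T, A1, R1> whenTrue, Collector<T, A2, R2> whenFalse) {
        Supplier<A1> s1 = whenTrue.supplier();
        Supplier<A2> s2 = whenFalse.supplier();
        BiConsumer<A1, T> a1 = whenTrue.accumulator();
        BiConsumer<A2, T> a2 = whenFalse.accumulator();
        BinaryOperator<A1> c1 = whenTrue.combiner();
        BinaryOperator<A2> c2 = whenFalse.combiner();
        Function<A1, R1> f1 = whenTrue.finisher();
        Function<A2, R2> f2 = whenFalse.finisher();
        return Collector.of(
            () -> new Pair<>(s1.get(), s2.get()),
            (p, t) -> {
                if (predicate.test(t)) a1.accept(p.a, t); else a2.accept(p.b, t);
            },
            (p1, p2) -> new Pair<>(c1.apply(p1.a, p2.a), c2.apply(p1.b, p2.b)),
            p -> new Pair<>(f1.apply(p.a), f2.apply(p.b)));
    }

    // Hypothetical stand-in for the real isTypeA check (assumption).
    static boolean isTypeA(String line) {
        return line.startsWith("A");
    }

    // Collects matching lines into a list and counts the rest, using a parallel stream.
    static Pair<List<String>, Long> split(List<String> lines) {
        return lines.parallelStream()
            .collect(conditional(line -> isTypeA(line),
                Collectors.toList(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Pair<List<String>, Long> p = split(Arrays.asList("A1", "B1", "A2", "C1", "B2"));
        // prints [A1, A2] / 3
        System.out.println(p.a + " / " + p.b);
    }
}
```

No shared mutable state is touched outside the collector's own containers, so the result does not depend on how the stream is split across threads.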
The collector is parallel-friendly and allows arbitrarily complex delegate collectors, but note that with the current implementation, the stream returned by Files.lines might not perform so well with parallel processing; compare with “Reader#lines() parallelizes badly due to nonconfigurable batch size policy in its spliterator”. Improvements are scheduled for the Java 9 release.