3

I don't know exactly how Streams work internally, but I have always wondered why Stream#findAny() exists, when there is Stream#findFirst(). Find first suggests that streams keep the order of the array/Collection/Iterator from which they are created. So why use any? If it is irrelevant which element you are getting, it may as well be the first one. From what I'm thinking, findFirst should always execute in constant time.

The Javadoc of Stream#findAny() states:

The behavior of this operation is explicitly nondeterministic; it is free to select any element in the stream. This is to allow for maximal performance in parallel operations; the cost is that multiple invocations on the same source may not return the same result. (If a stable result is desired, use {@link #findFirst()} instead.)

But even then isn't the order of elements known? The provided datastructure stays the same.

Lino
  • 19,604
  • 6
  • 47
  • 65
xtay2
  • 453
  • 3
  • 14
  • 9
    A stream wasn't necessarily created from an ordered input or it might be faster to execute some intermediate step in a non-deterministic way. If the pipeline knows you don't care about *specifically* finding the first result, it might be able to optimize the processing. – Joachim Sauer Aug 04 '22 at 11:56
  • 10
    Take a stream of two elements and process them to get a result. The first takes 400 years to complete. The second 1 minute. `findFirst` will wait the whole *400 years* to return the first elements result. While `findAny` will return the second result after 1 minute. See the difference? Depending on the circumstances of what exactly you're doing, you just want the fastest result, and don't care about the order. – Lino Aug 04 '22 at 11:59
  • So this can only happen in parallell streams? In a sequential stream there should be no way of knowing how long an operation will take, right? – xtay2 Aug 04 '22 at 12:02
  • 7
    @xtay2 the javadoc you've shown also mentions that part: *This is to allow for maximal performance in **parallel** operations;*. In a sequential stream I'm assuming it behaves the same as `findFirst()` – Lino Aug 04 '22 at 12:04
  • It is an implementing thing: **parallel** is evidently. But also before **sorted** is completed `findAny` could be successful (consider the filtering is applied _before_ sorting); unlikely sorted+findAny is. _I did not inspect the sources._ – Joop Eggen Aug 04 '22 at 12:07

1 Answers1

8

Stream#findAny is used it when we're looking for an element without paying an attention to the encounter order.

Comment from Lino at the time of answer which correctly summarizes the difference:

Take a stream of two elements and process them to get a result. The first takes 400 years to complete. The second 1 minute. findFirst will wait the whole 400 years to return the first elements result. While findAny will return the second result after 1 minute. See the difference? Depending on the circumstances of what exactly you're doing, you just want the fastest result, and don't care about the order.

Consider this code for findFirst:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> result = list
                           .stream()
                           .filter(num -> num < 4)
                           .findFirst();              // .findAny();

Here, findAny operates as findFirst, still not guarantee. On my machine, after 10 odd runs, The output is always 1 though.

Now, let's consider a parallel stream, for which findAny is designed for:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5);
Optional<Integer> result = list
                           .stream()
                           .parallel()
                           .filter(num -> num < 4)
                           .findAny();

Now the output on my machine 3. But it could be anything 1, 2, or 3.

Let's do some benchmarking with inputs favoring findAny (i.e. the predicate should be true after the middle of the stream).

List<Integer> list = IntStream.rangeClosed(1, 1000000).boxed().collect(Collectors.toList());

long findFirstStartTime = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    list.stream().filter(num -> num > 500000).findFirst();
}
long findFirstEndTime = System.currentTimeMillis();

long findAnyStartTime = System.currentTimeMillis();
for (int i = 0; i < 10000; i++) {
    list.stream().parallel().filter(num -> num > 500000).findAny();
}
long findAnyEndTime = System.currentTimeMillis();

System.out.println("findFirst time taken: " + (findFirstEndTime - findFirstStartTime));
System.out.println("findAny time taken: "   + (findAnyEndTime - findAnyStartTime));

findFirst will go sequentially, till the middle of the stream and findAny will go as the almighty wants. The results are staggering:

findFirst time taken: 29324
findAny time taken: 623

Using JMH Benchmarking with params:

@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)

Benchmark          Mode  Cnt        Score        Error  Units
FindAny.findAny    avgt   50   191700.823 ± 251565.289  ns/op
FindAny.findFirst  avgt   50  4157585.786 ± 355005.501  ns/op
Harshal Parekh
  • 5,918
  • 4
  • 21
  • 43
  • 1
    While the answer displays everthing relevant (and correct!). I would be careful when using a simple benchmark like yours. You might be running into some problems that favor one result over the other. Like said in this other question: [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Lino Aug 04 '22 at 23:02