1

There are multiple ways to find a element in a list. For example I have a list with some elements that have a unique ID. For a given ID I want to return the specific element from the list.

List example (just schema, not correct syntax):

//Element(String id, int value)
List<Element> list = new ArrayList();
int elemCount = 1000000;
for(int i = 0; i<elemCount; i++)
    list.add(new Element("id"+i, i));
Collections.shuffle(list);
Element e = getElement("id" + ThreadLocalRandom.current().nextInt(0, elemCount));

Given the following two methods, which does perform generally better based on the java internal implementation and why?

First method:

public Element getElement(String id) {
   return list.stream()
      .filter(e -> e.getId().equals(id))
      .findAny()
      .orElse(null);
}

Second method:

public Element getElement(String id) {
   for(Element e : list)
      if(e.getId().equals(id))
         return e;
   return null;
}
  • "Element" as object is chosen just for example.
  • Not relevant: Structure and size of the Element, PC performance etc.
  • Java Version: 1.8.0_111
Spenhouet
  • 6,556
  • 12
  • 51
  • 76
  • The one that completes first on your machine, with your data, performs better. So try to find that out. – Marko Topolnik Oct 27 '16 at 09:27
  • @MarkoTopolnik As you can see in the examples, it's not about specific data. Any data should perform the same. Therefor i do hope to find a general and well explained answer two my question. – Spenhouet Oct 27 '16 at 09:30
  • What if it depends on data size, your JVM version, your number of cores, etc? They have the same big-O complexity, the rest is up to details which you can't pretend don't exist. – Marko Topolnik Oct 27 '16 at 09:31
  • All that, could be included in a answer to that question. The question is how both methods perform compared to each other. The implementation detail is in the question above. – Spenhouet Oct 27 '16 at 09:34
  • I would expect performance to be very similar. If there is a small difference, I dare not guess at which might be faster or more efficient. So have to agree with @MarkoTopolnik. Which is (slightly) faster could also potentially depend on the data, so run your test many times with realistic data if you want to be sure. – Ole V.V. Oct 27 '16 at 09:50
  • And as always, don’t do premature optimization (search for the term if you didn’t know it already). – Ole V.V. Oct 27 '16 at 09:51
  • @Holger So you say that stream.filter.findAny is internally the same as for-if? – Spenhouet Oct 27 '16 at 16:56
  • 1
    No, unless you consider blue and yellow to be the same color. I’m just saying there is no ranking without a specific criteria. Likewise, you can’t ask for a performance ranking of two semantically equivalent operations without any knowledge about, e.g. the list type or JVM/JRE implementation. The answer can even change for subsequent executions within the same runtime. – Holger Oct 27 '16 at 17:22
  • I didn't asked for a specific case. I want to know how both implementations compare two each other in performance. – Spenhouet Oct 27 '16 at 19:53
  • @Spen How do you run the code without running a specific case? – Ole V.V. Oct 28 '16 at 06:02
  • I do, but it doesn't matter for my question. I just want to know how both methods compare in the described case based on there internal implementation. I don't want benchmarks. I search for someone with knowledge of java internals and a good understanding of how both methods solve the decribed problem. Until know there where just deconstructive comments. – Spenhouet Oct 28 '16 at 09:18
  • 3
    The difference shouldn't be significant. If absolute performance is absolutely needed, then you should measure. Otherwise, you should choose what you find the most readable: both solutions are O(n). If you need to do that many times on the same collection, especially if the collection is large, then you should consider transforming the list into a HashMap, where lookups would be O(1). – JB Nizet Oct 28 '16 at 10:46
  • If your question boils down to "how are streams implemented" that's different from "will streams be faster than loops". Every JVM by default optimizes your code on the fly, taking into account runtime variables. So the code you present might not be what is actually run - the actual one can't be known to us. This is why everybody says "measure yourself". – mabi Oct 28 '16 at 11:33
  • Since you all recommended to do a benchmark, i did write one and shared my result and implementation below. – Spenhouet Oct 29 '16 at 00:29

1 Answers1

0

Since i couldn't get a answer based on the java internal implementation, i did a benchmark for my self.

I want to share with you the results i came up with. I did run a benchmark for every list from size 1 to 1000. Every benchmark for every list size did run 10000 times with the average calculated to get out errors.

Disclaimer: I'm not a benchmark expert and i just did a simple implementation. The results are not 100% accurate and the conclusions i made can be wrong. If you can do it better, than feel free improve my benchmark oder make one your self and post the results in a new answer.

You can find the benchmark files here: https://github.com/Spenhouet/BenchmarkFindElement

  1. Using a parallelStream for this simple task has more overhead and is slower than just a normal stream.
  2. If the list is sorted and the wanted item is the last of the list, .findFirst() ist faster than .findAny().
  3. If the list is sorted and the wanted item is the last of the list, by using .findAny() the "for" method is faster.
  4. If the list is shuffled and the wanted item id is random than .findFirst() and .findAny() makes no difference.
  5. If the list is shuffled and the wanted item id is random than there is only a little difference between the "stream" and the "for" method. The "stream" method is a little bit better

Tested with Java Version: 1.8.0_111

For example the performance graph for finding a random id in the the shuffled lists with 505 to 575 items. The blue line is the "for" method and the red line the "stream" method.

Benchmark-Graph for subset of items

For example the performance graph for finding a random id in the the shuffled lists with 1 to 100000 items (only 10 times average). The blue line is the "for" method and the red line the "stream" method.

Benchmark-Graph for 1 to 100000 items

If you want to take a look at some graphs / generate them your selfe, then just download the repository from the github link above, import the project and run it with the VM options -Xms1024m -Xmx4068m.

With 1 to 40000 list elements and 100 runs each for average and the VM options: -Xms1024m -Xmx4068m -XX:+PrintCompilation -verbose:gc -Xbatch -XX:CICompilerCount=2

Benchmark-Graph for 1 to 40000 items and other VM options

Spenhouet
  • 6,556
  • 12
  • 51
  • 76
  • 1
    The Benchmark is flawed in several ways. The outcome *might* still be confirmed when doing a [proper microbenchmark](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java), I just want to point out that you cannot draw profound conclusions from the current results. – Marco13 Oct 29 '16 at 01:16
  • For sequential stream `findFirst` and `findAny` internal implementation is exactly the same, thus your conclusion #2 is definitely wrong. For parallel stream the result can highly depend on number of cores and target element position (in some combinations less cores would yield faster result). – Tagir Valeev Oct 29 '16 at 10:05
  • @Marco13 Thank you for pointing this out. I'm not a expert in benchmarking and don't have the time to do it more advanced. For me the results i got are sufficient but good to know for others that they are not completely accurate. – Spenhouet Oct 29 '16 at 12:56
  • @TagirValeev: findFirst and findAny did constantly give the same results in which findFirst was faster for a sorted list and for finding the last element. I did run both methods 1000 times and took the average. There has to be something to it. Regarding the parallel stream, sure, you are right. It depends on your system. But as described, the element positions are random as is the element to find and every method did run 1000 times... Your described error should be rooted out. – Spenhouet Oct 29 '16 at 13:02
  • "There has to be something to it." -- wrong benchmarking methodology, I guess. – Tagir Valeev Oct 29 '16 at 13:10
  • @TagirValeev Then i would like to motivate you to look into my benchmark (github link above) or write your own and share the results. That would be really constructive. – Spenhouet Oct 29 '16 at 13:13
  • @Spen, your question already has three close-votes (two remaining to close). Writing a good answer would take at least one hour (including time necessary to perform the experiments). There's a chance that when I will do it the question will be closed already. – Tagir Valeev Oct 29 '16 at 13:16
  • @TagirValeev Year, that is sad. Until now there was only unconstructive comments and a lot of hate for a (in my eyes) interesting question. – Spenhouet Oct 29 '16 at 13:21
  • The "hate" that you percieve is not "hate", but ... some sort of "annoyance", as the result of dozens of similar questions about the performance of a few specific lines of code. In many cases, performance differences either don't matter (because they are not the overall bottleneck), or they are negligible (in the range of a few percent), or they depend on many surrounding factors (VM, GC, PC, OS, application pattern...), or are rendered meaningless by the fact that an O(n) implementation is compared to something that can be done in O(1) - or all of the above. (I did not downvote, though) – Marco13 Oct 29 '16 at 14:28
  • Sure, in the application this performance doesn't matter. I'm a master student in computer science and i was just asking my self which implementation (stream vs for-loop) would theoretically perform better for the given task based on the internal implementation. I don't have the deep java knowledge to answer this question and i did hope that there is someone else that has but it seems there is no one in this community with such a knowledge. All the comments i did get were just unconstructive and unobjective. They didn't contribute to the question in anyway. That is what i call "hate". – Spenhouet Oct 29 '16 at 14:43
  • @TagirValeev wrote: For sequential stream findFirst and findAny internal implementation is exactly the same... Two methods could look equivalent in code, but JVM is a large blackgox, so it could treat both methods differently based on some hidden parameters that you could not check. And Spen intended to benchmark entire JVM as is, not its particular modules like GC or C1/C2 compiler. And if the JVM behaves differently then its complicated algorithms sees some difference between two seemingly identical implementations – ayvango Oct 29 '16 at 18:16
  • @ayvango, not in this case. – Tagir Valeev Oct 30 '16 at 01:57