-3

We are using a software to take care of code quality and from which I saw today a finding saying that calling contains() on an arraylist is inefficient. And the suggested way (supposably better) is to use HashSets.

So regarding to that software this:

boolean doesContain = (new HashSet<>(arrayList)).contains("something");

is more efficient that this:

boolean doesContain = arrayList.contains("something");

Can this actually be true, and if yes why?

Sir. Hedgehog
  • 1,260
  • 3
  • 17
  • 40
  • ehm no - copying the entire list before checking is less efficient (unless you do a lot of contains-checks afterwards). – Hulk Nov 27 '18 at 10:17
  • When you actually look up the value in the set, it's fast; but the extra work of creating a new set from your existing list will take longer than the work of finding one item directly in the list. – khelwood Nov 27 '18 at 10:18
  • I think what it meant to say is to use `HashSets` instead of `ArrayLists` if the code is doing constant data lookup. `HashSets` have O(1) lookup efficiency while `ArrayLists` is O(n). But creating `HashSets` is time-consuming. – LonelyCpp Nov 27 '18 at 10:19
  • @all regarding to that software the first is accepted, instead of the second one. it is not mentioning that the arraylists should be replaced with hashsets – Sir. Hedgehog Nov 27 '18 at 10:19
  • so after i change the finding from as it was on the second example to the first, the finding is deleted. – Sir. Hedgehog Nov 27 '18 at 10:20
  • Static analyzers have limits. Always use the hints they give you with a grain of salt. Lookup in Lists is typically O(N), lookup in Sets is typically O(1), but creating a Set from a List is O(N). – Hulk Nov 27 '18 at 10:20
  • @Hulk by static analyzer what do you refer to? – Sir. Hedgehog Nov 27 '18 at 10:21
  • @Sir.Hedgehog the tool that you are using is likely something that performs static sourcecode analysis - like SonarQube, FindBugs etc. – Hulk Nov 27 '18 at 10:23
  • *"Can this actually be true, and if yes why?"* - 1) Yes, 2) Because [hash tables](https://en.wikipedia.org/wiki/Hash_table). – Stephen C Nov 27 '18 at 10:27
  • @StephenC so you disagree with oleg.cheredniks answer below? cause he is making a valid point there – Sir. Hedgehog Nov 27 '18 at 10:30
  • If you really want the answer, you will need to make your own time measurements on data that are realistic for your particular program (not that I easily can imagine a situation where I’d want to bother). – Ole V.V. Nov 27 '18 at 10:50
  • 1
    Related: https://stackoverflow.com/questions/10799417/performance-and-memory-allocation-comparison-between-list-and-set – Hulk Nov 27 '18 at 10:51

2 Answers2

2

new HashSet<>(arrayList) takes O(n) time and O(n) space to build a HashSet.

hashSet.contains("something") takes O(1) time to find an element.

arrayList.contains("something") takes O(n) time to find an elements.

As result:

(new HashSet<>(arrayList)).contains("something") takes O(n) + O(1) = O(n) time + O(n) space complexity

arrayList.contains("something") takes O(n) time + 0 space complexity

It means that according to Big O notation, both expressions have O(n) time complexity, but arrayList.contains("something") does not take additional space, instead of newly created HashSet.

P.S.

I am not making any prediction on other cases, because I do not know other related aspects of your application. I am analyzing given piece of code only.

Oleg Cherednik
  • 17,377
  • 4
  • 21
  • 35
  • so in the case we have to use an arrayList, you conclude to that it is not time/performance efficient to add the arrayList to a hashSet and then use contains? Did i get that right? – Sir. Hedgehog Nov 27 '18 at 10:28
  • No. It is better to continue using `ArrayList`. – Oleg Cherednik Nov 27 '18 at 10:30
  • 2
    @Sir.Hedgehog If you are performing a single lookup, keep the `ArrayList`. If you perform LOTS of lookups AND don't want/need duplicate entries, use `Set` (but don't create a new one for each lookup). Otherwise, profile to see which is faster. – Hulk Nov 27 '18 at 10:36
0

Potentially. To check whether an element is in a list we must check each element sequentially.

Is element 0 the element we're looking for? Yes/no
Is element 1 the element we're looking for? Yes/no
...
etc.

You can stop iterating once you've found it, obviously.

Complexity:

  • Best case: it's the first element. This is a constant time lookup. O(1)
  • Average case: it's somewhere in the middle. O(n/2)
  • Worst case: it's the last element. We must do the comparison for every element. O(n)

A HashSet is always a constant time lookup. You just compute the hash code and call Object::equals.

As to whether you should be using a set, it really depends on what else you're using it for. If you're only doing contains, a set is the most appropriate. Otherwise, please see Which Java Collection should I use?

Michael
  • 41,989
  • 11
  • 82
  • 128
  • in general okusing hashsets is more efficient, but is it actually worth it (considering performance) to use the "suggested" way? – Sir. Hedgehog Nov 27 '18 at 10:26
  • @Sir.Hedgehog Too many unknowns. If you're trying to squeeze every last drop of performance, it's a good candidate for performance improvements. In practice, `contains` on a List containing a million elements will still probably return in a fraction of a second (given a reasonable `equals` implementation). – Michael Nov 27 '18 at 10:30
  • It depends on your use-case. Mr Hedgehog - you need to do some reading / study so that you understand these issues for yourself, and are in a position to make your own (informed) decisions. Because the decisions depend on the details of your application. – Stephen C Nov 27 '18 at 10:30
  • so it would be worth it, (regarding performance) if only used for a list with loads of elements to parse through? – Sir. Hedgehog Nov 27 '18 at 10:31
  • Please read what we just said. (I don't understand your description ... and our advice depends *critically* on a correct and detailed understanding of the use-case.) – Stephen C Nov 27 '18 at 10:33
  • In general? There is no general answer! How many different ways do I have to say that?? (And I can't share my knowledge with someone who insists on not accepting what I say.) – Stephen C Nov 27 '18 at 10:34
  • 1
    @Sir.Hedgehog In terms of practical benefits, yes, but you should always think critically about the data structure you're using and whether it's the most appropriate one for your goal. Don't just always default to a list because it's "easier". – Michael Nov 27 '18 at 10:35
  • And once again, the answer is that it depends! I don't see how I can be clearer about that. It depends on whether the time / space overheads of creating the HashSet oughtweigh the cost of searching the list. And **that** depends on your application! – Stephen C Nov 27 '18 at 10:36
  • 2
    @Sir.Hedgehog Adding the contents of an ArrayList into a HashSet just for the purposes of calling `contains` once will not grant you any performance improvement. Copying the collection requires copying each element individually, which (in terms of complexity) is just as bad as the worst-case of `ArrayList::contains`. To see performance improvements, you must maintain the collection as a HashSet and ditch the ArrayList altogether. – Michael Nov 27 '18 at 10:37
  • 1
    (In fact, creating the hashset is significantly more expensive than a linear search of a list with the same elements. It is not just copying. The elements have to be hashed ... and the HashSet data structures are rather complicated / bulky / expensive to build. For a number of reasons. But both are O(N) operations. On average ....) – Stephen C Nov 27 '18 at 10:40
  • @StephenC thats helpful, thank u, the rest wasnt needed. – Sir. Hedgehog Nov 27 '18 at 10:45
  • Well ... actually ... I think it was. Because you **need** to understand and accept that there are no magic answers here. – Stephen C Nov 27 '18 at 10:51
  • 1) I didn't downvote. 2) You are missing the point. 3) If you think you got your answer ... good luck to you. – Stephen C Nov 27 '18 at 11:48