2

This question is essentially an addendum to this one which already asks about Java's contains() performance, but my use case targets collections with a small number of String elements.

To make the problem more precise, there will be:

  • Only java.lang.String elements in the collection (thus using String.equals(...) in the contains(...) calls).
  • String have average length of 16.
  • The average number of elements in a single collection is 7.
  • The collection will face frequent contains(...) checks, occasional additions, and no removals.
  • Ordering of Strings is not important at all.
  • This code would run on a web server, and there would be a large number of such collection objects (~1000 per user request).

Which collection would be a better fit for this use-case, both memory-wise and time-wise?

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Abhishek Divekar
  • 1,131
  • 2
  • 15
  • 31
  • 4
    This is better answered by results of actual testing and benchmarking... – ernest_k Apr 25 '18 at 06:21
  • Agreed, but I wanted to know if anyone has gone through this process before, and what their findings were. – Abhishek Divekar Apr 25 '18 at 06:22
  • 1
    For specific input constraints, you'll need to benchmark to be sure which one's faster. However, the post you linked to should already tells you that `contains` is O(1) vs O(n), which seems to answer the more general question. ArrayList generally uses marginally less memory than HashSet. – Bernhard Barker Apr 25 '18 at 06:24
  • It is evident that using HashSet is faster (from that post). Regarding memory, I guess there shouldn't be a big difference (considering 7000 objects per request). Using a profiler will give you the memory it takes – Thiyagu Apr 25 '18 at 06:24
  • @user7 Is it though? Remember, the use case calls for 1000 HashSets of size 7, not one of size 7000. In such a case, the cost of creating and managing the HashSet instances might outweigh the cost of `contains(...)` calls. – Abhishek Divekar Apr 25 '18 at 06:25
  • 1
    @Dukeling Big-O is only useful for considering large values of `n`, because it ignores constant overhead. Since `n` will only be 7, Big-O is useless for comparison, given that _O(1)_ may actually be slower than _O(n)_ for `n=7`. Actually, since `n` is known to be 7, _O(n)_ is actually _O(7)_, which means _O(1)_, so there is really no difference between _O(1)_ and _O(n)_ – Andreas Apr 25 '18 at 06:27
  • @Andreas exactly why I asked this question, thanks :) I want to point out though, it is 7 _on average_. Won't be exactly 7. – Abhishek Divekar Apr 25 '18 at 06:29
  • @Dukeling I wouldn't call 5.5x "marginally less memory" (https://stackoverflow.com/a/10799668/4900327) – Abhishek Divekar Apr 25 '18 at 06:29
  • The difference may be there. We can't tell without profiling/benchmarking – Thiyagu Apr 25 '18 at 06:31
  • @Dukeling I've mentioned this in both the question and these comments: I plan to create ~1000 collections per user request. A saving of 1 ms shaves off 1 entire second per user request. – Abhishek Divekar Apr 25 '18 at 06:32
  • 1
    If you're doing 1000 contains checks per request, that sounds like a design problem. – Bernhard Barker Apr 25 '18 at 06:35

1 Answers1

3

Which collection would be a better fit for this use-case, both memory-wise?

A String[]. Cannot be more compact than that.

Of course, that's not really a Collection, but I took that in the loosest meaning possible.

For best lookup (contains) performance, make it sorted and use binary search.

... and time-wise?

Probably a HashSet, but you need to test performance of such small collections, because the O(log n) performance of e.g. binary search may actually be faster than the O(1) performance of a hash table lookup, when n is only 7.

For that small a value of n, the performance difference may be negligible, and memory footprint may be more important.

Deciding between memory footprint and runtime performance, only you can decide which is "a better fit". We can't decide that for you.

Andreas
  • 154,647
  • 11
  • 152
  • 247
  • `String[]` would not be a possibility, unfortunately. As mentioned, I do not know the number of elements beforehand to instantiate the array. 7 is just an average, in rare cases there might be as many as 20. – Abhishek Divekar Apr 25 '18 at 07:02
  • @abhidivekar You said *"occasional additions"*, so you resize array whenever you need to add another value, then do binary search to find where to add the value. Since it's only *occasional*, performance will not suffer, but then again, it depends on how often *occasional* occurs, and **only testing** will tell you that. – Andreas Apr 25 '18 at 07:08
  • Hmm, yes that's true. And I will definitely test it before putting it into production, I'm just digging deeper into the problem at the moment :) – Abhishek Divekar Apr 25 '18 at 07:12