What is the Searching difference between various small HashSet and 1 large HashSet?

Question

I did some POC and found that when I search in a large Set of 400 items, it is 6-7 times faster than searching in 20 sets of 20 items each. Although in both cases, hashing is used but how does just looping costs so much ?

Presumably if you're looking in 20 sets, you're doing up to 20 lookups instead of one. Hashed lookups are fast because the hash tells you where to find the object. Looped lookups are slow because you are looking in every place until you find the thing. — khelwood, Dec 14 '16 at 11:31

score 0 · Answer 1 · edited May 23 '17 at 11:53

Would you expect it to take the same time or 20 times longer? With 20 sets, you need 10.5 lookups on the average (assuming the item is present in exactly one of them), so a factor of 10.5 should result. This is reasonable close to your reported factor of 6-7. As you gave us no code, we can't point to where your benchmark fails. But without reading something about how to benchmark, nobody gets it right.

If you want to know more, provide us with more details.

PS: You should hardly ever use 20 sets the way you're probably using then. A Map<Item, Integer> is much better as a representation of a set partitioning and is as fast as a Set<Item> (actually, a Set is implemented via a Map).

What is the Searching difference between various small HashSet and 1 large HashSet?

1 Answers1