73

What's the data structure in Java that has the fastest operation for contains() ?

e.g. i have a set of numbers { 1, 7, 12, 14, 20... }

Given another arbitrary number x, what's the fastest way (on average) to generate the boolean value of whether x is contained in the set or not? The probability for !contains() is about 5x higher.

Do all the map structures provide o(1) operation? Is HashSet the fastest way to go?

pnuts
  • 58,317
  • 11
  • 87
  • 139
codechobo
  • 829
  • 1
  • 7
  • 12

4 Answers4

136

look at set (Hashset, enumset) and hash (HashMap,linkedhash...,idnetityhash..) based implementations. they have O(1) for contains()

This cheatsheet is of great help.

Aravind Yarram
  • 78,777
  • 46
  • 231
  • 327
  • 8
    For what it's worth, hash maps in general aren't O(1) in lookup when hash collisions occur (and they can happen pretty often, if very few at a time). Worst case is O(n) in lookup. – Blindy Jul 17 '10 at 07:27
  • I agree with Blindy. Performance of hash based collection is limited by performance of hash function. – sbidwai Jul 17 '10 at 07:42
  • When I went recently, the site was down. If this happens to you, you can use this [link](http://web.archive.org/web/20120105103844/http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf) – EasilyBaffled Apr 28 '13 at 18:17
  • Page cannot be crawled or displayed due to robots.txt (correct the link plz) – iluvatar_GR Jul 14 '13 at 14:33
9

For your particular case of numbers with a relatively high density I'd use a BitSet, it will be faster and much smaller than a hash set.

starblue
  • 55,348
  • 14
  • 97
  • 151
5

The only data structure faster than HashSet is likely to be TIntHashSet from Trove4J. This uses primitives avoiding the need to use Integer Objects.

If the number of integers is small, you can create a boolean[] where each value present is turned into a "true". This will be O(1). Note: HashSet is not guarenteed to be O(1) and is more likely to be O(log(log(N))).

You will only get O(1) for an ideal hash distribution. However, it is more likely you will get collisions of hashed buckets and some lookups will need to check more than one value.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
-2

hashing(hash set) is the best one with O(1)

Satish
  • 203
  • 5
  • 9