2

I read the statement:

HashSet offers constant time performance for the basic operations (add, remove, contains and size).

Is 'contains' true here? While shortlisting the bucket is a contant-time performance - isnt finding the elements within the bucket a o(n) operation?

Do I misunderstand something?

IUnknown
  • 9,301
  • 15
  • 50
  • 76
  • You need to take a look at the main hashing strategies to cope with collisions: list-chaining and open addressing. Understand how they work and you'll understand why `contains` works in average constant time. See, for example: http://en.wikipedia.org/wiki/Hash_table#Separate_chaining . If you want something formal, you can always read Knuth's "The Art of Computer Programming, vol. 3 'Sorting and Searching'". – naitoon Mar 29 '13 at 05:45
  • To answer all these and similar questions I have god tutorial on [Internal life of HashSet in java](http://volodial.blogspot.com/2013/07/internal-life-of-hashset-in-java.html) – Volodymyr Levytskyi Jul 27 '13 at 08:43

6 Answers6

2

As the documentation says,

This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets

Check the assumption part above. Contains is o(n) if all the elements end up in one bucket which will be result of world's one of the poorest hash functions. HashSet internally uses HashMap.

Adisesha
  • 5,200
  • 1
  • 32
  • 43
2

n in o(n) stands for the number of elements in the hash, not in the bucket. and since the number of elements inside the bucket doesn't grow linearly with the size of the set and is limited, there is a constant maximum time it may take. and constant times don't affect the notation. at least if you have a perfect hashing function, which is another problem altogether.

fdreger
  • 12,264
  • 1
  • 36
  • 42
0

HashSet uses a hasing function to locate elements. Evaluating in it is O(1).

For example, if we have a hash table for storing names of employees, we would use the function as the sum of ASCII chars in it:

f(name) = sum(ascii chars)

and f('ahmed') = 'a' + 'h' + 'm' + 'e' + 'd' = 97 + 104 + 109 + 101 + 100 = 511. so, 'ahmed' will be stored at location 511. However, the previous hash function is very bad and will cause a lot of conflicts when avaluated with names that produce the same sum. Hence, finding a good hash function is very critical goal in hash table implementation.

Refer to the Hash Table data structure for more information.

0

Yes it is finding the elements within the bucket a o(n) operation.
contains Operation performance is O(n). It equaling the target object with all other objects inside HashSet.

Subhrajyoti Majumder
  • 40,646
  • 13
  • 77
  • 103
0

As the name implies HashSet is implemented through hasing technique. So this means the complexity of finding an object is O(1). This is why contains takes O(1) time. But in worst case, if all objects are put in same bucket then complexity will be O(n).

Returns true if this set contains the specified element. More formally, returns true if and only if this set contains an element e such that (o==null ? e==null : o.equals(e)).

public boolean contains(Object o) {
    return map.containsKey(o);
}

Above is the contains method from HashSet class. Note that its using map to store objects.

rai.skumar
  • 10,309
  • 6
  • 39
  • 55
0

isnt finding the elements within the bucket a o(n) operation?

That really depends. If there are so many hash collisions that an algorithm must go through each value and retrieve the one you are interested in, the time is o(n) in the worst case. But then it would not be a good hash function at all. A good hash function spreads the assigned hash equally throughout its range.

This could also happen if you return the same hashCode all the time, which is not a good sign either. That would give you the same hash irrespective of the cardinality of the set.

Deepak Bala
  • 11,095
  • 2
  • 38
  • 49