The following operation on Python sets: x in s (where s is a set) has
average case O(1) and worst case O(n).
Can someone explain these complexities?
A Python set is implemented as a hash table.
A hash table (with a properly maintained load factor) has an average case of O(1): to check whether an element is in the set, you hash the element, access the table at the resulting index, and scan the bin that contains all elements with the same hash value. The expected number of elements in that bin is a constant that depends on the load factor (1).
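To make that mechanism concrete, here is a toy separate-chaining hash set. It is only a sketch: CPython's set actually uses open addressing, and resizing (which keeps the load factor bounded) is omitted here.

```python
# Toy separate-chaining hash set; illustrative only, not CPython's implementation.
class ChainedHashSet:
    def __init__(self, num_bins=8):
        self.bins = [[] for _ in range(num_bins)]

    def _bin(self, x):
        # Hash the element and map the hash to a bin index.
        return self.bins[hash(x) % len(self.bins)]

    def add(self, x):
        bucket = self._bin(x)
        if x not in bucket:        # linear scan, but only over one small bin
            bucket.append(x)

    def __contains__(self, x):
        # One hash plus a scan of the (expectedly tiny) bin: O(1) on average.
        return x in self._bin(x)


s = ChainedHashSet()
for i in range(5):
    s.add(i)
print(3 in s, 99 in s)   # True False
```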
In the worst case, however, all the elements have the same hash value. Checking whether an element is in the set then requires traversing the entire collection (which sits in a single bin), which is basically a linear scan and is O(n).
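One way to observe that worst case in practice (on CPython, whose sets use open addressing) is to force every element to hash to the same value; the BadHash class below is purely illustrative:

```python
import timeit

class BadHash:
    # All instances collide: a worst-case input for a hash table.
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                      # constant hash, so everything collides
    def __eq__(self, other):
        return self.value == other.value

good = set(range(1000))
bad = {BadHash(i) for i in range(1000)}
probe = BadHash(-1)                    # not present in either set

print(timeit.timeit(lambda: -1 in good, number=1000))    # fast: ~O(1) per lookup
print(timeit.timeit(lambda: probe in bad, number=1000))  # slow: ~O(n) per lookup
```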
Note: the above describes a separate-chaining hash table, but the idea is similar for open addressing.
(1) If the load factor is 1/2, for example, the probability that the desired bin is empty is 1/2, the probability that it contains exactly 1 element is 1/4, the probability that it contains exactly 2 elements is 1/8, and so on.
Summing k times these probabilities, the expected number of elements in the bin is 1*(1/4) + 2*(1/8) + 3*(1/16) + ... = 1 <= 2, i.e. a constant.
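A quick numeric check of that sum, purely illustrative:

```python
# Expected bin size when P(exactly k elements) = (1/2) ** (k + 1):
# sum over k of k * (1/2) ** (k + 1) converges to 1, well below the bound of 2.
expected = sum(k * 0.5 ** (k + 1) for k in range(1, 200))
print(expected)   # ~1.0
```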