The following operation on Python sets: x in s (where s is a set) has
average case O(1) and worst case O(n).
Can someone explain these complexities?
A Python set is implemented as a hash table.
A hash table (with a properly maintained load factor) has an average case of O(1): to check whether an element is in the set, you hash the element, access the table at the resulting index, and scan the bin that contains all elements with the same hash value. The expected number of elements in that bin is a constant that depends on the load factor (1).
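To make that mechanism concrete, here is a toy separate-chaining hash set. It is only a sketch: CPython's set actually uses open addressing, and resizing (which keeps the load factor bounded) is omitted here.

```python
# Toy separate-chaining hash set; illustrative only, not CPython's implementation.
class ChainedHashSet:
    def __init__(self, num_bins=8):
        self.bins = [[] for _ in range(num_bins)]

    def _bin(self, x):
        # Hash the element and map the hash to a bin index.
        return self.bins[hash(x) % len(self.bins)]

    def add(self, x):
        bucket = self._bin(x)
        if x not in bucket:        # linear scan, but only over one small bin
            bucket.append(x)

    def __contains__(self, x):
        # One hash plus a scan of the (expectedly tiny) bin: O(1) on average.
        return x in self._bin(x)


s = ChainedHashSet()
for i in range(5):
    s.add(i)
print(3 in s, 99 in s)   # True False
```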
In the worst case, however, all the elements have the same hash value. Checking whether an element is in the set then requires traversing the entire collection (which sits in a single bin), which is basically a linear scan and is O(n).
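One way to observe that worst case in practice (on CPython, whose sets use open addressing) is to force every element to hash to the same value; the BadHash class below is purely illustrative:

```python
import timeit

class BadHash:
    # All instances collide: a worst-case input for a hash table.
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                      # constant hash, so everything collides
    def __eq__(self, other):
        return self.value == other.value

good = set(range(1000))
bad = {BadHash(i) for i in range(1000)}
probe = BadHash(-1)                    # not present in either set

print(timeit.timeit(lambda: -1 in good, number=1000))    # fast: ~O(1) per lookup
print(timeit.timeit(lambda: probe in bad, number=1000))  # slow: ~O(n) per lookup
```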
Note: the above describes a separate-chaining hash table, but the idea is similar for open addressing.
(1) If the load factor is 1/2, for example, the probability that the desired bin is empty is 1/2, the probability that it contains exactly 1 element is 1/4, the probability that it contains exactly 2 elements is 1/8, and so on.
Summing k times these probabilities, the expected number of elements in the bin is 1*(1/4) + 2*(1/8) + 3*(1/16) + ... = 1 <= 2, i.e. a constant.
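A quick numeric check of that sum, purely illustrative:

```python
# Expected bin size when P(exactly k elements) = (1/2) ** (k + 1):
# sum over k of k * (1/2) ** (k + 1) converges to 1, well below the bound of 2.
expected = sum(k * 0.5 ** (k + 1) for k in range(1, 200))
print(expected)   # ~1.0
```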