I want to know the complexity of the set.intersection
of python. I looked in the documentations and the online wikis for python, but I did not find the time complexity of this method for multiple sets.

- 1,343
- 7
- 20
- 49
-
1https://wiki.python.org/moin/TimeComplexity – vaultah Jun 15 '15 at 12:42
-
That provides the intersection complexity for two sets only! – M.M Jun 15 '15 at 12:44
-
4In CPython [`set_intersection_multi()`](https://github.com/python/cpython/blob/master/Objects/setobject.c#L1348) is implemented using pairwise intersection, i.e. by intersecting the first two sets and then intersecting the result with the next set and so on. Worst time complexity is thus `number of sets x pairwise complexity` – dhke Jun 15 '15 at 12:50
1 Answers
The python wiki on time complexity lists a single intersection as O(min(len(s), len(t)) where s and t are the sizes of the sets and t is a set. (In English: the time is bounded by and linear in the size of the smaller set.)
Note: based on the comments below, this wiki entry had been be wrong if the argument passed is not a set. I've corrected the wiki entry.
If you have n sets (sets, not iterables), you'll do n-1 intersections and the time can be (n-1)O(len(s)) where s is the set with the smallest size.
Note that as you do an intersection the result may get smaller, so although O is the worst case, in practice, the time will be better than this.
However, looking at the specific code this idea of taking the min() only applies to a single pair of sets and doesn't extend to multiple sets. So in this case, we have to be pessimistic and take s as the set with the largest size.

- 18,769
- 10
- 104
- 133

- 7,226
- 3
- 33
- 74
-
2You mentioned the average case. The worst case is O(len(s) * len(t)). In what situations does the worst case arise? Does the change your answer at all? – Steven Rumbalski Jun 15 '15 at 12:59
-
@Steven Rumbalski All of the sets are exactly the same. And no it doesn't change the answer. – rocky Jun 15 '15 at 12:59
-
I suppose if one was really concerned about forcing memory usage to a min eg: You have a one element set at the end and 999 10million sets prior to that, you could do: `min(list_of_sets, key=len).intersection(*list_of_sets)` which will also minimise the number of comparisons - and possible fewer creations of new `set`s... but I doubt in most cases, it wouldn't really be worth it... – Jon Clements Jun 15 '15 at 13:15
-
@JonClements you might imagine that [set_intersection_multi()](https://github.com/python/cpython/blob/master/Objects/setobject.c#L1348) would do this for you, but it doesn't. I suppose one could change the code to make it do that though. Then you won't have to go through the machination you suggest. – rocky Jun 15 '15 at 13:28
-
@rocky the issue is that `intersection` takes any iterables - so they may not be sets or even sequences... and by the time it realises that, it's possibly too late to backtrack and try the original approach... So it'd be wrong of it to do so... – Jon Clements Jun 15 '15 at 13:31
-
@JonClements Hmmm... that's also true in the case of a single pair intersection. Looking at the code cited though, I think what it does is check first to see if the 2nd object is a set, and if so then do the min. So two things. The wiki entry is slightly incorrect and I'll look at fixing. Second, *set_intersection_multi()* can do the same thing and take the min of all *set* arguements. – rocky Jun 15 '15 at 13:42