9

First of all, my purpose is to randomly get only one element in both known sets. So my original method is firstly intersect two sets. And then randomly pick up a element from the intersected set. But this is foolish, because that I only need a elements but a intersected set.

So I need to find the algorithm of set.intersection().

I compare the cost time between the methods of 'set.intersection()' and 'for{for{}}'. Set.intersection() is more faster than other one(100 times). So using 'for{for{}}' to pick up a randomly elements is not a wise idea.

What's the algorithm behind set.intersection() in python?

Ding Ding
  • 295
  • 3
  • 7
  • 4
    The CPython one, the Jython, the IronPython one or the pypy one? :p... As long as a correct result is returned when `set.intersection` is called, then any implementation is free to do it how it feels. You're free to download/look at the source code for any of the implementations to see how they do it... – Jon Clements Nov 20 '13 at 15:31
  • 1
    what's your real use model? the real question is 'what's the fastest way to get a random element from an intersection of two sets?' and that probably depends on whether your data is originally a set or not. – Corley Brigman Nov 20 '13 at 16:03

1 Answers1

15

The algorithm is as follows: the smaller set is looped over and every element is copied depending whether it's found in the bigger set. So, it's the C equivalent of

def intersect(a, b):
    if len(a) > len(b):
        a, b = b, a

    c = set()
    for x in a:
        if x in b:
            c.add(x)
    return c

(Or: return set(x for x in a if x in b).)

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • 1
    It's somewhat different where `set.intersection` is provided with non-set iterables though (and where there's more than one iterable) – Jon Clements Nov 20 '13 at 15:54
  • @JonClements: in that case it only skips the swapping. The first argument is required to be a `set`. – Fred Foo Nov 20 '13 at 15:56
  • Interesting. Is there any way to ensure that x is coming from a particular set, or is it always the larger one? – mjacksonw Nov 21 '13 at 22:21
  • @M.JacksonWilkinson: it's always the smaller one, because when you iterate over the larger set, you're guaranteed to be processing some elements that are not in the smaller set. – Fred Foo Nov 21 '13 at 23:06