0

Let's say I have two sets:

s1 = {1, 2, 3, 4, 5}
s2 = {4, 5, 6, 7, 8}

I want to split s1 into {4, 5} and {1, 2, 3}, where the first part is an intersection of s1 and s2, and the second part is the remainder.

I can do it like this:

part1 = s1.intersection(s2)
part2 = s1.difference(s2)

But it seems to me that this way I'll perform quite the same operation twice, which can take a long while on big sets. Can I do it with one operation in Python? I want to do something like

part1, part2 = slit_sets(s1, s2)
Yolo Tolo
  • 69
  • 7

2 Answers2

1

Why not use simple loop:

def slit_sets(s1, s2):
    inter = set()
    diff = set()
    for s in s1:
        if s in s2:
            inter.add(s)
        else:
            diff.add(s)
    return inter, diff
Lei Yang
  • 3,970
  • 6
  • 38
  • 59
  • I used `timeit` to measure that `[el for el in s2 if el in s1]` takes at least twice longer than `s1.intersection(s2)`, which is why I want to avoid loops – Yolo Tolo Jun 28 '21 at 09:29
  • that's interesting. i'm not familiar with internal implementaions. but searched and found the implementiation is similiar loop: [What's the algorithm of 'set.intersection()' in python?](https://stackoverflow.com/questions/20100003/whats-the-algorithm-of-set-intersection-in-python) – Lei Yang Jun 28 '21 at 09:36
  • perhaps it's because `intersection` is performed in `C` while the loop is just Python? – Yolo Tolo Jun 28 '21 at 09:42
  • i think so. the big O complexity is the same. so I think they made not that big difference. – Lei Yang Jun 28 '21 at 09:43
1

There is no single builtin operation which will return both the intersection and the difference, so you will always need to call two methods.

Intuitively I expected

i = s1.intersection(s2)
d = s1.difference(i)
return i,d

to be faster than

i = s1.intersection(s2)
d = s1.difference(s2)
return i,d

since it calculates the difference against a smaller set, but this is untrue - timeit results are roughly equivalent even for large sets with some thousands elements in them. A slight improvement, around 5%, is achieved instead with

d = s1.difference(s2)
i = s1.difference(d)
return i,d
gimix
  • 3,431
  • 2
  • 5
  • 21