0

I need to intersect a set a = {1,5,7} with each set in a list b = [{1,2,3}, {2,3,4},...] obtaining a new list c = [{1}, {}, ...].

The standard solution to this problem is a plain comprehension c = [a.intersect(b_i) for b_i in b], as per "intersect a list with a nested list" and the various similar posts. However, this implies a for loop which becomes unwieldy when seeking to do 20k intersections on large sets in real time.

Is there any alternative approach to solve this problem, that might be speedier?

(as an example I was looking at the various "intersect multiple set" posts for inspiration, however that approach solves the problem of a single AND operation on n sets, not on n AND operations on n set pairs)

Pythonic
  • 2,091
  • 3
  • 21
  • 34
  • 1
    You got a O(n) solution, without other contstraints that pertain to the sets in the list I doubt you can get faster then that. If you got lots of similarity you might gain something if `c` is a set as well - depends on what you need it for – Patrick Artner Jul 28 '18 at 08:26
  • I was thinking more about how to avoid a python "for" loop, if this was a C++ loop it'd be prob quite ok. I even toyed with mapping the sets in bool numpy matrixes and then doing element-wise multiplications, but the matrixes become large and sparse so that was a no go – Pythonic Jul 28 '18 at 08:28
  • 3
    Reading all the data takes `len(a) + sum(len(x) for x in b)`. Computing intersection between `a` and each set in `b` takes `sum(min(len(a),len(x)) for x in b)` on average (since Python set are hash set). So **it's impossible to do faster**. [Reference](https://stackoverflow.com/questions/8102478/intersection-complexity). – user202729 Jul 28 '18 at 08:37
  • @user202729 *unless* you can skip parsing it from a list of sets into a hypothetical faster datastructure, and just build the data directly into whatever that is from the first time you source it, and use that throughout the code. But short of that unlikely event, just doing the list comprehension is almost certainly a win for code clarity as well. – lvc Jul 28 '18 at 08:51
  • Yes that was my broad idea - I tried to represent the set list in np, and I tried to represent it as a single set, but the problem is still escaping me.. – Pythonic Jul 28 '18 at 08:53
  • If numpy is an option you could try this: https://stackoverflow.com/a/8317403/7919597 – Joe Jul 28 '18 at 08:55
  • Thanks will compare the comprehension with np and see what I get (small probl is that the sets are not all the same size) – Pythonic Jul 28 '18 at 08:58
  • Did you ever get a solution to your problem? – Benedictanjw Jul 21 '22 at 03:17

0 Answers0