217

Assume that S and T are assigned sets. Without using the join operator |, how can I find the union of the two sets? This, for example, finds the intersection:

S = {1, 2, 3, 4}
T = {3, 4, 5, 6}
S_intersect_T = { i for i in S if i in T }

So how can I find the union of two sets in one line without using |?

arshajii
fandyst

9 Answers

372

You can use the union method for sets: set.union(other_set)

Note that it returns a new set, i.e. it doesn't modify the set it is called on.
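
A minimal sketch of this, reusing S and T from the question (the name S_union_T is only illustrative):

```python
S = {1, 2, 3, 4}
T = {3, 4, 5, 6}

# union() builds and returns a new set; neither operand is modified.
S_union_T = S.union(T)

print(S_union_T == {1, 2, 3, 4, 5, 6})  # True
print(S == {1, 2, 3, 4})                # True: S is unchanged
```

union also accepts any iterable as its argument, so S.union([5, 6]) works without converting the list to a set first.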

Rishabh Agrahari
ovrwngtvity
  • 72
    However, `|` can modify the variable inline: `set_a |= set_b` – jorgenkg Feb 17 '16 at 19:13
  • 14
    @jorgenkg same as: `set_a = set_a.union(set_b)`. If you mean "in-place", neither will do that, both create a new `set` – nitely Nov 10 '16 at 02:22
  • 1
    @nitely no, using `|=` will modify the variable `a` inline – jorgenkg Nov 10 '16 at 18:45
  • 3
    @jorgenkg it still creates a new set and replaces the reference. – Alvaro Jan 25 '17 at 21:12
  • 3
    @Alvaro @nitely according to a simple test: `a = set((1, 2, 3,)); b = set((1, 3, 4,)); id_a = id(a); a |= b; assert id_a == id(a)`, @jorgenkg is right - variable `a` is modified inline. Am I missing something? – johndodo Jan 07 '18 at 10:06
  • @johndodo what you are missing is that when the single line is executed, in general two things happen: 1. the set represented by `a` is destroyed, and 2. the variable `a` then refers to the newly created set in its place. in THIS case the old and new set have the same id because they are identical and the set is very simple, so python efficiently kept the old alive internally for use later. – Rick Jan 18 '18 at 15:08
  • 4
    Nope, doesn't look like it: `a = set((1, 2, 3,)); b = set((1, 3, 4,)); c = a; a |= b; assert id(c) == id(a)`. Even if `a` was destroyed, `c` wouldn't have been. Also, `c` is now `set([1, 2, 3, 4])`, so @jorgenkg's comment is correct. – johndodo Jan 19 '18 at 10:07
  • Note that `set.union` can be called statically: `new_set = set.union(first_set, second_set)`, for example, or `set.union(*collection_of_sets)`. – kungphu Jul 05 '18 at 07:06
  • `set_both = set_a.union(set_b)` would be a clearer example, and would suggest that the [union method](https://docs.python.org/3/library/stdtypes.html#set) creates a new set but does not mutate the instance in place. The example given could never work, because `set` is a [built-in](https://docs.python.org/3/library/functions.html). – Bob Stein Jan 21 '21 at 17:23
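
The identity check from the comments above can be run as a short script. This sketch (all variable names are illustrative) compares `a |= b` with `a = a.union(b)`, keeping a second name bound to the original set to see whether that object was mutated or replaced:

```python
a = {1, 2, 3}
b = {1, 3, 4}
alias = a             # second reference to the same set object

a |= b                # augmented assignment on a set updates it in place
print(a is alias)     # True: still the same object
print(sorted(alias))  # [1, 2, 3, 4]

c = {1, 2, 3}
alias_c = c
c = c.union(b)        # builds a new set and rebinds the name c
print(c is alias_c)   # False: c now names a different object
print(sorted(alias_c))  # [1, 2, 3]: the original set is untouched
```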
61

You could use or_, the function form of the | operator from the operator module:

>>> from operator import or_
>>> from functools import reduce  # import required on Python 3 (reduce is a builtin on Python 2)
>>> reduce(or_, [{1, 2, 3, 4}, {3, 4, 5, 6}])
{1, 2, 3, 4, 5, 6}
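
One caveat with reduce: called on an empty sequence without an initial value, it raises TypeError. A small sketch, assuming you may need to handle an empty list of sets (union_all is a hypothetical helper name, not part of any library), passes set() as the initial value:

```python
from functools import reduce
from operator import or_


def union_all(sets):
    """Union of any number of sets; returns an empty set for no input."""
    # The third argument is the starting value, so an empty input is safe.
    return reduce(or_, sets, set())


print(union_all([{1, 2, 3, 4}, {3, 4, 5, 6}]) == {1, 2, 3, 4, 5, 6})  # True
print(union_all([]) == set())                                          # True
```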
Tarek Kalaji
Alexander Klimenko
56

If you are fine with modifying the original set (which you may want to do in some cases), you can use set.update():

S.update(T)

The return value is None, but S will be updated to be the union of the original S and T.
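
A short sketch of that behaviour, reusing S and T from the question (result is an illustrative name). A common slip is writing S = S.update(T), which rebinds S to None:

```python
S = {1, 2, 3, 4}
T = {3, 4, 5, 6}

result = S.update(T)  # mutates S in place

print(result)                    # None
print(S == {1, 2, 3, 4, 5, 6})   # True
print(T == {3, 4, 5, 6})         # True: T is unchanged
```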

Max Candocia
30

Assuming you also can't use s.union(t), which is equivalent to s | t, you could try

>>> from itertools import chain
>>> set(chain(s,t))
{1, 2, 3, 4, 5, 6}

Or, if you want a comprehension,

>>> {i for j in (s,t) for i in j}
{1, 2, 3, 4, 5, 6}
arshajii
23

You can just unpack both sets into one like this:

>>> set_1 = {1, 2, 3, 4}
>>> set_2 = {3, 4, 5, 6}
>>> union = {*set_1, *set_2}
>>> union
{1, 2, 3, 4, 5, 6}

The * unpacks the set. Unpacking expands an iterable (e.g. a set or list) into the individual items it yields. The example above is therefore equivalent to {1, 2, 3, 4, 3, 4, 5, 6}, which collapses to {1, 2, 3, 4, 5, 6} because a set can only contain unique items. This syntax requires Python 3.5 or later.

Asclepius
Jamie Saunders
  • 2
    What does the `*` do in line 3? – altabq Apr 14 '20 at 14:32
  • @altabq He answers what starred expressions are in the answer. Also try playing with it in the REPL. See what `print(set_1)` vs. `print(*set_1)` looks like. Also this may give you more info: https://stackoverflow.com/questions/12555627/python-3-starred-expression-to-unpack-a-list – Aaron Bell Jan 01 '21 at 02:52
  • Thanks @aaron-bell, the answer was edited after I posted my comment to include the explanation. – altabq Jan 12 '21 at 16:34
18

If by join you mean union, try this:

set(list(s) + list(t))

It's a bit of a hack, but I can't think of a better one-liner to do it.

Alois Mahdal
BenjaminCohen
14

Suppose you have two lists:

 A = [1,2,3,4]
 B = [3,4,5,6]

You can find the union of A and B as follows:

 union = set(A).union(set(B))

If you also want the intersection and the non-intersection (the elements that appear in only one of the two lists), you can compute them as follows:

 intersection = set(A).intersection(set(B))
 non_intersection = union - intersection
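
The same results can be obtained with fewer conversions, since the set methods accept any iterable as their argument. A sketch reusing A and B from above; it swaps in symmetric_difference, the built-in set method for "elements in exactly one of the two", in place of the subtraction:

```python
A = [1, 2, 3, 4]
B = [3, 4, 5, 6]

# Only the receiver needs to be a set; the argument can be any iterable.
union = set(A).union(B)
intersection = set(A).intersection(B)

# Elements that appear in exactly one of the two lists.
non_intersection = set(A).symmetric_difference(B)

print(sorted(union))             # [1, 2, 3, 4, 5, 6]
print(sorted(intersection))      # [3, 4]
print(sorted(non_intersection))  # [1, 2, 5, 6]
```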
iyogeshjoshi
6

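You can use union, or a simple list comprehension that adds each element of B to A (A must be a set):

[A.add(_) for _ in B]

Afterwards, A contains all the elements of B in addition to its own. Note that this modifies A in place; the list built by the comprehension holds only None values and is discarded.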

Vaibhav Mishra
4

If you want to join n sets, the best performance seems to be from set().union(*list_of_sets), which will return a new set.

Thus, the usage might be:

s1 = {1, 2, 3}
s2 = {2, 3, 4}
s3 = {4, 5, 6}

s1.union(s2, s3) # returns a new set
# Out: {1, 2, 3, 4, 5, 6}
s1.update(s2, s3) # updates s1 in place and returns None

Adding to Alexander Klimenko's answer above, I did some simple testing, shown below. I believe the main takeaway is that the more random the sets are, the bigger the difference in performance.

from functools import reduce
from operator import or_
from random import randint

n = 100

generate_equal = lambda: set(range(10_000))
generate_random = lambda: {randint(0, 100_000) for _ in range(10_000)}

# %timeit is an IPython/Jupyter magic, so run this in IPython or a notebook
for l in [
    [generate_equal() for _ in range(n)],
    [generate_random() for _ in range(n)]
]:
    %timeit set().union(*l)
    %timeit reduce(or_, l)
Out:
  # equal sets: 69.5 / 23.6 =~ 3
  23.6 ms ± 658 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  69.5 ms ± 2.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  # random sets: 438 / 78.7 =~ 5.6
  78.7 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  438 ms ± 20.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Therefore, if you want to update in place, the best performance comes from the set.update method since, performance-wise, s1.update(s2, s3) is equivalent to set().union(s2, s3).
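
The %timeit lines above only run inside IPython or Jupyter. A rough equivalent using the standard timeit module is sketched below; the set sizes and repeat counts are scaled-down assumptions chosen so the script finishes quickly, and absolute timings will differ from machine to machine:

```python
import timeit
from functools import reduce
from operator import or_
from random import randint

n = 20  # number of sets (smaller than in the test above, to keep this fast)

equal_sets = [set(range(1_000)) for _ in range(n)]
random_sets = [{randint(0, 10_000) for _ in range(1_000)} for _ in range(n)]

for label, sets in [("equal", equal_sets), ("random", random_sets)]:
    # Both approaches must produce the same union before timing them.
    assert set().union(*sets) == reduce(or_, sets)

    t_union = timeit.timeit(lambda: set().union(*sets), number=50)
    t_reduce = timeit.timeit(lambda: reduce(or_, sets), number=50)
    print(f"{label}: union={t_union:.4f}s reduce={t_reduce:.4f}s")
```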

Felipe Whitaker