Here are some timing experiments:
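import numpy as np
import itertools

# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
for r in [10, 100, 1000, 10000]:
    # r is the upper bound of the random values, so a larger r means more distinct elements
    A = list(np.random.randint(r, size=1000000))
    B = list(np.random.randint(r, size=1000000))
    %timeit set(A).update(B)
    %timeit set(A+B)
    %timeit set(itertools.chain(A, B))
    print('---')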
Here are the results for size = 1000:
10000 loops, best of 3: 87.2 µs per loop
10000 loops, best of 3: 87.3 µs per loop
10000 loops, best of 3: 90.7 µs per loop
---
10000 loops, best of 3: 88.2 µs per loop
10000 loops, best of 3: 86.8 µs per loop
10000 loops, best of 3: 89.4 µs per loop
---
10000 loops, best of 3: 80.9 µs per loop
10000 loops, best of 3: 84.5 µs per loop
10000 loops, best of 3: 87 µs per loop
---
10000 loops, best of 3: 97.4 µs per loop
10000 loops, best of 3: 102 µs per loop
10000 loops, best of 3: 107 µs per loop
Here are the results for size = 1000000:
10 loops, best of 3: 89 ms per loop
10 loops, best of 3: 106 ms per loop
10 loops, best of 3: 98.4 ms per loop
---
10 loops, best of 3: 89.1 ms per loop
10 loops, best of 3: 110 ms per loop
10 loops, best of 3: 94.2 ms per loop
---
10 loops, best of 3: 94.9 ms per loop
10 loops, best of 3: 109 ms per loop
10 loops, best of 3: 105 ms per loop
---
10 loops, best of 3: 115 ms per loop
10 loops, best of 3: 143 ms per loop
10 loops, best of 3: 138 ms per loop
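So update() seems to be slightly faster than both other methods in most runs. The gap is clearest at size = 1000000 (for example, 89 ms versus 106 ms and 98.4 ms for r = 10); at size = 1000 all three are within about 10 µs of each other. However, I don't think that the time difference is significant.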
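If you want to rerun the comparison outside IPython, here is a minimal sketch using the standard timeit module. The function names (union_update, union_concat, union_chain, time_union_methods) are my own, not from the original experiment, and it uses plain Python ints from the random module instead of NumPy integers, so the absolute numbers will probably differ from the ones above.

```python
import itertools
import random
import timeit


def union_update(a, b):
    """Build a set from a, then add the elements of b in place."""
    s = set(a)
    s.update(b)
    return s


def union_concat(a, b):
    """Concatenate the lists first (creates a temporary list), then build a set."""
    return set(a + b)


def union_chain(a, b):
    """Iterate over both lists lazily, without a temporary list."""
    return set(itertools.chain(a, b))


# Hypothetical labels for the three approaches being compared.
METHODS = {
    "update": union_update,
    "concat": union_concat,
    "chain": union_chain,
}


def time_union_methods(size=1000, r=100, repeat=3, number=100, seed=0):
    """Return the best per-call time in seconds for each method."""
    rng = random.Random(seed)
    a = [rng.randrange(r) for _ in range(size)]
    b = [rng.randrange(r) for _ in range(size)]
    results = {}
    for name, fn in METHODS.items():
        # timeit.repeat returns total seconds for `number` calls, once per repeat.
        totals = timeit.repeat(lambda: fn(a, b), repeat=repeat, number=number)
        results[name] = min(totals) / number
    return results


if __name__ == "__main__":
    for r in [10, 100, 1000, 10000]:
        for name, seconds in time_union_methods(size=1000, r=r).items():
            print(f"r={r:<6} {name:<7} {seconds * 1e6:.1f} µs per call")
        print("---")
```

Taking the minimum over the repeats mirrors the "best of 3" that %timeit reports above. Increase size to 1000000 (and lower number) to reproduce the larger case.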