1

How can I intersect a list and a set to preserve the order of the list? Simple example:

k=[1,2,3,4]
d={3,2}
d.intersection(k)
[2,3]#this is the ideal result

edit: speed is the most important factor here

mikeLundquist
  • 769
  • 1
  • 12
  • 26

1 Answers1

7

You'll have to use a list comprehension, which lets you keep the order by filtering:

[i for i in k if i in d]

Demo:

>>> k = [1, 2, 3, 4]
>>> d = {2, 3}
>>> [i for i in k if i in d]
[2, 3]

For Python 3, this is the fastest option available; you could use list(filter(d.__contains__, l)) but this is slower on this trivially small example:

>>> from timeit import timeit
>>> def listcomp(k, d):
...     return [i for i in k if i in d]
...
>>> def filtered(k, d):
...     return list(filter(d.__contains__, k))
...
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d')
0.49590064199946937
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d')
0.6533352420010488

and timings worsen when the data sets grow in size:

>>> import random
>>> k = sorted([random.randrange(1000) for _ in range(1000)])
>>> d = {random.randrange(1000) for _ in range(100)}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d', number=10000)
0.30027976899873465
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d', number=10000)
0.4524774450001132

In Python 2, filter() is the faster option, given large enough inputs:

>>> from timeit import timeit
>>> import random
>>> def listcomp(k, d):
...     return [i for i in k if i in d]
...
>>> def filtered(k, d):
...     return filter(d.__contains__, k)
...
>>> k = [1, 2, 3, 4]
>>> d = {2, 3}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d')
0.4015800952911377
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d')
0.4407978057861328
>>> k = sorted([random.randrange(1000) for _ in range(1000)])
>>> d = {random.randrange(1000) for _ in range(100)}
>>> timeit('listcomp(k, d)', 'from __main__ import listcomp, k, d', number=10000)
0.4594550132751465
>>> timeit('filtered(k, d)', 'from __main__ import filtered, k, d', number=10000)
0.28088998794555664

I'm not aware of any faster options; the available ordered set implementations are all Pure-python solutions, and slower still.

Community
  • 1
  • 1
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343