21

I'm doing a set difference operation in Python:

x = [1, 5, 3, 4]
y = [3]

result = list(set(x) - set(y))
print(result)

I'm getting:

[1, 4, 5]

As you can see, the order of the list elements has changed. How can I retain the list x in original format?

Aran-Fey
  • 39,665
  • 11
  • 104
  • 149
Avinash
  • 211
  • 1
  • 2
  • 3

3 Answers3

25

It looks like you need an ordered set instead of a regular set.

>>> x = [1, 5, 3, 4]
>>> y = [3]
>>> print(list(OrderedSet(x) - OrderedSet(y)))
[1, 5, 4]

Python doesn't come with an ordered set, but it is easy to make one:

import collections.abc

class OrderedSet(collections.abc.Set):
    def __init__(self, iterable=()):
        self.d = collections.OrderedDict.fromkeys(iterable)

    def __len__(self):
        return len(self.d)

    def __contains__(self, element):
        return element in self.d

    def __iter__(self):
        return iter(self.d)

Hope this helps :-)

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
21

Sets are unordered, so you will need to put the results back in the correct order after doing your set difference. Fortunately you already have the elements in the order you want, so this is easy.

diff = set(x) - set(y)
result = [o for o in x if o in diff]

But this can be streamlined; you can do the difference as part of the list comprehension (though it is arguably slightly less clear that that's what you're doing).

sety = set(y)
result = [o for o in x if o not in sety]

You could even do it without creating the set from y, but the set will provide fast membership tests, which will save you significant time if either list is large.

kindall
  • 178,883
  • 35
  • 278
  • 309
11

You could just do this

diff = set(x) - set(y)
[item for item in x if item in diff]

or

filter(diff.__contains__, x)
jamylak
  • 128,818
  • 30
  • 231
  • 230
  • And if you do it with a large number of elements in `y` or lots of times, working on `set(y)` rather than `y` may be faster. – Chris Morgan Apr 04 '12 at 05:36
  • Alright, i wasn't sure about the speed but if you are sure about it then i guess that is best. – jamylak Apr 04 '12 at 05:39
  • Why would you ever go for the `filter` version? Not Pythonic. Also the `list()` wrapping would make it less efficient on Python 2 where a list is already returned by `filter()`. – Chris Morgan Apr 04 '12 at 05:41
  • Oh right, forgot i was using python 3... ill change it back. I thought it would be useful in python 3 as a generator. Also why is filter not pythonic? – jamylak Apr 04 '12 at 05:42
  • Except that you're then immediately putting it in a list, so the list comprehension is just as good. `filter` and the other functions like it are taken from functional programming; for general programming in Python they are considered to be less friendly than list comprehensions or generator expressions. There was even discussion about removing them altogether in Python 3. – Chris Morgan Apr 04 '12 at 05:43
  • Ya that was just a mixup with python 3, i forgot i should use python 2 when answering SO questions, just had it as an option for the generator... Less friendly... hmm... well they weren't removed and python has a bunch of useful things that make writing the code easier for small things like map. If they are still in python we can still use them. – jamylak Apr 04 '12 at 05:45
  • Updated mine to operate on the result of set. – jamylak Apr 04 '12 at 05:50
  • Your filter would be a little smoother if you did `filter(diff.__contains__, x)`. Of course, it's equivalent to do `set_y = set(y); filter(lambda item: item not in set_y, x)` as well. (Too bad Python doesn't have a `not` that lifts to functions or the "opposite" of `filter`, or that'd be nicer.) – Danica Apr 04 '12 at 06:02
  • That would work if you reversed the set operation. But isn't it considered bad practise to use private methods? – jamylak Apr 04 '12 at 06:09