3

I am reading an article about python removing duplicate element in a list. there is a function defined as:

def f8(seq): # Dave Kirby
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

However, i don't really understand the syntax for [x for x in seq if x not in seen and not seen.add(x)]

what is this syntax ? how do I read it?

thank you.

Imran
  • 87,203
  • 23
  • 98
  • 131
Chung
  • 1,063
  • 1
  • 13
  • 21
  • 1
    This is a *list comprehension*. If you Google the term, there are plenty of tutorials around, e.g. http://carlgroner.me/Python/2011/11/09/An-Introduction-to-List-Comprehensions-in-Python.html – NPE Jun 18 '13 at 06:32

3 Answers3

3

It is called a list comprehension, they provide a syntactically more compact and more efficient way of writing a normal for-loop based solution.

def f8(seq): # Dave Kirby
    # Order preserving
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

The above list comprehension is roughly equivalent to:

def f8(seq):
    seen = set()
    lis =[]
    for x in seq:
        if x not in seen:
            lis.append(x)
            seen.add(x)
    return lis
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
3

The construct is called a list comprehension [x for x in seq if some_condition]. In this case the condition is that x isn't already in the resulting list. You can't inspect the result of a list comprehension while you are running it, so it keeps track of the items that are in there using a set called seen

This condition here is a bit tricky because it relies on a side-effect

not in seen and not seen.add(x)

seen.add() always returns None. If the item is in seen,

not in seen is False, so the and shortcircuits.

If the item is not in seen,

not in seen is True and not seen.add(x) is also True, so the item is included, and as a side-effect, it is added to the seen set

While, this type of thing can be fun, it's not a particularly clear way to express the intent.

I think the less tricky way is much more readable

def f8(seq):
    seen = set()
    result = []
    for x in seq:
        if x not in seen:
            result.append(x)
            seen.add(x)
    return result
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
3

Firstly list comprehensions are usually easy to read, here is a simple example:

[x for x in seq if x != 2]

translates to:

result = []
for x in seq:
    if x != 2:
        result.append(x)

The reason why you can't read this code is because it is not readable and hacky code as I stated in this question:

def f8(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

translates to:

def f8(seq):
    seen = set()
    result = []
    for x in seq:
        if x not in seen and not seen.add(x): # not seen.add(...) always True
            result.append(x)

and relies on the fact that set.add is an in-place method that always returns None so not None evaluates to True.

>>> s = set()
>>> y = s.add(1) # methods usually return None
>>> print s, y
set([1]) None

The reason why the code has been written this way is to sneakily take advantage of Python's list comprehension speed optimizations.

Python methods will usually return None if they modify the data structure (pop is one of the exceptions)

I also noted that the current accepted way of doing this (2.7+) which is more readable and doesn't utilize a hack is as follows:

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

Dictionary keys must be unique, therefore the duplicates are filtered out.

Community
  • 1
  • 1
jamylak
  • 128,818
  • 30
  • 231
  • 230
  • does the new method has the same efficiency as the hacky one? – Chung Jun 18 '13 at 07:10
  • @user2213606 efficiency, yes, I believe it is not as fast in raw speed though. It's more succinct however – jamylak Jun 18 '13 at 07:40
  • `list(OrderedDict.fromkeys(items))` is portable to Python3 – John La Rooy Jun 18 '13 at 09:42
  • @gnibbler good point, for some reason i was under the assumption that it would return a list since it keeps order but this `KeysView` also does – jamylak Jun 18 '13 at 10:34
  • i tried to test the speed of the function mentioned above, the speed was terribly slow, script can be found here, function was def f12 http://pastebin.com/Q1EXhqsi – Chung Jun 19 '13 at 02:36
  • @user2213606 haven't yet looked at the timings but I did say it wouldn't be as fast in raw speed. It is still an O(N) efficient solution though – jamylak Jun 19 '13 at 03:04