5

Below is a simple function to remove duplicates in a list while preserving order. I've tried it and it actually works, so the problem here is my understanding. It seems to me that the second time you run uniq.remove(item) for a given item, it will return an error (KeyError or ValueError I think?) because that item has already been removed from the unique set. Is this not the case?

def unique(seq):
    uniq = set(seq)  
    return [item for item in seq if item in uniq and not uniq.remove(item)]
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • 5
    @katrielalex -- I don't. Using a condition for the side-effect of removing and item from a collection leads to confusing, hard to read code. (IMHO) – mgilson Nov 02 '12 at 14:34
  • 1
    Plus you are creating an entire new `set` and popping every item from it just to act as a filter on a list. I can't imagine this is faster, and it definitely isn't clearer, than a single pass (for creating a new de-duped list) or double pass (for in-place de-duping of the list) `for` loop. – Silas Ray Nov 02 '12 at 14:40
  • For anyone interested, http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order contains various options for doing this same operation in different ways. – mgilson Nov 02 '12 at 14:49

6 Answers6

9

There's a check if item in uniq which gets executed before the item is removed. The and operator is nice in that it "short circuits". This means that if the condition on the left evaluates to False-like, then the condition on the right doesn't get evaluated -- We already know the expression can't be True-like.

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • Thanks very much for this. What value does uniq.remove(item) return? I'm guessing the whole "and not uniq.remove(item)" is a way to run methods in a list comprehension rather than changing the whole thing to a for loop, but I'm not sure why, for example, we use "and not" in this case instead of just "and." Presumably b/c unique.remove(item) returns None or False? – user1794459 Nov 02 '12 at 14:29
  • `uniq.remove(item)` returns `None`. `not None` returns `True`. – mgilson Nov 02 '12 at 14:32
4

set.remove is an in-place operation. This means that it does not return anything (well, it returns None); and bool(None) is False.

So your list comprehension is effectively this:

answer = []
for item in seq:
    if item in uniq and not uniq.remove(item):
        answer.append(item)

and since python does short circuiting of conditionals (as others have pointed out), this is effectively:

answer = []
for item in seq:
    if item in uniq:
        if not uniq.remove(item):
            answer.append(item)

Of course, since unique.remove(item) returns None (the bool of which is False), either both conditions are evaluated or neither.

The reason that the second condition exists is to remove item from uniq. This way, if/when you encounter item again (as a duplicate in seq), it will not be found in uniq because it was deleted from uniq the last time it was found there.

Now, keep in mind, that this is fairly dangerous as conditions that modify variables are considered bad style (imagine debugging such a conditional when you aren't fully familiar with what it does). Conditionals should really not modify the variables they check. As such, they should only read the variables, not write to them as well.

Hope this helps

inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • "The main reason for the second condition ..." -> "the **only** reason for the second condition ..." :D. It might be worth pointing out that some consider it a little rude to use conditions for side-effects like this. – mgilson Nov 02 '12 at 14:30
1

mgilson and others has answered this question nicely, as usual. I thought I might point out what is probably the canonical way of doing this in python, namely using the unique_everseen recipe from the recipe section of the itertools docs, quoted below:

from itertools import ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Lauritz V. Thaulow
  • 49,139
  • 12
  • 73
  • 92
0
def unique_with_order(seq):
    final = []
    for item in seq:
        if item not in final:
            final.append(item)
    return final


print unique_with_order([1,2,3,3,4,3,6])

Break it down, make it simple :) Not everything has to be a list comprehension these days.

Jakob Bowyer
  • 33,878
  • 8
  • 76
  • 91
  • 1
    Of course, not everything! Because we have dict comprehensions and generator comprehensions :) – Kos Nov 02 '12 at 14:21
  • 1
    I have no problem with this as a way to make a list unique -- However I don't think that it help OP with the conceptual understanding about why the expression actually works. – mgilson Nov 02 '12 at 14:29
  • 3
    I do have a problem with this way of doing it. It's O(n²). – Sven Marnach Nov 02 '12 at 14:33
  • @SvenMarnach -- Great point. I just glanced at the code and assumed that `final` was a `set` and he was appending to a different list. I suppose I should have looked at it more closely. – mgilson Nov 02 '12 at 14:37
  • @SvenMarnach -- I wonder if there's a better way to do this *if* order matters **and** the objects are unhashable. It seems like this might be the best you can do in that case ... (I googled quickly and didn't turn up anything to the contrary, although that doesn't mean a better solution doesn't exist) – mgilson Nov 02 '12 at 14:43
  • @mgilson If you can construct a hashable key from the objects you could use the `unique_everseen` recipe from my answer with a key function. – Lauritz V. Thaulow Nov 02 '12 at 14:59
  • @lazyr -- Yeah, but objects which are unhashable are usually unhashable for a reason :). Creating a hash for them on the go seems a little sketchy. – mgilson Nov 02 '12 at 15:10
  • @mgilson Isn't the reason usually that they're mutable? If that's the reason then I don't think there's normally a problem preventing changes temporarily during the uniqueifying operation (i.e. don't call any external functions while uniquefying, if single-threaded), and if so then you may just use a key that "freezes" the mutable. – Lauritz V. Thaulow Nov 02 '12 at 15:21
  • @lazyr -- fair point. I was thinking that we were comparing lists from different places, but we're not. You might be right. That might be one (sneaky) way to go about doing it. – mgilson Nov 02 '12 at 15:24
0

@mgilson's answer is the right one, but here, for your information, is a possible lazy (generator) version of the same function. This means it'll work for iterables that don't fit in memory - including infinite iterators - as long as the set of its elements will.

def unique(iterable):
    uniq = set()
    for item in iterable:
        if item not in uniq:
            uniq.add(item)
            yield item
Community
  • 1
  • 1
Benjamin Hodgson
  • 42,952
  • 15
  • 108
  • 157
-1

The first time you run this function, you will get [1,2,3,4] from your list comprehension and the set uniq will be emptied. The second time you run this function, you will get [] because your set uniq will be empty. The reason you don't get any errors on the second run is that Python's and short circuits - it sees the first clause (item in uniq) is false and doesn't bother to run the second clause.

dshapiro
  • 1,075
  • 14
  • 24
  • I'm sorry for the downvote, but this just isn't clear. What do you mean the second time you run the function you'll get `[]`? Why would the set `uniq` be empty? – mgilson Nov 02 '12 at 14:23
  • `uniq` is empty because `uniq.remove(item)` empties it. The list comprehension doesn't short circuit the first time through. I'll edit my answer to spell it out. – dshapiro Nov 02 '12 at 14:24
  • `uniq` gets rebuilt every time the function is called by the line `uniq = set(seq)` – mgilson Nov 02 '12 at 14:25
  • Oh, you are correct. And that's the answer to his question. He wanted to know why he wasn't getting a `KeyError`; it's because `uniq` is method-local and gets reset each time the method is called. – dshapiro Nov 02 '12 at 14:28
  • Daniel, thanks for this. My question wasn't phrased optimally, as I meant when you loop through a particular iterable/list, what happens when you hit the duplicate and the item has already been removed from uniq. That said, I could see why it sounds like I meant what happens when you call the method the second time around. Thanks for your input. – user1794459 Nov 02 '12 at 14:41