
I'm trying to figure out how to delete duplicates from a 2D list. Let's say, for example:

x= [[1,2], [3,2]]

I want the result:

[1, 2, 3]

in this order.

Actually, I don't understand why my code doesn't do that:

def removeDuplicates(listNumbers):
    finalList=[]
    finalList=[number for numbers in listNumbers for number in numbers if number not in finalList]
    return finalList

If I wrote it as a nested for loop instead, it would look the same:

def removeDuplicates(listNumbers):
    finalList=[]
    for numbers in listNumbers:
        for number in numbers:
            if number not in finalList:
                finalList.append(number)
    return finalList

"Problem" is that this code runs perfectly. Second problem is that order is important. Thanks

Iron Fist
H. Hasin

4 Answers


finalList is always an empty list inside your list comprehension. You expect it to be appended to while the comprehension runs, but it isn't, and that is exactly where it differs from your second version (the nested for loop).

What I would do instead is use a set:

>>> set(i for sub_l in x for i in sub_l)
{1, 2, 3}

EDIT: Otherwise, if order matters, here is something closer to your attempt:

>>> final_list = []
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> list(filter(lambda n: final_list.append(n) if n not in final_list else None, x_flat))
[] # useless list that is thrown away and consumes memory
>>> final_list
[1, 2, 3]

Or

>>> final_list = []
>>> list(map(lambda n: final_list.append(n) if n not in final_list else None, x_flat))
[None, None, None, None] # useless list that is thrown away and consumes memory
>>> final_list
[1, 2, 3]

EDIT2: As timgeb pointed out, map and filter build lists that are useless and thrown away at the end; worse, they consume memory. So I would go with the nested for loop from your last code example, but if you want to use a list comprehension, then:

>>> x_flat = [i for sub_l in x for i in sub_l]
>>> final_list = []
>>> for number in x_flat:
...     if number not in final_list:
...         final_list.append(number)
...
>>> final_list
[1, 2, 3]
Community
Iron Fist
  • @timgeb, correct, I didn't see that in the OP's requirements. – Iron Fist Mar 12 '16 at 19:17
  • @H.Hasin, is there a reason why you don't want to go with the nested `for` loop? – Iron Fist Mar 12 '16 at 19:29
  • You should probably mention that list comprehensions / filters / maps with side effects and/or which consume memory by creating useless throwaway lists might get you smacked. :> – timgeb Mar 12 '16 at 19:39
  • @timgeb, indeed. I can't think of a simpler way than that. Maybe I'm overlooking a simple solution here; any suggestions to improve my answer? – Iron Fist Mar 12 '16 at 21:09
  • @IronFist The simple solution is to flatten first, then filter, instead of insisting on the one-liner. Your second edit is similar to mine, but yours has quadratic runtime. There are basically three O(n) solutions for removing duplicates while keeping the order that you see over and over. The first one is the one I posted. The second one is the list comprehension from the accepted answer in the link in my answer, which has side effects that are arguably not a good thing in production code. The third one is to abuse an OrderedDict as an ordered set (i.e. `OrderedDict.fromkeys(iterable).keys()`). – timgeb Mar 13 '16 at 11:33
  • Another suggestion: you could make `x_flat` a generator comprehension instead of a list comprehension to save memory. – timgeb Mar 13 '16 at 11:39
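Putting timgeb's two comment suggestions together, here is a minimal sketch that flattens lazily with a generator expression and then uses `OrderedDict.fromkeys` as an ordered set. The function name `remove_duplicates_ordered` is hypothetical, and the approach assumes the items are hashable; under that assumption it runs in O(n) instead of the quadratic `not in list` loop.

```python
from collections import OrderedDict


def remove_duplicates_ordered(list_numbers):
    """Flatten a 2D list and drop duplicates, keeping first-seen order."""
    # Generator expression: no intermediate flat list is built in memory.
    flat = (number for numbers in list_numbers for number in numbers)
    # OrderedDict keys act as an ordered set: a repeated key is ignored,
    # so each item keeps the position of its first occurrence.
    return list(OrderedDict.fromkeys(flat))


x = [[1, 2], [3, 2]]
print(remove_duplicates_ordered(x))  # [1, 2, 3]
```

On Python 3.7 and later a plain `dict` also preserves insertion order, so `list(dict.fromkeys(flat))` behaves the same way.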

You declare finalList as the empty list first, so

if number not in finalList

will be True all the time, so nothing is ever filtered out.

The right-hand side of the assignment (your comprehension) is evaluated completely before the assignment to finalList takes place.
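A minimal sketch with the question's data makes this visible: the name `finalList` keeps referring to the empty list for the whole comprehension, so the membership test never rejects anything and the duplicate `2` survives.

```python
x = [[1, 2], [3, 2]]

finalList = []  # the comprehension below only ever sees this empty list
finalList = [number for numbers in x for number in numbers
             if number not in finalList]  # "not in []" is always True

print(finalList)  # [1, 2, 3, 2]: the duplicate 2 is kept
```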

Iterate over the iterator that chain.from_iterable gives you and remove duplicates in the usual way:

>>> from itertools import chain
>>> x=[[1,2],[3,2]]
>>> 
>>> seen = set()
>>> result = []
>>> for item in chain.from_iterable(x):
...     if item not in seen:
...         result.append(item)
...         seen.add(item)
... 
>>> result
[1, 2, 3]

Further reading: How do you remove duplicates from a list in Python whilst preserving order?

edit:

You don't need the import to flatten the list; you could just use the generator expression

(item for sublist in x for item in sublist)

instead of chain.from_iterable(x).
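The same seen-set loop can also be wrapped in a reusable generator, similar in spirit to the `unique_everseen` recipe in the `itertools` documentation. This is only a sketch: the name `unique_in_order` is made up for illustration, and the items are assumed to be hashable.

```python
def unique_in_order(nested):
    """Yield each item of a 2D iterable once, in first-seen order."""
    seen = set()  # set membership checks are O(1) on average
    for sublist in nested:
        for item in sublist:
            if item not in seen:
                seen.add(item)
                yield item


x = [[1, 2], [3, 2]]
print(list(unique_in_order(x)))  # [1, 2, 3]
```

Because it is a generator, you can stop consuming it early without processing the rest of the input.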

Community
timgeb

The expression on the right-hand side is evaluated first, before the result of the list comprehension is assigned to finalList. In your second approach, by contrast, you write to this list on every iteration. That's the difference.

That may be similar to the reasons why the documentation warns about unexpected behaviour when you modify the iterable you are looping over inside a for loop.

You could use the built-in set() to remove duplicates (you have to flatten your list first), but keep in mind that a set does not preserve the order of the items.

Ilja

There is no way in Python to refer to the list that the current comprehension is building. In fact, if you remove the line finalList=[], which seems to do nothing, you get an error.
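To reproduce that error, here is a sketch of the question's function with the `finalList=[]` line removed (renamed `remove_duplicates_broken`, a hypothetical name, for illustration). Because the comprehension runs before `finalList` is bound, the membership test refers to a name with no value yet. The exact exception class and message depend on the Python version, but it is `NameError` or a subclass of it such as `UnboundLocalError`.

```python
def remove_duplicates_broken(list_numbers):
    # Same comprehension as in the question, but without "finalList = []"
    # first. The right-hand side is evaluated before finalList is assigned,
    # so the membership test refers to a name that has no value yet.
    finalList = [number for numbers in list_numbers for number in numbers
                 if number not in finalList]
    return finalList


try:
    remove_duplicates_broken([[1, 2], [3, 2]])
except NameError as exc:  # also catches subclasses like UnboundLocalError
    print(type(exc).__name__, exc)
```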

If the order does not matter, you can do it in two steps:

finalList = [number for numbers in listNumbers for number in numbers]
finalList = list(set(finalList))

or if you want a one-liner:

finalList = list(set(number for numbers in listNumbers for number in numbers))
Julien Spronck