Removing duplicate lists within another list

Question

t = [[a, b], [c, d], [a, e], [f, g], [c, d]]

How can I get a unique list of lists, so that the output equals:

output = [[a, b], [c, d], [a, e], [f, g]]

[c,d] is present twice so it needs to be removed. [a,b] and [a,e] are unique lists regardless of the duplicated 'a'.

Thanks!

What are `b` and `c`? Are they simple values or another `list`, `dict`? — Sanjay T. Sharma, Feb 17 '16 at 11:26
Possible duplicate of [How do you remove duplicates from a list in Python whilst preserving order?](http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order), [Python removing duplicates in lists](http://stackoverflow.com/questions/7961363/python-removing-duplicates-in-lists) — GingerPlusPlus, Feb 17 '16 at 11:33
List is just an object, removing duplicates from list of lists is no different from list of any other objects. — GingerPlusPlus, Feb 17 '16 at 11:37

Padraic Cunningham · Accepted Answer · 2016-02-17T11:49:49.077

An OrderedDict keeps the order and gives you unique elements once the sublists are mapped to tuples to make them hashable. Assigning to t[:] mutates the original list in place.

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]

from collections import OrderedDict

t[:] = map(list, OrderedDict.fromkeys(map(tuple, t)))

print(t)
[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
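
On Python 3.7 and later, plain dicts preserve insertion order as a language guarantee, so the same idea can be sketched with dict.fromkeys. This is a minimal variant under that version assumption; `dedupe_in_place` is a hypothetical helper name, not part of the answer above.

```python
def dedupe_in_place(nested):
    """Remove duplicate sublists from `nested` in place, keeping first occurrences."""
    # Tuples are hashable, so they can be dict keys; dict.fromkeys keeps the
    # first occurrence of each key in insertion order (guaranteed from 3.7 on).
    nested[:] = [list(key) for key in dict.fromkeys(map(tuple, nested))]
    return nested


t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]
dedupe_in_place(t)
print(t)  # [['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
```

Like the OrderedDict version, this rebuilds the inner lists from tuples, so the surviving sublists are new list objects.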

For Python 2 you can use itertools.imap if you want to avoid creating intermediate lists:

from collections import OrderedDict
from itertools import imap

t[:] = imap(list, OrderedDict.fromkeys(imap(tuple, t)))

print(t)

You can also use a set and the `st.add(...) or sub` logic:

st = set()

t[:] = (st.add(tuple(sub)) or sub for sub in t if tuple(sub) not in st)

print(t)
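
The expression above relies on set.add returning None, so `st.add(...) or sub` always evaluates to `sub` while recording the tuple as a side effect. A small sketch of just that step, using the same names as above:

```python
st = set()
sub = ["a", "b"]

# set.add mutates the set and returns None, which is falsy ...
added = st.add(tuple(sub))
print(added)                # None

# ... so `None or sub` falls through to the original sublist.
print(added or sub)         # ['a', 'b']

# The tuple is now recorded, so a later equal sublist fails the filter.
print(tuple(sub) not in st)  # False
```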

The set approach is the fastest:

In [9]: t = [[randint(1,1000),randint(1,1000)] for _ in range(10000)]

In [10]: %%timeit
   ....: st = set()
   ....: [st.add(tuple(sub)) or sub for sub in t if tuple(sub) not in st]
   ....: 
100 loops, best of 3: 5.8 ms per loop

In [11]: timeit list(map(list, OrderedDict.fromkeys(map(tuple, t))))  
10 loops, best of 3: 24.1 ms per loop

Also if ["a","e"] is considered the same as ["e","a"] you can use a frozenset:

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"], ["e","a"]]
st = set()
t[:] = (st.add(frozenset(sub)) or sub for sub in t if frozenset(sub) not in st)

print(t)

Output:

[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
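
One caveat: a frozenset also ignores how often an element occurs, so ["a", "a", "b"] and ["a", "b"] would count as the same sublist. If repeated elements should matter while order should not, a sorted tuple can serve as the key instead. The sketch below assumes the elements are mutually comparable; `unique_unordered` is a hypothetical name.

```python
def unique_unordered(nested):
    """Yield sublists, skipping any that hold the same elements with the
    same counts as an earlier sublist, regardless of order."""
    seen = set()
    for sub in nested:
        key = tuple(sorted(sub))  # assumes elements can be compared with <
        if key not in seen:
            seen.add(key)
            yield sub


t = [["a", "b"], ["b", "a"], ["a", "a", "b"], ["a", "b", "b"]]
print(list(unique_unordered(t)))
# [['a', 'b'], ['a', 'a', 'b'], ['a', 'b', 'b']]
```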

To avoid two calls to tuple you can make a function:

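def unique(l):
    # st holds the tuples seen so far; it walks l in step with map(tuple, l)
    st, it = set(), iter(l)
    for tup in map(tuple, l):
        if tup not in st:
            yield next(it)  # first occurrence: emit the original sublist
        else:
            next(it)        # duplicate: skip it so both iterators stay aligned
        st.add(tup)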

Which runs a little faster:

In [21]: timeit list(unique(t))
100 loops, best of 3: 5.06 ms per loop
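
The tuple and frozenset variants differ only in the key used for comparison, so they can be folded into one generator that computes the key once per element. This is a sketch, not a benchmarked replacement; `unique_by` and its `key` parameter are hypothetical names.

```python
def unique_by(iterable, key=tuple):
    """Yield items whose key has not been seen before, in input order."""
    seen = set()
    for item in iterable:
        k = key(item)  # computed once per item
        if k not in seen:
            seen.add(k)
            yield item


t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"], ["e", "a"]]

# Order-sensitive comparison: ['e', 'a'] is kept.
print(list(unique_by(t)))

# Order-insensitive comparison: ['e', 'a'] is dropped as a duplicate of ['a', 'e'].
print(list(unique_by(t, key=frozenset)))
```

Because it iterates the input only once, this version also accepts one-shot iterators such as generators.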

score 2 · Answer 2 · answered Feb 17 '16 at 11:44

A simple solution

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]
output = []

for elem in t:
    if elem not in output:
        output.append(elem)

print(output)

Output

[['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g']]
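
The membership test on a list compares with == and scans linearly, so this approach is quadratic in the number of sublists, but it needs no hashing. That makes it usable when the sublists themselves contain unhashable values. A sketch with made-up data:

```python
# Hypothetical data: the sublists contain a list and a dict, neither hashable.
t = [[["a"], "b"], [{"k": 1}], [["a"], "b"], [{"k": 1}], [["a"], "c"]]

try:
    set(map(tuple, t))
    hashable = True
except TypeError:
    # tuple(sub) still contains an unhashable list or dict
    hashable = False

output = []
for elem in t:
    if elem not in output:  # linear scan using ==, no hashing needed
        output.append(elem)

print(hashable)  # False
print(output)    # [[['a'], 'b'], [{'k': 1}], [['a'], 'c']]
```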

Ajit Vaze


score 0 · Answer 3 · answered Feb 17 '16 at 11:28

You could do that using a set (if the order of the inner lists within the outer list doesn't matter):

>>> t = [['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g'], ['c', 'd']]
>>> as_tuples = [tuple(l) for l in t]
>>> set(as_tuples)
{('a', 'b'), ('a', 'e'), ('c', 'd'), ('f', 'g')}
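
The result above is a set of tuples. If a list of lists is needed, the tuples can be converted back; sorting them first gives a deterministic order, though not the original one. A minimal sketch, assuming the elements are mutually comparable:

```python
t = [['a', 'b'], ['c', 'd'], ['a', 'e'], ['f', 'g'], ['c', 'd']]

# Deduplicate via a set of tuples, sort for a stable order, convert back to lists.
unique_sorted = [list(tup) for tup in sorted(set(map(tuple, t)))]
print(unique_sorted)  # [['a', 'b'], ['a', 'e'], ['c', 'd'], ['f', 'g']]
```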

matino


score 0 · Answer 4 · answered Feb 17 '16 at 11:33

A simple approach, assuming you want to reuse the existing inner lists rather than create new ones and to minimize allocations.

# Assumption: nested_lst contains only lists of simple, hashable values (float, int, bool)
def squashDups( nested_lst ):
    ref_set = set()
    new_nested_lst = []
    for lst in nested_lst:
        tup = tuple(lst)
        if tup not in ref_set:
            new_nested_lst.append(lst)
            ref_set.add(tup)
    return new_nested_lst

>>> lst = [ [1,2], [3,4], [3,4], [1,2], [True,False], [False,True], [True,False] ]
>>> squashDups(lst)
[[1, 2], [3, 4], [True, False], [False, True]]

score -1 · Answer 5 · answered Feb 17 '16 at 11:27

If you do care about the order, this should work:

t = [["a", "b"], ["c", "d"], ["a", "e"], ["f", "g"], ["c", "d"]]
i = len(t) - 1
while i >= 0:
    if t.count(t[i]) > 1:
        t.pop(i)
    i -= 1
print(t)

tjohnson


Not sure why, your answer worked as well! – Tom Feb 17 '16 at 12:21
