Removing duplicates and blanks in a list of lists in Python

Question

I have a list of lists, a, as follows:

[[u'Apple', '', u'Apple Inc', u'Apple', u'shares ', u'Amazon', u'Amazon', u'Amazon', '', '', u'Apple', u'Kindle', u'iPad', u'Amazon', u'Amazon', '', u'Amazon', u'Kindle', u'Amazon', '', u'iPad', u'iPad', u'iPad', u'Kindle', u'Kindle', u'Nook', u' ', u'sales', '', '', u'Amazon', '', '', '', '', '', ''], [u'United Kingdom', ''], [u'LA']]

I need to remove the duplicates and the blanks in this. I tried the following:

a_1 = filter(None,a)
a_2 = list(set(a_1))

This doesn't seem to work because of the error TypeError: unhashable type: 'list'. I tried to convert the list into tuples, but it also didn't work.

a_1 = set(map(tuple,a))
a_2 = map(list,a_1)

I also have to preserve the order. Can someone help me out with this?

Thanks.

possible duplicate of [Python: removing duplicates from a list of lists](http://stackoverflow.com/questions/2213923/python-removing-duplicates-from-a-list-of-lists) — mmmmmm, Dec 06 '13 at 11:46

Ashwini Chaudhary · Accepted Answer · 2013-12-06T11:53:23.053

This should do it:

>>> lis = [[u'Apple', '', u'Apple Inc', u'Apple', u'shares ', u'Amazon', u'Amazon', u'Amazon', '', '', u'Apple', u'Kindle', u'iPad', u'Amazon', u'Amazon', '', u'Amazon', u'Kindle', u'Amazon', '', u'iPad', u'iPad', u'iPad', u'Kindle', u'Kindle', u'Nook', u' ', u'sales', '', '', u'Amazon', '', '', '', '', '', ''], [u'United Kingdom', ''], [u'LA']]
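>>> def solve(lis):
...     for seq in lis:
...         seen = set()  # reset per inner list, so duplicates are removed within each list only
...         # set.add() returns None, so `not seen.add(x)` is always True and just records x
...         yield [x for x in seq if x.strip() and x not in seen and not seen.add(x)]
...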
>>> list(solve(lis))
[[u'Apple', u'Apple Inc', u'shares ', u'Amazon', u'Kindle', u'iPad', u'Nook', u'sales'],
 [u'United Kingdom'],
 [u'LA']]

Change the condition if x.strip() to just if x if you don't consider whitespace-only strings such as u' ' to be blank and want to keep them.
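As an alternative sketch of the same idea (not part of the answer above), the seen-set bookkeeping can be handed to collections.OrderedDict, whose keys are unique and keep first-insertion order. The function name dedupe_keep_order and the shortened sample list are made up for illustration.

```python
from collections import OrderedDict

def dedupe_keep_order(list_of_lists):
    """Return new inner lists with blanks and repeats removed, order kept."""
    result = []
    for seq in list_of_lists:
        # Drop empty and whitespace-only strings first.
        non_blank = (x for x in seq if x.strip())
        # OrderedDict keys are unique and remember first-insertion order.
        result.append(list(OrderedDict.fromkeys(non_blank)))
    return result

# Shortened, hypothetical sample in the same shape as the question's data.
sample = [[u'Apple', '', u'Apple', u'Amazon', u' ', u'Amazon'],
          [u'United Kingdom', ''],
          [u'LA']]
cleaned = dedupe_keep_order(sample)
print(cleaned)
```

This avoids relying on the side effect of seen.add(x) inside the comprehension, at the cost of building a temporary OrderedDict per inner list.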

aga · Answer 2 · 2013-12-06T12:26:15.273

You can traverse your list, making a set out of each element in it. Then you can filter the blank values via list comprehension like so:

a = [[u'Apple', '', u'Apple Inc', u'Apple', u'shares ', u'Amazon', u'Amazon', u'Amazon', '', '', u'Apple', u'Kindle', u'iPad', u'Amazon', u'Amazon', '', u'Amazon', u'Kindle', u'Amazon', '', u'iPad', u'iPad', u'iPad', u'Kindle', u'Kindle', u'Nook', u' ', u'sales', '', '', u'Amazon', '', '', '', '', '', ''], [u'United Kingdom', ''], [u'LA']]
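b = [[val for val in set(inner_list) if val] for inner_list in a]
# b is [[u'iPad', u'Apple', u' ', u'sales', u'Nook', u'Amazon', u'Apple Inc', u'Kindle', u'shares '], [u'United Kingdom'], [u'LA']]
# (set iteration order is arbitrary, so the order inside each inner list may differ; u' ' is kept because it is truthy)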

To preserve the order, you can use an OrderedSet implementation (it is not part of the standard library) in place of set:

b = [[val for val in OrderedSet(inner_list) if val] for inner_list in a]
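OrderedSet is not a built-in, so the line above needs some implementation in scope. The class below is only a minimal, hypothetical stand-in built on collections.OrderedDict; it is not the implementation the answer refers to, and it supports just iteration, membership tests and len(). The sample list is shortened for illustration.

```python
from collections import OrderedDict

class OrderedSet(object):
    """Minimal stand-in: unique items, iterated in first-seen order."""

    def __init__(self, iterable=()):
        # Keys of an OrderedDict are unique and keep insertion order.
        self._items = OrderedDict.fromkeys(iterable)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

# Shortened, hypothetical sample in the same shape as the question's data.
a = [[u'Apple', '', u'Kindle', u'Apple', u' '], [u'United Kingdom', ''], [u'LA']]
b = [[val for val in OrderedSet(inner_list) if val] for inner_list in a]
print(b)
```

Note that the filter if val only drops empty strings; u' ' survives. Use if val.strip() to drop whitespace-only strings as well.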

ndpu · Answer 3 · 2013-12-06T11:54:47.293

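You can use itertools.chain.from_iterable to flatten the inner lists and remove duplicates across all of them. Note that this gives one flat list, and the set does not preserve order: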

>>> import itertools
>>> a1=[[u'Apple', '', u'Apple Inc', u'Apple', u'shares ', u'Amazon', u'Amazon', u'Amazon', '', '', u'Apple', u'Kindle', u'iPad', u'Amazon', u'Amazon', '', u'Amazon', u'Kindle', u'Amazon', '', u'iPad', u'iPad', u'iPad', u'Kindle', u'Kindle', u'Nook', u' ', u'sales', '', '', u'Amazon', '', '', '', '', '', ''], [u'United Kingdom', ''], [u'LA']]
>>> list(set(e for e in itertools.chain.from_iterable(a1) if e))
[u'iPad', u' ', u'Apple', u'LA', u'sales', u'Nook', u'United Kingdom', u'Amazon', u'Apple Inc', u'Kindle', u'shares ']
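If a single flat result is acceptable but first-seen order matters, a sketch along the following lines keeps the chain.from_iterable idea and replaces the set conversion with an explicit seen set. The name flatten_unique and the shortened sample list are made up for illustration.

```python
import itertools

def flatten_unique(list_of_lists):
    """Flatten the inner lists, dropping empty strings and repeats, order kept."""
    seen = set()
    flat = []
    for e in itertools.chain.from_iterable(list_of_lists):
        # `if e` mirrors the snippet above: only empty strings are dropped.
        if e and e not in seen:
            seen.add(e)
            flat.append(e)
    return flat

# Shortened, hypothetical sample in the same shape as the question's data.
a1 = [[u'Apple', '', u'Amazon', u'Apple'], [u'United Kingdom', '', u'Amazon'], [u'LA']]
flat = flatten_unique(a1)
print(flat)
```

Because seen is shared across all inner lists, a value that appears in two different inner lists is kept only once, which differs from the per-list deduplication asked for in the question.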
