I have two lists such as the examples below (in reality, a is longer) and I would like to remove all common elements, in this case the punctuation given in list punctuation.

a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member', 'comical', 'family', 'Intelligences', '.'],['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
MERose
  • Stack Overflow is a community where you post the code or approaches you have already tried. What have you tried so far? We are glad to help. – Jeff Sloyer May 04 '15 at 21:15

4 Answers

Make a set of words to remove and test containment item by item if you need to preserve order.

cleaned = [word for word in words if word not in blacklist] 
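
Applied to the lists from the question, this might look like the sketch below. The names `words` and `blacklist` above are placeholders; here `blacklist` is assumed to be a set built from `punctuation`, and the comprehension is nested so the sublist structure of `a` is kept.

```python
a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# A set gives constant-time membership tests on average.
blacklist = set(punctuation)

# Filter each sublist separately; order within each sublist is preserved.
cleaned = [[word for word in sublist if word not in blacklist] for sublist in a]
print(cleaned)
```

Only items that are exactly equal to a punctuation entry are dropped, so tokens such as 'man,' stay as they are.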
jwilner

When the order is not important:

You can do a set() operation on it, but first you have to flatten the nested list a (taken from Making a flat list out of list of lists in Python):

b = [item for sublist in a for item in sublist]
cleaned = list(set(b) - set(punctuation))

cleaned is a list that looks like this (the element order may differ between runs, since sets are unordered): ['A', 'hug', 'heads', 'family', 'Intelligences', 'becomes', 'Jeans', 'lengthen', 'member', 'turn', 'mankind', 'view,', 'legs', 'man,', 'hips', 'comical']
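
Note that the set difference also collapses repeated words. A small sketch of the effect, using a hypothetical short input `sample` rather than the full list from the question:

```python
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
sample = ['mankind', ';', 'mankind', 'member', '.']  # hypothetical short input

# Set difference: punctuation removed, but duplicates and order are lost.
unordered = list(set(sample) - set(punctuation))

# List comprehension: punctuation removed, duplicates and order kept.
ordered = [x for x in sample if x not in punctuation]

print(sorted(unordered))  # sorted only so the display is deterministic
print(ordered)
```

If repeated words matter for your use case, the set-based version is probably not what you want.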

When the order is important:

Simply use a list comprehension on the flattened list b, which is probably slower:

cleaned = [x for x in b if x not in punctuation]

cleaned looks like ['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

MERose

You can do this, but the list order might change.

[list(set(sublist)-set(punctuation)) for sublist in a]

Using sets, you can remove the punctuation entries and convert the result back to a list. A list comprehension does this for each sublist in a.


If keeping the order is important, you can do this:

[[x for x in sublist if x not in punctuation] for sublist in a]
Bastian35022

You can do:

>>> from itertools import chain
>>> list(filter(lambda e: e not in punctuation, chain(*a)))  # list() is needed on Python 3, where filter returns an iterator
['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences', 'Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']

Or, if you want to maintain your sublist structure:

>>> [list(filter(lambda e: e not in punctuation, sub)) for sub in a]
[['A', 'man,', 'view,', 'becomes', 'mankind', 'mankind', 'member', 'comical', 'family', 'Intelligences'], ['Jeans', 'lengthen', 'legs', 'hug', 'hips', 'turn', 'heads']]
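
For long inputs, a variant along these lines may be worth trying. It is only a sketch, assuming Python 3: `chain.from_iterable` flattens `a` without unpacking it into arguments, and the name `to_remove` is introduced here for a set built from `punctuation`.

```python
from itertools import chain

a = [['A', 'man,', 'view,', 'becomes', 'mankind', ';', 'mankind', 'member',
      'comical', 'family', 'Intelligences', '.'],
     ['Jeans', 'lengthen', 'legs', ',', 'hug', 'hips', ',', 'turn', 'heads', '.']]
punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# Set lookups are constant time on average, unlike scanning the list each time.
to_remove = set(punctuation)

# Flatten lazily, then keep every item that is not a punctuation entry.
flat = [e for e in chain.from_iterable(a) if e not in to_remove]
print(flat)
```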
dawg