0

I have a list like this: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']. I want to remove the words and, or, and of, so I came up with the following block of code:

my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
print('Before: {}'.format(my_list))
my_list = list(filter(lambda a: 'and' not in a and 'of' not in a and 'or' not in a, my_list))
print('After: {}'.format(my_list))

However, my code gives the following output:

Before: ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
After: []

What I want is:

['land_transport', 'port', 'surveyor', 'organization']

There are, of course, several ways around this, but I want to stick with a lambda function to solve the problem. Any suggestions?

Uvuvwevwevwe
  • Your `'and' not in a` check is doing exactly what it says on the tin: finding any words that don't contain `and`. Since `land_transport` contains `and`, it gets filtered out. Presumably you're using `not in` instead of just `!=` for a reason, but without knowing what that reason is, it's hard to tell you how to fix things. – abarnert Jun 12 '18 at 20:43
  • Possible duplicate of [Python, compute list difference](https://stackoverflow.com/questions/6486450/python-compute-list-difference) – Camille Goudeseune Jun 12 '18 at 20:46
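
To make the point in the comment above concrete, here is a minimal sketch of the difference between a substring test and a whole-word comparison. It still uses a lambda; the name `kept` is only illustrative and not from the original post.

```python
my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']

# On a string, `in` is a substring test: 'and' occurs inside 'land_transport',
# and 'or' occurs inside 'port', 'surveyor' and 'organization'.
print('and' in 'land_transport')  # True
print('or' in 'port')             # True

# Comparing with != looks at the whole word, so only exact matches are dropped.
kept = list(filter(lambda a: a != 'and' and a != 'or' and a != 'of', my_list))
print(kept)  # ['land_transport', 'port', 'surveyor', 'organization']
```

This is why the original filter returns an empty list: every word in the list either is one of the three words or contains one of them as a substring.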

3 Answers

2

You can create a new list storing all of the words to be filtered:

my_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
to_remove = ['or', 'of', 'and']
new_list = list(filter(lambda x: x not in to_remove, my_list))
print(new_list)

Output:

['land_transport', 'port', 'surveyor', 'organization']
Ajax1234
1

Your filter condition is incorrect. Use:

filter_set = {'and', 'or', 'of'}
my_list = list(filter(lambda a: a not in filter_set, my_list))

You want all the items in my_list that are not in filter_set. Note the use of a set: it makes each lookup much faster (O(1) on average for a set versus O(N) for a list).
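
As a rough illustration of the set-versus-list point, here is a small sketch using `timeit` from the standard library. The large `stop_list` is an artificial assumption made only so the difference becomes visible; with three words to remove, as in the question, either container is fine. The helper `filter_with` is hypothetical and not part of the original answer.

```python
import timeit

words = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']

# Assumption for illustration: an artificially large list of words to remove.
stop_list = ['stop_{}'.format(i) for i in range(10000)] + ['and', 'or', 'of']
stop_set = set(stop_list)

def filter_with(container):
    # Same lambda-based filter; only the container type changes.
    return list(filter(lambda w: w not in container, words))

# Each list lookup scans the elements one by one; each set lookup hashes the word.
list_time = timeit.timeit(lambda: filter_with(stop_list), number=20)
set_time = timeit.timeit(lambda: filter_with(stop_set), number=20)
print('list: {:.4f}s, set: {:.4f}s'.format(list_time, set_time))
```

The exact timings depend on the machine, but both calls return the same filtered list, so switching to a set changes only the lookup cost.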

Netwave
  • Thanks for your answer, especially the part **the use of a set**. Could you refer to any documents for this point, in your answer, @Netwave? – Uvuvwevwevwe Jun 12 '18 at 21:00
1

Although the answers above serve the need, I think you intend to remove stop words.

NLTK is a widely used Python resource for that. You can use nltk.corpus.stopwords.

You don't have to do much manipulation if you know you are removing actual English stop words.

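from nltk.corpus import stopwords
# The corpus must be downloaded once beforehand: nltk.download('stopwords')
english_stopwords = set(stopwords.words('english'))  # build the set once instead of on every iteration
word_list = ['land_transport', 'and', 'or', 'port', 'of', 'surveyor', 'and', 'organization']
filtered_words = [word for word in word_list if word not in english_stopwords]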

print(filtered_words)

['land_transport', 'port', 'surveyor', 'organization']

Voilà

Morse