
I have a list as below:

lst = ['for Sam', 'Just in', 'Mark Rich']

I am trying to remove every element that contains a stopword from a list of strings (each string contains one or more words).

Since the 1st and 2nd elements in the list contain `for` and `in`, which are stopwords, the result should be:

new_lst = ['Mark Rich'] 

What I tried

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split(" ") for i in lst]
new_lst = [" ".join(i) for i in new_lst for j in i if j not in stop_words]

This gives me the following output:

['for Sam', 'Just in', 'Mark Rich', 'Mark Rich']
Sociopath

2 Answers


You need an if statement rather than extra nesting:

new_lst = [' '.join(i) for i in new_lst if not any(j in i for j in stop_words)]
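To see why the extra nesting produces duplicates and keeps the wrong elements, it may help to unroll the question's comprehension into plain loops. This is only a sketch: the two-word `stop_words` set is an assumed stand-in for the NLTK list, and the names `unrolled` and `filtered` are illustrative.

```python
# Assumed small stand-in for set(stopwords.words('english'))
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']

# Equivalent of the nested comprehension from the question:
# the phrase is appended once for EACH word that is not a stopword.
unrolled = []
for words in (s.split() for s in lst):
    for word in words:
        if word not in stop_words:
            unrolled.append(' '.join(words))

print(unrolled)
# ['for Sam', 'Just in', 'Mark Rich', 'Mark Rich']

# A single condition per phrase keeps it only if no word is a stopword.
filtered = [s for s in lst if not any(w in stop_words for w in s.split())]

print(filtered)
# ['Mark Rich']
```

The inner loop tests individual words, so a phrase with one stopword and one ordinary word is still appended once, and a phrase with two ordinary words is appended twice.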

If you wish to utilize set, you can use set.isdisjoint:

new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

Here's a demonstration:

stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split() for i in lst]
new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

print(new_lst)

# ['Mark Rich']
jpp

You can use a list comprehension with sets to check whether each string shares any words with the stopwords:

[i for i in lst if not set(stop_words) & set(i.split(' '))]
['Mark Rich']
yatu
  • thanks. Worked like a charm. Only one thing, in your answer you misplaced `]` – Sociopath Jan 02 '19 at 11:34
  • Note that `set.intersection` does more work than `set.isdisjoint`. It's not necessary to calculate the exact intersection of the 2 sets to know whether the intersection is empty. – jpp Jan 02 '19 at 11:35
  • Yes, I actually thought about `.isdisjoint` right when I saw your answer. Thanks for clarifying. – yatu Jan 02 '19 at 11:37
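
The point raised in the comments can be checked with a small side-by-side sketch. The helper names `keep_by_intersection` and `keep_by_isdisjoint` are hypothetical, and the two-word `stop_words` set is an assumed stand-in for the NLTK list. Both helpers should return the same result; the difference is that `&` builds a new set for every phrase, while `isdisjoint` only reports whether a shared element exists.

```python
# Assumed small stand-in for set(stopwords.words('english'))
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']


def keep_by_intersection(phrases, stops):
    """Keep phrases whose word set has an empty intersection with stops."""
    # `stops & set(...)` materialises the full intersection each time.
    return [p for p in phrases if not stops & set(p.split())]


def keep_by_isdisjoint(phrases, stops):
    """Keep phrases that share no word with stops."""
    # isdisjoint accepts any iterable, so no intermediate set is needed.
    return [p for p in phrases if stops.isdisjoint(p.split())]


print(keep_by_intersection(lst, stop_words))
print(keep_by_isdisjoint(lst, stop_words))
```

Because `isdisjoint` accepts any iterable, the split words do not have to be converted to a set first.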