Removing stopwords from list of lists

Question

I would like to know how I can remove specific words, including stopwords, from a list of lists like this:

my_list=[[],
 [],
 ['A'],
 ['SB'],
 [],
 ['NMR'],
 [],
 ['ISSN'],
 [],
 [],
 [],
 ['OF', 'USA'],
 [],
 ['THE'],
 ['HOME'],
 [],
 [],
 ['STAR'],
 []]

If it was a list of strings, I would have applied something like the following:

from collections import Counter
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
text = ' '.join([word for word in my_list if word not in stop_words])

I would need to plot it at the end doing something like this:

from itertools import chain
import matplotlib.pyplot as plt
counts = Counter(chain.from_iterable(my_list))
plt.bar(*zip(*counts.most_common(20)))
plt.show()

Expected list to be plotted:

my_list=[[],
 [],
 ['SB'],
 [],
 ['NMR'],
 [],
 ['ISSN'],
 [],
 [],
 [],
 ['USA'],
 [],
 ['HOME'],
 [],
 [],
 ['STAR'],
 []]

So what's the expected output? And what does the Counter have to do with removing words? — Mureinik, Dec 22 '20 at 18:23
I removed the counter from the list. Now probably it should be better. I added an example of output (for the list) that I would like to have in order to plot it — V_sqrt, Dec 22 '20 at 18:28

Barmar · Accepted Answer · 2020-12-22T18:37:15.850

3

Loop through my_list, replacing each nested list with a copy that has the stop words removed. You can use set difference to remove the words.

from nltk.corpus import stopwords
stop_words = stopwords.words('english')
my_list = [list(set(sublist).difference(stop_words)) for sublist in my_list]
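To see what the set-difference version does on a small input, here is a minimal sketch. The `stop_words` set below is a tiny hypothetical stand-in for `stopwords.words('english')`, so the snippet runs without NLTK data, and the sample words are lowercase by assumption. Each sublist is sorted only to make the printed result stable, since sets have no guaranteed order.

```python
# Hypothetical stand-in for stopwords.words('english'); not the full NLTK list.
stop_words = {"a", "of", "the"}

# Lowercase sample data (an assumption, so the case-sensitive comparison matches).
sample = [[], ["a"], ["of", "usa"], ["the"], ["home", "home"]]

# Set difference drops stop words, but also collapses duplicates within a sublist.
filtered = [sorted(set(sublist).difference(stop_words)) for sublist in sample]

print(filtered)  # [[], [], ['usa'], [], ['home']]
```

The repeated 'home' appears only once in the result, which matters if the words are counted afterwards.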

It gets a little more complicated to do the comparisons case insensitively, as you can't use the built-in set difference method.

my_list = [[word for word in sublist if word.lower() not in stop_words] for sublist in my_list]
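Putting the case-insensitive filter together with the counting step from the question, a sketch might look like the following. Again `stop_words` is a small hypothetical set rather than the NLTK list, the sample data is a shortened version of the list in the question with one extra 'USA' added for illustration, and the plotting calls are left out so the snippet depends only on the standard library.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for stopwords.words('english'); a set gives fast lookups.
stop_words = {"a", "of", "the"}

my_list = [[], ["A"], ["SB"], ["OF", "USA"], ["THE"], ["HOME"], ["STAR"], ["USA"]]

# Keep each word whose lowercase form is not a stop word; empty sublists stay empty.
filtered = [[word for word in sublist if word.lower() not in stop_words]
            for sublist in my_list]

# Flatten the nested lists and count occurrences, ready for plotting.
counts = Counter(chain.from_iterable(filtered))

print(filtered)  # [[], [], ['SB'], ['USA'], [], ['HOME'], ['STAR'], ['USA']]
print(counts.most_common(1))  # [('USA', 2)]
```

Unlike the set-difference version, this keeps repeated words, so the counts reflect how often each word actually appears. Sublists that end up empty stay in `filtered` but contribute nothing to `counts`.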

edited Dec 22 '20 at 18:37

answered Dec 22 '20 at 18:29

Barmar

Thanks Barmar. I have tried it, but there are still some stopwords, such as `the` and `of`, in my_list, so when I plot I still see them. – V_sqrt Dec 22 '20 at 18:33
It's probably because the stop words are lowercase, but your list contains uppercase words. – Barmar Dec 22 '20 at 18:34
Can you change `my_list` to be all lowercase? If not, I showed how to do the comparison after converting case. – Barmar Dec 22 '20 at 18:38
It makes sense. Thanks Barmar! – V_sqrt Dec 22 '20 at 18:38
