stopwords is a list of strings, and tokentext is a list of lists of strings. (Each inner list is a sentence; the list of lists is a text document.) I am simply trying to remove all the strings in tokentext that also occur in stopwords.
for element in tokentext:
    for word in element:
        if word.lower() in stopwords:
            element.remove(word)
print(tokentext)
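A small hypothetical input shows one way this loop can misbehave. The data below is invented for illustration and is not the pastebin data set. When two stopwords sit next to each other, removing the first shifts the second into the slot the loop has already visited, so the second is never checked:

```python
# Hypothetical sample data, not the pastebin data set.
stopwords = ["the", "a", "of"]
tokentext = [["The", "a", "cat"], ["sat", "of", "the", "mat"]]

for element in tokentext:
    for word in element:
        if word.lower() in stopwords:
            # Mutating `element` while the for loop iterates over it shifts
            # the remaining items left, so the item right after a removed
            # word is skipped.
            element.remove(word)

print(tokentext)  # [['a', 'cat'], ['sat', 'the', 'mat']]
```

Here "a" and "the" survive because each directly follows a word that was removed.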
I was hoping someone could point out a fundamental flaw in the way I am iterating over the list.
Here is a data set where it fails: http://pastebin.com/p9ezh2nA
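One common way to avoid the problem, sketched here under the assumption that building new lists is acceptable, is to filter each sentence with a list comprehension instead of calling remove() during iteration. The names stopword_set and the sample data are hypothetical. This keeps the original comparison of word.lower() against the stopwords exactly as given:

```python
# Hypothetical sample data, not the pastebin data set.
stopwords = ["the", "a", "of"]
tokentext = [["The", "a", "cat"], ["sat", "of", "the", "mat"]]

# A set gives fast membership tests; entries are used exactly as given,
# matching the original `word.lower() in stopwords` check.
stopword_set = set(stopwords)

# Build a new list for each sentence instead of removing items in place,
# so no list is modified while it is being iterated over.
tokentext = [
    [word for word in sentence if word.lower() not in stopword_set]
    for sentence in tokentext
]

print(tokentext)  # [['cat'], ['sat', 'mat']]
```

If other code holds references to the inner lists, looping over tokentext and assigning sentence[:] = [...] with the same filter updates each list object in place instead of replacing it.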