Delete item in a list within a list

Question

stopwords is a list of strings, and tokentext is a list of lists of strings. (Each inner list is a sentence; the list of lists is a text document.)
I am simply trying to take out all the strings in tokentext that also occur in stopwords.

for element in tokentext:
    for word in element:
        if word.lower() in stopwords:
            element.remove(word)

print(tokentext)

I was hoping someone could point out a fundamental flaw in the way I am iterating over the list.

Here is a data set where it fails: http://pastebin.com/p9ezh2nA

Sorry if this wasn't clear: I don't understand why my code doesn't work, or what a working solution would be. — OctaveParango, Jan 19 '15 at 04:58
@user3878398 can you please provide the traceback, if this causes an error? — Adam Smith, Jan 19 '15 at 04:59
@user3878398 Can you please show us what `tokentext` looks like? Expected output? Current output? — UltraInstinct, Jan 19 '15 at 05:00
@Thrustmaster, `tokentext` is just a list of lists of words, as Adam Smith recreated below. — OctaveParango, Jan 19 '15 at 05:09
@user3878398 then you need to provide a [MCVE](http://stackoverflow.com/help/mcve) so we know what's gone wrong with your code. — Adam Smith, Jan 19 '15 at 05:10
I can't replicate your results. The code in your question works for me. — Adam Smith, Jan 19 '15 at 05:21
The issue emerges if you have more than one item to be removed from the list and they're next to each other. The second item gets skipped, i.e. it does not get deleted (certainly in Python 2.7; I don't think that implementation has changed in Python 3.x, but I haven't checked). — zehnpaard, Jan 19 '15 at 05:29
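To make zehnpaard's point concrete, here is a small sketch with made-up data (the words `"the"`, `"a"` and `"cat"` are hypothetical, not taken from the pastebin). The first loop is the question's code: a list iterator walks the list by index, so removing `"the"` shifts `"a"` into the slot that was just visited and `"a"` is never checked. Python 3 behaves the same way. The second loop iterates over a shallow copy (`element[:]`), a common fix, so removing from the original list no longer disturbs the iteration.

```python
# Hypothetical data: two stopwords sit next to each other in the sentence
stopwords = ["the", "a"]

buggy = [["the", "a", "cat"]]
for element in buggy:
    for word in element:            # iterates by index over the live list
        if word.lower() in stopwords:
            element.remove(word)    # shifts later items left, so "a" is skipped
print(buggy)

fixed = [["the", "a", "cat"]]
for element in fixed:
    for word in element[:]:         # iterate over a shallow copy instead
        if word.lower() in stopwords:
            element.remove(word)    # removing from the original is now safe
print(fixed)
```

Running this prints `[['a', 'cat']]` for the first loop and `[['cat']]` for the second.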
Closing the question as "Questions without a clear problem statement are not useful to other readers." Your question currently does not demonstrate a problem, and the code works exactly as you expected with that set of inputs (the `?` was removed). — Antti Haapala -- Слава Україні, Jan 19 '15 at 06:15
Well, I guess the full problem is in the comments on the answer below, via the pastebin. — OctaveParango, Jan 19 '15 at 06:24

Adam Smith · Accepted Answer · 2015-01-19T05:03:10.647

3

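Altering a list while iterating over it causes problems: removing an item shifts every later item one position to the left, so the loop skips the item right after each one you remove. Try something like this instead: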

stopwords = ["some", "strings"]
tokentext = [ ["some", "lists"], ["of", "strings"] ]

new_tokentext = [[word for word in lst if word not in stopwords] for lst in tokentext]
# builds a new list of lists, leaving out every word that is in stopwords

Or using filter:

new_tokentext = [list(filter(lambda x: x not in stopwords, lst)) for lst in tokentext]
# the call to `list` is unnecessary in Python 2, where filter already returns a list
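If the inner lists have to be modified in place (as the question's loop tries to do), one option, sketched here with hypothetical data, is to combine the comprehension above with slice assignment, which replaces the contents of each existing inner list. The sketch also keeps the question's `word.lower()` check and stores the stopwords in a set; both are additions on top of the answer and assume the stopwords themselves are lowercase.

```python
# Hypothetical data; assumes the stopwords are stored in lowercase
stopwords = {"the", "a", "of"}  # a set makes each membership test O(1) on average
tokentext = [["The", "a", "cat"], ["of", "mice", "and", "men"]]

for element in tokentext:
    # Build the filtered list first, then copy it into the existing inner
    # list via slice assignment, so the original list objects are updated.
    element[:] = [word for word in element if word.lower() not in stopwords]

print(tokentext)
```

With this data the loop leaves `tokentext` as `[['cat'], ['mice', 'and', 'men']]`. Because the new list is built before it is assigned, nothing is removed while the loop over `element` is still running.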

edited Jan 19 '15 at 05:03

answered Jan 19 '15 at 04:58

Adam Smith


@user3878398 I need to know how this "doesn't seem to do it." This works in my example, so if my example differs from your setup, then I'll need to know what's going wrong to fix it – Adam Smith Jan 19 '15 at 05:07
If I knew, I would tell you :) Your dummy example is correct, but somehow when I run it myself on my stopwords and tokentext, it doesn't work. I am puzzled. – OctaveParango Jan 19 '15 at 05:21
@user3878398 YOUR solution works, as well. [This is my output](http://codepad.org/EMY37hfI) – Adam Smith Jan 19 '15 at 05:24
No, my solution does not work. I am not that stupid :p. Try it with this: http://pastebin.com/qSudHq8k – OctaveParango Jan 19 '15 at 05:36
@user3878398 `tokentext` is not a list of lists in that paste. It's a single list of strings, followed by a bunch of lists that you aren't assigning to a variable :) – Adam Smith Jan 19 '15 at 05:38
Sorry, it's late and I'm tired, haha. Ignore line 36 and wrap line 35, from start to end, in a set of brackets. That should be your tokentext. – OctaveParango Jan 19 '15 at 05:44

score -2 · Answer 2 · answered Jan 19 '15 at 05:11

You could just do something simple like:

for element in tokentext:
    if element in stopwords:
        stopwords.remove(element)

It's kinda like yours, but without the extra for loop. I'm not sure whether this works or whether it's what you are trying to achieve, but it's an idea, and I hope it helps!

answered Jan 19 '15 at 05:11

Chris Nguyen


This logic is backwards (you're removing words from `stopwords` rather than from `element`). Also, if `tokentext` is a list of lists of strings, then `element` is a list of strings, so `element` will never be in `stopwords` (which is also a list of strings). – Adam Smith Jan 19 '15 at 05:13