for x,y in words:
    for z in x:
        if z in stopwords:
            del x[x.index(z)]

This is my code. The data in words is a list of tuples, where each tuple looks like this:

(list of words, metadata)

The purpose of my code is to remove all the stopwords from each list of words. The only problem is that the stopwords are not removed afterwards...

What exactly did I do wrong? I already tried to do it with

x.pop(x.index(z))

but that doesn't seem to make a difference.

doofesohr
  • Removing data from a list while iterating through it is not a good idea and will most likely produce unexpected behaviour. Instead I would try to formulate your problem as a list comprehension and create a new list that conforms to your criteria. – Thomas Kühn Aug 07 '17 at 10:08
  • Please give an example of words and stopwords – nacho Aug 07 '17 at 10:08

2 Answers

You could simply create a new list without the stop words using a nested list comprehension:

stopwords = set(stopwords)  # just so "in" checks are faster
result = [([word for word in x if word not in stopwords], y) for x, y in words]

For example:

>>> stopwords = ['stop']
>>> words = [(['hello', 'you', 'stop'], 'somemeta')]
>>> stopwords = set(stopwords)  # just so "in" checks are faster
>>> result = [([word for word in x if word not in stopwords], y) for x, y in words]
>>> result
[(['hello', 'you'], 'somemeta')]

Note that you generally shouldn't modify the list you're iterating over. Doing so can lead to a lot of hard-to-track-down bugs.

MSeifert
  • Would you mind explaining why you create a set of stopwords? I don't understand the comment, sorry. – DrBwts Aug 07 '17 at 10:12
  • The (average) asymptotic runtime of membership testing is `O(1)` for sets; for other containers like lists and tuples it's `O(n)` (see also https://wiki.python.org/moin/TimeComplexity). And especially because the `in` check is done in the inner loop, the potential savings could be huge. – MSeifert Aug 07 '17 at 10:14
for x,y in words:
    for z in x:
        if z in stopwords:
            del x[x.index(z)]

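The outer loop binds x to each of your word lists in turn (y can be ignored here). The inner loop iterates over that same list, and removing elements from a list while iterating over it causes peculiar behaviour: each deletion shifts the remaining elements one position to the left while the loop's position keeps advancing, so the element right after a removed one is skipped. This applies equally to del, pop, remove and slice replacement, which is why switching to pop made no difference. Note also that x.index(z) finds the first occurrence of z, which is not necessarily the one the loop is currently looking at.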
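The skipping can be reproduced with a small sketch. The sample data below is made up for illustration (it is not from the question); only the loop mirrors the question's code.

```python
# Hypothetical sample data, chosen so that stopwords sit next to each other.
stopwords = {"the", "a"}
x = ["the", "a", "cat", "a", "the", "dog"]

for z in x:
    if z in stopwords:
        # Deleting shifts every later element one position to the left,
        # but the loop's internal index still advances, so the element
        # that slid into the current slot is never examined.
        del x[x.index(z)]

print(x)  # ['cat', 'a', 'the', 'dog']
```

Two stopwords survive: the first "a" is skipped after "the" is deleted, and the later deletion removes an earlier occurrence than the one the loop was looking at, which shortens the list before the remaining "the" is ever visited.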

It would be more efficient to make stopwords a set and filter each word list against it, replacing the inner loop with x[:] = [w for w in x if w not in stopwords]. The slice assignment is there purely so that x remains the same list object, which means the entry inside words changes as well. This does not run into the iteration problem described above, because the list comprehension builds its complete list before the assignment stores it into the slice.
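As a minimal sketch of the in-place variant, assuming data in the (list of words, metadata) shape from the question (the concrete values here are invented for illustration):

```python
# Hypothetical sample data in the question's (list of words, metadata) shape.
stopwords = {"the", "a"}  # a set, so "in" checks are fast
words = [
    (["the", "a", "cat", "a", "the", "dog"], "meta1"),
    (["a", "bird"], "meta2"),
]

for x, y in words:
    # The comprehension builds the filtered list first; the slice assignment
    # then replaces the contents of the existing list object in place.
    x[:] = [w for w in x if w not in stopwords]

print(words)  # [(['cat', 'dog'], 'meta1'), (['bird'], 'meta2')]
```

A plain x = [...] would only rebind the loop variable and leave the lists inside words untouched, which is why the slice assignment matters here.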

Yann Vernier