1

I have a list of strings. I'm trying to remove elements that have certain substrings. If I use the following, only elements containing 'Cantab' are removed:

for line in merged:
    if 'Duke' in line and 'Sir' not in line or 'Cantab' in line or 'Rick' in line:
        merged.remove(line)           

If I try to break up the conditional, my target results aren't achieved, but I successfully remove elements that contain Duke, but not Sir Duke.:

if 'Duke' in line and 'Sir' not in line:
    merged.remove(line)

This works as expected:

if 'Duke' in line and 'Sir' not in line:
    merged.remove(line)
elif 'Cantab' in line:
    merged.remove(line)

But the following only removes elements containing 'Cantab'!!!:

if 'Duke' in line and 'Sir' not in line:
    merged.remove(line)
elif 'Cantab' in line:
    merged.remove(line)
elif 'Rick' in line:
    merged.remove(line)

I'm having trouble figuring out the logic here. Thanks!

The_Glidd
  • 75
  • 3
  • 8
  • You are using a single `if statement` which means that if one of the branches is taken, then the rest won't be considered. Instead, try separating each condition into a separate `if statement`. i.e. replace `elif`s with `if`. See if that helps. – s16h May 19 '14 at 21:39

1 Answers1

4

Generally, do not iterate over a list and remove items from the list at the same time. It shifts the internal index of the items yet-to-be-iterated-over, so that not all items are iterated over.

You can fix the problem by looping over a copy of merged:

for line in list(merged):

This example illustrates the problem:

merged = ['Cantab', 'Duke', 'Cantab', 'Duke', 'Cantab', 'Duke']
for line in merged:
    print(line)
    if 'Duke' in line and 'Sir' not in line or 'Cantab' in line or 'Rick' in line:
        merged.remove(line)           
print(merged)

prints

Cantab  # only the Cantab lines are being iterated over!
Cantab
Cantab
['Duke', 'Duke', 'Duke']

Consider the first iteration of the loop. When line equals Cantab, the first Duke has internal index of 1. But after Cantab is removed, the first Duke's internal index becomes 0. But Python advances the loop index to 1! Now the second Cantab is at index 1, so the first Duke is completely skipped.


Alternately, you can fix the problem without making a copy of merged by iterating over the list backwards. Done this way, deleting the current item in merged is safe because the position of the yet-to-be-iterated-over items is not modified:

merged = ['Cantab', 'Duke']*3
for i in range(len(merged)-1, -1, -1):
    line = merged[i]
    if 'Duke' in line and 'Sir' not in line or 'Cantab' in line or 'Rick' in line:
        del merged[i] 
print(merged)

prints

[]

as desired.

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Sneaky! Is there a more obvious way of creating a copy to loop over rather than taking a full slice? – Tom Fenech May 19 '14 at 21:46
  • 1
    Alternatively, iterate over the loop *backwards*. Then it is safe to remove the last item. – unutbu May 19 '14 at 21:47
  • Thanks. A silly mistake that I've made before. Thanks for the quick solution. – The_Glidd May 19 '14 at 21:49
  • @TomFenech: `list(merged)` will also return a shallow copy of `merged`. There is also `import copy; copy.deepcopy(x)` for deep copies -- a copy that also copies the items inside the list recursively. (Not needed here, but useful to know about.) – unutbu May 19 '14 at 23:39
  • There's a list of ways to copy a list [here](http://stackoverflow.com/q/2612802/2088135) but it seems that the `[:]` slice is the fastest. It would be nice if there was something a little more readable but it works, so +1 – Tom Fenech May 19 '14 at 23:43
  • 1
    @TomFenech: Thanks for the question. I've changed my answer to use `list(merged)` since readability/understandability is more important than premature optimization. – unutbu May 19 '14 at 23:55