I want to remove the items from list 'a' that contain words found in the items of list 'b'.

a = ['one two three', 'four five six', 'seven eight nine']
b = ['two', 'five six']

The result should be:

a = ['seven eight nine']

This is because the strings 'two' and 'five six' are found in the first two items of list 'a'.

This is how I have tried to solve it:

for i in a:
    for x in b:
        if x in i:
            a.remove(i)

This returns:

print a
['four five six', 'seven eight nine']

Why does this not work, and how can I solve this problem?

Thanks.

John Kugelman
user2758396

4 Answers

Lists should not be modified while they're being iterated over. Doing so can have undesirable side effects, such as the loop skipping over items.
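To see the skipping concretely, here is a small sketch that replays the question's loop on the same data and records which items the outer loop actually visits. The `visited` list is added only for illustration, and `print()` is used so it runs on Python 2 or 3. After 'one two three' is removed, 'four five six' shifts into index 0, but the loop has already moved on to index 1.

```python
a = ['one two three', 'four five six', 'seven eight nine']
b = ['two', 'five six']

visited = []  # items the outer loop actually saw (illustration only)
for i in a:
    visited.append(i)
    for x in b:
        if x in i:
            a.remove(i)  # shifts the later items down one index

print(visited)  # ['one two three', 'seven eight nine']
print(a)        # ['four five six', 'seven eight nine']
```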

Generally in Python you should avoid loops that add and remove elements from lists one at a time. Usually those kinds of loops can be replaced with more idiomatic list comprehensions.

[sa for sa in a if not any(sb in sa for sb in b)]
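A self-contained version of that comprehension, as a sketch. Assigning to `a[:]` (slice assignment) instead of rebinding `a` replaces the contents of the existing list object, which only matters if other names refer to the same list; that is an assumption about your use case. The name `alias` is hypothetical and only shows the effect.

```python
a = ['one two three', 'four five six', 'seven eight nine']
b = ['two', 'five six']
alias = a  # hypothetical second reference to the same list object

# keep only the items that contain none of the strings in b (substring test)
a[:] = [sa for sa in a if not any(sb in sa for sb in b)]

print(a)      # ['seven eight nine']
print(alias)  # ['seven eight nine'], same object, so it sees the change
```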

For what it's worth, one way to fix your loops as written would be to iterate over a copy of the list so the loop isn't affected by the changes to the original.

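for i in a[:]:  # a[:] is a copy, so removing from a doesn't disturb the loop
    for x in b:
        if x in i:
            a.remove(i)
            break  # stop after the first match so i is removed only once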
John Kugelman

Use a list comp and any instead:

a = ['one two three', 'four five six', 'seven eight nine']
b = ['two', 'five six']

print([el for el in a if not any(ignore in el for ignore in b)])
Jon Clements
  • The original poster's problem arises from editing the list while iterating over it. This isn't the only way to avoid that, but it's a good one. – Peter DeGlopper Sep 13 '13 at 15:38

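When you iterate over a list, you should not remove elements from it: the loop keeps advancing its position while the remaining elements shift down, so the element right after each removed one gets skipped. One clean way to edit a list in place while looping over it is to iterate backwards over its indices and delete elements as you go, because deleting index i only shifts elements that have already been checked.

For example, this loop deletes the matching items in place: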

a = ['one two three', 'four five six', 'seven eight nine']
b = ['two', 'five six']

for i in range(len(a) - 1, -1, -1):
    for x in b:
        if x in a[i]:
            del a[i]
            break  # a[i] is gone; don't test (or delete) whatever now sits at index i
print(a) # prints ['seven eight nine']

Furthermore, in your question you said that you wanted to compare by words, but your current loop does not do that. While you loop over the list b, you are checking whether the whole two-word string is a substring of some item in a. Instead of matching the two-word string as a unit, you want to split it into its separate words, and str.split() does exactly that.

Notice that the following code does NOT delete the second element in the list:

a = ['one two three', 'four six five', 'seven eight nine']
b = ['two', 'five six']

for i in range(len(a) - 1, -1, -1):
    for x in b:
        if x in a[i]:
            del a[i]
            break  # a[i] is gone; stop checking it
print(a) # prints ['four six five', 'seven eight nine']

All I did was switch the order of 'six' and 'five' in a[1] and your loop stopped working. That's because it was looking for the string 'five six' in the string 'four six five' and obviously couldn't find it because there were no exact matches for that particular string.
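The difference between a substring test and a word test is easy to check directly. In this sketch (the strings are made up for illustration), `in` on a string matches any contiguous substring, so even a single word can match inside a longer word, while `in` on the list returned by `split()` compares whole words only.

```python
item = 'four six five'

print('five six' in item)      # False: no such contiguous substring
print('five' in item.split())  # True: 'five' is one of the words

# A single-word substring test can still give false positives:
print('one' in 'someone else')          # True: 'one' is inside 'someone'
print('one' in 'someone else'.split())  # False: no whole word equals 'one'
```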

Now if we split the strings up into words, we can actually do the checks by iterating over the lists of words.

a = ['one two three', 'four six five', 'seven eight nine']
b = ['two', 'five six']

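for i in range(len(a) - 1, -1, -1):
    item_words = a[i].split()  # compare whole words, not substrings
    for x in b:
        if any(word in item_words for word in x.split()):
            del a[i]
            break  # a[i] is gone; stop checking it
print(a) # correctly prints ['seven eight nine']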
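If you do not need to modify the list in place, the same word-by-word rule can be written as a comprehension. This sketch is an alternative to the loop above, not the answer's own code: it collects every word from `b` into a set once (the name `banned` is hypothetical), then keeps an item of `a` only if it shares no whole word with that set, using `set.isdisjoint`.

```python
a = ['one two three', 'four six five', 'seven eight nine']
b = ['two', 'five six']

# every individual word that appears anywhere in b
banned = {word for phrase in b for word in phrase.split()}

# keep the items that share no whole word with banned
a = [item for item in a if banned.isdisjoint(item.split())]

print(a)  # ['seven eight nine']
```

Building the set once means each item is checked with a single set comparison rather than a nested loop over `b`.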
Shashank

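for i in reversed(range(len(a))):
    for j in reversed(range(len(b))):
        if b[j] in a[i]:
            del a[i]  # delete by index; a.remove(a[i]) deletes the first equal item
            break     # a[i] is gone; stop comparing it against b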

# output = ['seven eight nine']

You have to go through your list from the end; otherwise, deleting an item shifts every later item down one position and the loop skips the item right after the deleted one.
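A small sketch of why the direction matters, using made-up data (`items`, with `'two'` as the pattern) where two consecutive items match. Counting up, the loop skips the second match and then indexes past the shortened list; counting down, it removes both.

```python
items = ['x two', 'y two', 'z']

# Forward: after del forward[0], 'y two' moves to index 0, but the loop goes on to index 1.
forward = list(items)
try:
    for i in range(len(forward)):  # range is computed once, from the original length
        if 'two' in forward[i]:
            del forward[i]
except IndexError:
    print('forward loop ran past the end: %s' % forward)  # ['y two', 'z']

# Backward: deleting index i only shifts items after i, which were already checked.
backward = list(items)
for i in reversed(range(len(backward))):
    if 'two' in backward[i]:
        del backward[i]
print(backward)  # ['z']
```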

Paco