
I am still learning the basics of Python, and I have just spent a while reading about how to remove an item from a list from within a for loop. Everything I've read suggests complex ways of doing this and says that you cannot remove an item from a list while you're iterating over it. However, this seems to work:

class Object():
    def __init__(self):
        self.y = 0

object_list = [Object(), Object(), Object()]

for thing in object_list:
    thing.y += 1
    if thing.y > 10:
        object_list.remove(thing)

Why does this work when others say it doesn't and write complicated workarounds? Is it because you aren't allowed to do it in Python 2 but can in Python 3?

And is this the right way to do it? Will it work as I want, or will it be prone to bugs? Would it be advisable to iterate over the list in reverse order if I plan to remove items?

Sorry if this has been answered before, but it's hard to know which resources refer to which version, as they all just use the "python" tag (at least the ones I've been reading; maybe that's because all the ones I have read are Python 2?)

Thanks!

EDIT:

Sorry, there were a couple of copy-and-paste errors... I've fixed them.

EDIT:

I've been watching another of Raymond Hettinger's videos. He mentions a way of removing items from a dictionary while iterating over it by using dict.keys(). Something like:

d = {'text': 'moreText', 'other': 'otherText', 'blah': 'moreBlah'}

for k in d.keys():
    if k.startswith('o'):
        del d[k]

Apparently using the keys makes it safe to remove items while iterating. Is there an equivalent for lists? If there were, I could iterate backwards over the list and remove items safely.

smac89
Iron Attorney
  • **It does not work**. Your objects are simply indistinguishable, so you can't tell one from another after removal. Try adding a `self` to the `__init__` and initializing with different `y` values. And obviously the `if` branch is never taken – Moses Koledoye Nov 11 '16 at 01:10
  • Are you sure this is working? Looks to me like you'd get a `NameError` at `for thing in thing_list:`. – TigerhawkT3 Nov 11 '16 at 01:15
  • Well, the objects will have different y values eventually, but I'll want to make sure that if two happen to have the same y value for any reason, they will both be removed. I'm guessing that when you remove one from a list, the rest shift up a place, so the iterator skips an object because it just moves on to the next index? – Iron Attorney Nov 11 '16 at 01:16
  • It's working; what's the `NameError`? I have just noticed that if I add 9 items to the list, 2 items are missed on the first pass through the for loop, when all should be removed. Then there is 1 left on the next run through, and then finally none. I'm surprised it doesn't complain that items are being missed part way through. – Iron Attorney Nov 11 '16 at 01:18
  • Ah, in fairness, I do have `__init__(self)`. I just wrote it out wrong here, apologies. I'll edit it. – Iron Attorney Nov 11 '16 at 01:22
  • @TessellatingHeckler There are some interesting answers, but the original question was asked in 2009... was that Python 2 or 3? And are the answers given still the best ways? – Iron Attorney Nov 11 '16 at 01:37
  • In Python 3, `dict.keys()` returns a view that reflects the current state of the dictionary, so deleting a key while looping over it raises `RuntimeError: dictionary changed size during iteration`. You need `for k in list(d): ...` to iterate over a list of the keys, which doesn't change during iteration. (In Python 2, `keys()` returned such a list, which is why the trick was safe there.) – Steven Summers Nov 11 '16 at 03:20
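A short sketch of the comment's suggestion, assuming Python 3.7 or later (so printed dicts follow insertion order). It also covers the list question at the end of the post: the list equivalent of looping over a key snapshot is looping over a shallow copy while removing from the original. The `live` dictionary is a throwaway copy used only to show the error.

```python
d = {'text': 'moreText', 'other': 'otherText', 'blah': 'moreBlah'}

# Looping over the live keys view while deleting fails in Python 3.
live = dict(d)  # throwaway copy, only for this demonstration
raised = False
try:
    for k in live.keys():
        if k.startswith('o'):
            del live[k]
except RuntimeError:
    # Python 3 reports "dictionary changed size during iteration"
    raised = True

# list(d) snapshots the keys up front, so deleting from d is safe.
for k in list(d):
    if k.startswith('o'):
        del d[k]

# The list equivalent: loop over a shallow copy, remove from the original.
lst = [1, 5, 2, 6, 3]
for item in lst[:]:
    if item < 4:
        lst.remove(item)

print(raised)  # True
print(d)       # {'text': 'moreText', 'blah': 'moreBlah'}
print(lst)     # [5, 6]
```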

2 Answers


Here are some examples:

def example1(lst):
    for item in lst:
        if item < 4:
            lst.remove(item) 
    return lst

def example2(lst):
    for item in lst[:]:
        if item < 4:
            lst.remove(item)       
    return lst

def example3(lst):
    i = 0
    while i < len(lst):
        if lst[i] < 4:
            lst.pop(i)
        else:
            i += 1
    return lst

def example4(lst):
    return [item for item in lst if not item < 4]

def example5(lst):
    for item in reversed(lst):
        if item < 4:
            lst.remove(item)
    return lst

def example6(lst):
    for i, item in reversed(list(enumerate(lst))):
        if item < 4:
            lst.pop(i)
    return lst

def example7(lst):
    size = len(lst) - 1
    for i, item in enumerate(reversed(lst)):
        if item < 4:
            lst.pop(size - i)
    return lst

def example8(lst):
    return list(filter(lambda item: not item < 4, lst))

import itertools
def example9(lst):
    return list(itertools.filterfalse(lambda item: item < 4, lst))

# Output
>>> lst = [1, 1, 2, 3, 2, 3, 4, 5, 6, 6]
>>> example1(lst[:])
[1, 3, 3, 4, 5, 6, 6]
>>> example2(lst[:])
[4, 5, 6, 6]
>>> example3(lst[:])
[4, 5, 6, 6]
>>> example4(lst[:])
[4, 5, 6, 6]
>>> example5(lst[:])
[4, 5, 6, 6]
>>> example6(lst[:])
[4, 5, 6, 6]
>>> example7(lst[:])
[4, 5, 6, 6]
>>> example8(lst[:])
[4, 5, 6, 6]
>>> example9(lst[:])
[4, 5, 6, 6]

Example 1: This iterates through the list and removes values from it as it goes. The problem is that the list changes during iteration: after each removal the following items shift down one index, so some elements get skipped over.
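To see which items get skipped, the following sketch (the helper name `visited_while_removing` is hypothetical) records every item the loop actually visits on the same input as above. Only 7 of the 10 items are visited: after each `remove`, the remaining items shift one index to the left, but the loop's internal index still advances by one.

```python
def visited_while_removing(lst, limit=4):
    """Example 1's loop, but also recording every item it looks at."""
    seen = []
    for item in lst:
        seen.append(item)
        if item < limit:
            lst.remove(item)  # shifts every later item one index to the left
    return seen, lst

seen, result = visited_while_removing([1, 1, 2, 3, 2, 3, 4, 5, 6, 6])
print(seen)    # [1, 2, 2, 4, 5, 6, 6]  (both 3s and one 1 are never checked)
print(result)  # [1, 3, 3, 4, 5, 6, 6]
```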

Example 2: Here we iterate over a shallow copy of the list instead of the list itself. The drawback is that copying a large list costs extra time and memory.

Example 3: This uses `pop` instead of `remove`. The issue with `remove` is that it removes the first item in the list that compares equal to the value. This is typically not a problem unless you have distinct objects that compare equal (see Example 10).

Example 4: Instead of modifying the list, we create a new list with a list comprehension that keeps only the values we want.
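Example 4 returns a new list and leaves the original untouched. If other code still holds a reference to the original list, slice assignment swaps the filtered contents into that same list object. A minimal sketch using the question's `y > 10` condition, where the `Thing` class is a hypothetical stand-in for the question's `Object`:

```python
class Thing:
    """Hypothetical stand-in for the question's Object class."""
    def __init__(self, y):
        self.y = y

object_list = [Thing(3), Thing(12), Thing(7), Thing(15)]
alias = object_list  # another name bound to the same list

# object_list = [...] would rebind the name and leave alias on the old list;
# slice assignment replaces the contents of the existing list instead.
object_list[:] = [thing for thing in object_list if thing.y <= 10]

print([thing.y for thing in object_list])  # [3, 7]
print(alias is object_list)                # True
```

The comprehension copies references to the kept objects, not the objects themselves.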

Example 5: This iterates through the list in reverse. The difference from Example 3 is that we apply a for loop to the built-in `reversed` function instead of using a while loop with a counter.

Example 6: A similar example using `pop` instead.

Example 7: A better example using `pop`, as we don't have to build an intermediate list just to use the `reversed` function.
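Example 6 needs `list(enumerate(lst))` because `enumerate` objects are not reversible: `reversed(enumerate(lst))` raises `TypeError`. Another backwards variant, sketched here under a hypothetical name, walks the indices with `range` so nothing has to be reversed or copied:

```python
def remove_small_backwards(lst, limit=4):
    """Delete items below limit in place, walking indices from the end."""
    for i in range(len(lst) - 1, -1, -1):
        if lst[i] < limit:
            # Deleting index i only shifts items after i, which have
            # already been checked, so no unvisited item is skipped.
            del lst[i]
    return lst

print(remove_small_backwards([1, 1, 2, 3, 2, 3, 4, 5, 6, 6]))  # [4, 5, 6, 6]
```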

Example 8: Uses the built-in `filter` function to remove the specified values.

Example 9: A similar example using the `filterfalse` function from `itertools`.

class Example(object):
    ID = 0
    def __init__(self, x):
        self._x = x
        self._id = str(Example.ID)
        Example.ID += 1

    def __eq__(self, other):
        return self._x == other._x

    def __repr__(self):
        return 'Example({})'.format(self._id)

def example10():
    lst = [Example(5), Example(5)]
    print(lst)
    lst.remove(lst[1])
    return lst

# Output
>>> example10()
[Example(0), Example(1)]
[Example(1)]

Example 10: Here we create two `Example` objects with the same value, so by the equality method they are equal. The ID variable is there to help us tell the two apart. We asked to remove the second object from the list, but because both are equal, the first item is actually removed instead.
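If you need to remove one specific object rather than the first equal one, compare by identity with `is` and delete by index. `lst.index(x)` does not help here, because it also matches with `==`. A sketch with a cut-down, hypothetical version of the `Example` class above (labelled by name instead of a counter) and a hypothetical helper `remove_by_identity`:

```python
class Example:
    """Instances compare equal whenever their x values match."""
    def __init__(self, x, name):
        self.x = x
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Example) and self.x == other.x

    def __repr__(self):
        return 'Example({})'.format(self.name)


def remove_by_identity(lst, target):
    """Remove the exact object target, compared with `is` rather than ==."""
    for i, item in enumerate(lst):
        if item is target:
            del lst[i]
            return  # stop right away, so the loop never runs after mutating
    raise ValueError('target not in list')


first, second = Example(5, 'a'), Example(5, 'b')
lst = [first, second]
idx_by_equality = lst.index(second)  # 0, because index() matches with ==
remove_by_identity(lst, second)
print(idx_by_equality)  # 0
print(lst)              # [Example(a)]
```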

Timings: These are rough times and can vary depending on your device. They show which approach is fastest, but they were measured on a list of 10,000 items, so if your lists are nowhere near that size, any choice is fine. Example 1 is left out because it gives the wrong result.

import timeit
import random

# Code from above is here

def test(func_name):
    global test_lst
    test_lst = lst[:]
    return timeit.timeit("{}(test_lst)".format(func_name),
                         setup="from __main__ import {}, test_lst".format(func_name), number=1)

if __name__ == '__main__':
    NUM_TRIALS = 1000
    lst = list(range(10000))
    random.shuffle(lst) # Don't have to but makes it a bit interesting

    for func in ('example2', 'example3', 'example4', 'example5',
                 'example6', 'example7', 'example8', 'example9'):
        trials = []
        for _ in range(NUM_TRIALS):
            trials.append(test(func))
        print(func, sum(trials) / len(trials) * 10000)  # mean seconds per call, times 10,000

# Output
example2 8.487979147454494
example3 20.407155912623292
example4 5.4595031069025035
example5 7.945100572479213
example6 14.43537688078149
example7 9.088818018676008
example8 14.898256300967116
example9 13.865010859443247
Steven Summers
  • Thanks! That's a helpful set of examples. Will example 4 also be slow if I am using a large list and want only a few items removed? – Iron Attorney Nov 11 '16 at 02:32
  • It's about the same as example 3 and better than example 2. – Steven Summers Nov 11 '16 at 02:33
  • Alright, cheers! What about iterating backwards? I know it is advised not to modify objects while iterating over them, but if all I'm doing is removing them, will it matter if I iterate in reverse order? My tests show that it does indeed remove all the objects it should from my list. – Iron Attorney Nov 11 '16 at 02:34
  • The problem is that you still need to iterate through the list anyway to check each value, so your best solution will be one that requires a single loop. As for iterating backwards, it works without issue, but it still modifies the list during iteration, so I can't vouch for its reliability. There is no real need to iterate through it backwards anyway, though. Example being added. – Steven Summers Nov 11 '16 at 02:44
  • Okay, I have added the timings for each example provided, and it seems I was way off for example 3, probably due to the indexing needed to check and remove each item. The results for 6, 7, and 8 were expected because we have to cast back to a list, but I figured I would add them in anyway. – Steven Summers Nov 11 '16 at 03:16
  • Thanks very much! This is a fantastic and detailed answer. Nice one! – Iron Attorney Nov 15 '16 at 15:34

It will run without raising an error. However, it's never a good idea to modify an object while you're iterating over it; you're likely to get unexpected behaviour.

If I did this:

my_list = [1, 2, 3, 4]
for x in my_list:
    if x+1 in my_list:
        my_list.remove(x+1)

I'd expect my_list to be [1] at the end: 1 removes 2, 2 removes 3, and 3 removes 4. If I check, though, I find my_list is [1, 3]. This is because 2 was removed from the list in the first iteration, so the second iteration used 3 to remove 4, and 3 itself is still in the list.
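One way to get the `[1]` expected above is to loop over a snapshot of the list while removing from the original, so removals no longer change which values the loop sees:

```python
my_list = [1, 2, 3, 4]

# my_list[:] is a shallow copy, so the loop still visits 1, 2, 3 and 4
# even after values have been removed from my_list itself.
for x in my_list[:]:
    if x + 1 in my_list:
        my_list.remove(x + 1)

print(my_list)  # [1]
```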

Batman
  • So what is the correct work around? And is it pretty? I've been watching Raymond Hettinger's videos on beautiful code, and it seems that the dev team like to make nice looking and easy to use shortcuts for these sorts of issues – Iron Attorney Nov 11 '16 at 01:20
  • Depends on exactly what you want to achieve. There's nothing wrong, for example, with removing items from a list in a `for` loop if that's not the object you're iterating over. If it is, create an empty list and then `append` each object you want to remove to that list. You can then iterate over the second list and remove each item from the first. – Batman Nov 11 '16 at 01:27
  • Would I be safe if I cycled through the list backwards, that way I'm not buggering up the order of the items before I get to them – Iron Attorney Nov 11 '16 at 01:30
  • A list/generator expression with an `if` filter might do it. If the filter criteria are complicated, you can put them in a function – roarsneer Nov 11 '16 at 01:32
  • Cycling through the list backwards doesn't really change anything. The fundamental issue is still the same: you're modifying an object while you iterate over it. For what it's worth, I'm a big fan of beautiful code. However, it's more important that your code works and is robust. – Batman Nov 11 '16 at 01:35
  • `object_list = [thing for thing in object_list if not thing.y > 10]` Is that a better solution? I just found it in another answer – Iron Attorney Nov 11 '16 at 01:53
  • It seems to remove all items as expected at least – Iron Attorney Nov 11 '16 at 01:54
  • Only... if I'm copying all the items I want to keep, when in practice only one or two items out of many are likely to be removed at once, then I'm going to do a whole lot of copying that I might not need to, aren't I? – Iron Attorney Nov 11 '16 at 01:58
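The two-pass approach Batman describes above can be sketched like this, again with a hypothetical `Thing` class standing in for the question's `Object`:

```python
class Thing:
    """Hypothetical stand-in for the question's Object class."""
    def __init__(self, y):
        self.y = y

object_list = [Thing(3), Thing(12), Thing(7), Thing(15)]

# Pass 1: only read object_list, collecting what should go.
to_remove = [thing for thing in object_list if thing.y > 10]

# Pass 2: loop over the separate to_remove list, so changing
# object_list does not disturb the iteration.
for thing in to_remove:
    object_list.remove(thing)

print([thing.y for thing in object_list])  # [3, 7]
```

Each `remove` call scans `object_list` from the start, so the cost grows with the list length times the number of removals; when only one or two items go at a time, that is usually small. Because `Thing` does not define `__eq__`, `remove` here matches by identity (the default `==` for plain objects).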