Abnormal string list behaviour in Python

Question

    print difflist
    for line in difflist:
        if ((line.startswith('<'))or (line.startswith('>')) or (line.startswith('---'))):
            difflist.remove(line)
    print difflist

Here, initially,

difflist = ['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ', '5c7', '< at ', '---', '> add ', '']

And what I expect the code to print is

['1a2', '3c4,5', '5c7', '']

But what I get instead is

difflist= ['1a2', '3c4,5', '---', '> ', '5c7', '---', '']

This is a common question. You're changing the list while iterating over it. — mgilson, Apr 17 '13 at 23:21
Try to use a loop with a counter and set it back 1 each time you have a match in your list. Other options are to push the contents into a new empty list (those that "do" match). — , Apr 17 '13 at 23:24
Also: http://stackoverflow.com/questions/6500888/removing-from-a-list-while-iterating-over-it, http://stackoverflow.com/questions/6022764/python-removing-list-element-while-iterating-over-list — poke, Apr 17 '13 at 23:28
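The counter idea from the comments can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name remove_diff_markers and the PREFIXES constant are made up here, and instead of stepping the counter back by one after a match, the sketch simply does not advance it, which has the same effect.

```python
# Hypothetical names for illustration; neither appears in the original thread.
PREFIXES = ('<', '>', '---')

def remove_diff_markers(lines):
    """Delete, in place, every line that starts with a diff marker."""
    i = 0
    while i < len(lines):
        if lines[i].startswith(PREFIXES):
            # The next item slides into index i, so do not advance the counter.
            del lines[i]
        else:
            i += 1

difflist = ['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ',
            '5c7', '< at ', '---', '> add ', '']
remove_diff_markers(difflist)
print(difflist)  # ['1a2', '3c4,5', '5c7', '']
```

Each del still shifts the tail of the list, so this fixes the skipping but not the cost of repeated deletions.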

score 3 · Answer 1 · answered Apr 17 '13 at 23:24

When iterating over a list, Python keeps an integer index of the element it is currently pointing to. When you remove the current element, all of the later elements shift down to a lower index. The index is then incremented before you get to "see" the element that shifted into the place of the one you removed, so that element is skipped.
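To make the skipping visible, here is a small sketch that reruns the loop from the question and records which (index, line) pairs the loop actually looks at. The visited list is an addition for illustration only.

```python
difflist = ['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ',
            '5c7', '< at ', '---', '> add ', '']

visited = []  # illustration only: every (index, line) pair the loop tests
for index, line in enumerate(difflist):
    visited.append((index, line))
    if line.startswith(('<', '>', '---')):
        # Removing shifts every later item one slot to the left.
        difflist.remove(line)

print([line for _, line in visited])
# ['1a2', '> ', '< staring', '> starring', '5c7', '< at ', '> add ']
print(difflist)
# ['1a2', '3c4,5', '---', '> ', '5c7', '---', '']
```

The items that survive but should not have ('---', '> ', '---') are exactly the ones that slid into a slot the loop had already passed.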

Ultimately, this is better done with a list comprehension:

difflist = [ line for line in difflist if not line.startswith(('<','>','---'))]

If you really need to do the operation in place, use slice assignment on the left-hand side:

difflist[:] = [ line for line in difflist if not line.startswith(('<','>','---'))]
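The difference between the two forms only shows up when something else holds a reference to the same list. A minimal sketch, with variable names invented for the illustration:

```python
PREFIXES = ('<', '>', '---')

# Plain assignment: the name is rebound to a new list, the old object is untouched.
first = ['1a2', '> ', '3c4,5', '---']
first_alias = first
first = [line for line in first if not line.startswith(PREFIXES)]
print(first_alias is first)  # False
print(first_alias)           # ['1a2', '> ', '3c4,5', '---']

# Slice assignment: the existing list object is modified in place.
second = ['1a2', '> ', '3c4,5', '---']
second_alias = second
second[:] = [line for line in second if not line.startswith(PREFIXES)]
print(second_alias is second)  # True
print(second_alias)            # ['1a2', '3c4,5']
```

So the slice form matters mainly when the list was passed in from a caller or is shared with other code.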

Tom · Answer 2 · 2013-04-17T23:37:49.747

2

I think you might be invalidating your iterators; in other words, you shouldn't try to remove an item from a list while you're looping through it.

You might want to make a new list, that only contains items you care about.

For example:

newdifflist = []
for line in difflist:
    if not ((line.startswith('<'))or (line.startswith('>')) or (line.startswith('---'))):
        newdifflist.append(line)

More Pythonic, using a list comprehension and passing a tuple of prefixes to startswith():

newdifflist = [line for line in difflist if not line.startswith(('<', '>', '---')) ]

edited Apr 17 '13 at 23:37

answered Apr 17 '13 at 23:21

This doesn't invalidate anything actually. – kindall Apr 17 '13 at 23:22

score 1 · Answer 3 · answered Apr 17 '13 at 23:24

result = []
for line in difflist:
    if not line.startswith(('<', '>', '---')):
        result += [line]

Or using list comprehensions:

[line for line in difflist if not line.startswith(('<', '>', '---'))]

aldeb

Nice correction, comprehensions are much more Pythonic. – buruzaemon Apr 17 '13 at 23:26

score 1 · Answer 4 · answered Apr 17 '13 at 23:25

Do this instead:

>>> difflist = [i for i in difflist if not i.startswith(('<','>','---'))]
>>> difflist
['1a2', '3c4,5', '5c7', '']

Calling .remove() shifts the positions of the later items while the loop is still running, which makes the for-loop skip elements. Check out mgilson's answer for more info.

TerryA

score 0 · Answer 5 · answered Apr 17 '13 at 23:25

Instead of trying to remove items from the list, skip the ones you don't want and build another list with the ones you do.

array = []
for line in difflist:
    if ((line.startswith('<'))or (line.startswith('>')) or (line.startswith('---'))):
        pass
    else:
        array.append(line)

Now array will be the list you are looking for!

kindall · Answer 6 · 2013-04-19T16:06:12.803

There's nothing "abnormal" about what's going on. In fact it's quite normal. Here's what's happening:

1. The loop is looking at item i.
2. Item i meets your test, so you delete it. What was item i+1 is now item i.
3. Your loop begins again and the iterator advances to point to item i+1, but the new item i is never tested.

There are a few possible solutions:

1. Make a new list containing only the items you want, instead of removing items from your existing list. This has better algorithmic complexity and is also more efficient in Python, but it can use more memory.
2. Iterate the list in reverse order. This way, you are only ever shifting items you've already looked at.
3. Iterate over a copy of the list.
4. Use an inner while loop instead of an if.
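As a rough sketch of the reverse-order and copy options (second and third in the list above), assuming the same data as in the question; the names data, reversed_version and copy_version are made up for this illustration:

```python
PREFIXES = ('<', '>', '---')
data = ['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ',
        '5c7', '< at ', '---', '> add ', '']

# Reverse order: deleting index i only shifts items that were already tested.
reversed_version = list(data)
for i in range(len(reversed_version) - 1, -1, -1):
    if reversed_version[i].startswith(PREFIXES):
        del reversed_version[i]

# Copy: the loop walks a snapshot, so removals cannot disturb the iteration.
copy_version = list(data)
for line in copy_version[:]:
    if line.startswith(PREFIXES):
        copy_version.remove(line)

print(reversed_version)  # ['1a2', '3c4,5', '5c7', '']
print(copy_version)      # ['1a2', '3c4,5', '5c7', '']
```

Both variants still delete items one at a time, so they fix the skipping without improving on the cost of building a new list.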

For this problem, solution #4 might look like this:

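for i, line in enumerate(difflist):
    # Re-test whatever now sits at index i until it passes or the list ends;
    # the bounds check avoids an IndexError when the last items are removed.
    while i < len(difflist) and difflist[i].startswith(('<', '>', '---')):
        difflist.pop(i)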

This way, you keep looking at the same index until it fails your test, and only then allow the iterator to move on to the next one.

(I took the liberty of removing the plethora of unnecessary parentheses in your condition, as well as changing your remove to pop -- remove needs to search from the beginning of the list each time, making your loop a Schlemiel the Painter's algorithm).

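Another thing you could look into is using a deque (from the collections module). It is implemented as a linked list of blocks (a regular Python list is actually a resizable array), so removing items from either end is O(1). Deleting from the middle of a deque is still O(n), so it helps mainly when the removals happen at the ends.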
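A deque's cheap operations are at its two ends (popleft and append). One way to use that here, offered as a sketch rather than something taken from the answer, is to pop every item off the left once and push the ones worth keeping back onto the right:

```python
from collections import deque

PREFIXES = ('<', '>', '---')
lines = deque(['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ',
               '5c7', '< at ', '---', '> add ', ''])

# One full pass: each item leaves from the left exactly once, and the
# kept items re-enter on the right, so their original order is preserved.
for _ in range(len(lines)):
    line = lines.popleft()
    if not line.startswith(PREFIXES):
        lines.append(line)

print(list(lines))  # ['1a2', '3c4,5', '5c7', '']
```

Every step touches only an end of the deque, so the whole pass is linear in the number of items.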

isn't that still a Schlemiel the Painter's algorithm though? Each remove is O(n), but so is a `pop` (on average). — mgilson, Apr 17 '13 at 23:49
With `pop` indexing is O(1) and the actual deletion is O(1/2n), whereas with `remove` it's O(1/2n) to both find the element and remove it, for a total of O(n). — kindall, Apr 18 '13 at 02:47
O((1/2)*n) is the same thing as O(n) in Big-O notation ... So you save a factor of 2, but your algorithm still *scales* the same. Ultimately, the best scaling is to create a new list and reassign which is a flat O(N) no matter how many items you remove (versus the others which are O(N*M) where M is the number of items to remove). — mgilson, Apr 18 '13 at 12:24
OK, I learned something new. :-) I did say that creating a new list would be the fastest approach. If you have to remove things from the existing list, it makes sense to do it the fastest way possible though (even if it's "only" twice as fast algorithmically). — kindall, Apr 18 '13 at 13:36
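If the list really has to be filtered in place, the O(N*M) cost discussed in these comments can be avoided by copying the kept items toward the front and truncating once at the end. This is a sketch under that assumption; compact_in_place is a hypothetical helper name, not something from the thread.

```python
def compact_in_place(lines, prefixes=('<', '>', '---')):
    """Drop lines starting with any prefix, in place, in a single pass."""
    write = 0  # next slot to fill with a kept line
    for line in lines:
        if not line.startswith(prefixes):
            # write never runs ahead of the read position, so this only
            # overwrites slots the loop has already read.
            lines[write] = line
            write += 1
    # A single deletion trims the leftover tail.
    del lines[write:]

difflist = ['1a2', '> ', '3c4,5', '< staring', '---', '> starring', '> ',
            '5c7', '< at ', '---', '> add ', '']
compact_in_place(difflist)
print(difflist)  # ['1a2', '3c4,5', '5c7', '']
```

The list length does not change during the loop, so the iteration is not disturbed, and the work stays O(N) however many items are removed.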
