Unexpected result after using generator expression

Question

I am trying to filter some data I am working with to remove artifacts such as negative numbers and errors from my measuring devices. I have been playing with the idea of using a generator to do this. I am using Python 2.7.2.

testlist = [12,2,1,1,1,0,-3,-3,-1]  

gen = (i for i, x in enumerate(testlist) if x < 0 or x > 2.5)

for i in gen: testlist.pop(i)

print testlist

This returns:

[2, 1, 1, 1, 0, -3]

My question is why is the -3 value showing up in the updated "testlist"?

Just curious: why use a generator rather than a list comprehension? Is the real list you're using this on particularly large? — Chris Pfohl, Jun 06 '12 at 20:43
The actual list is several thousand values, so I don't think it is particularly large. As far as generator vs. list comprehension, the only reason I started with the less efficient method is due to inexperience. — Jason, Jun 06 '12 at 21:01
Some light background reading on the difference ;) Have fun: http://stackoverflow.com/q/47789/456188 — Chris Pfohl, Jun 07 '12 at 14:57

Mark Byers · Accepted Answer · 2012-06-06T20:49:22.897

When you remove an item from your list, the indexes of all the items after it change (they are shifted down by one). As a result, the generator will skip over some items. Try adding some more print statements so that you can see what is going on:

for i in gen:
    print i
    print testlist
    testlist.pop(i)

Output:

0
[12, 2, 1, 1, 1, 0, -3, -3, -1]
5
[2, 1, 1, 1, 0, -3, -3, -1]
6
[2, 1, 1, 1, 0, -3, -1]

To remove every unwanted item you would have needed to delete at indexes 0, 5, 5, 5, but the generator produces the indexes 0, 5, 6. That makes sense because enumerate yields 0, 1, 2, ... and never yields the same index twice in a row.
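If you really do want to delete by index, one possible workaround (a sketch, not something from the question) is to collect all the indexes first and then pop from the highest index down, so each removal only shifts items that have already been handled. The name bad_indexes is made up for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# Build the complete list of offending indexes *before* modifying testlist.
# (bad_indexes is an illustrative name, not from the original post.)
bad_indexes = [i for i, x in enumerate(testlist) if x < 0 or x > 2.5]

# Pop from the end first: removing a later item never moves an earlier one.
for i in reversed(bad_indexes):
    testlist.pop(i)

print(testlist)  # [2, 1, 1, 1, 0]
```

This is still quadratic in the worst case, since every pop may shift the tail of the list, so building a new filtered list is usually the simpler choice.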

It's also very inefficient to remove the elements one at a time. This requires moving data around multiple times, with a worst case performance of O(n²). You can instead use a list comprehension.

testlist = [x for x in testlist if 0 <= x <= 2.5]

Levon · Answer 2 · 2012-06-06T21:06:50.680

You are modifying the list while you iterate over it, which is somewhat analogous to changing the index variable of a for-loop from inside the loop in some other languages. Consider this approach as an alternative:

testlist = [x for x in testlist if x >= 0 and x <= 2.5]

Using a list comprehension should work more directly. It is not a generator expression, but it could trivially be changed to one:

testlist = (x for x in testlist if x >= 0 and x <= 2.5)
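One thing to keep in mind with the generator version, shown here as a small sketch: the generator is lazy and can only be consumed once, so if you need a real list (for len, indexing, or a second pass) you have to materialize it. The names filtered, first_pass and second_pass are made up for this example.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# Creating the generator expression does not filter anything yet.
filtered = (x for x in testlist if 0 <= x <= 2.5)

first_pass = list(filtered)   # consumes the generator
second_pass = list(filtered)  # generator is already exhausted, so this is empty

print(first_pass)   # [2, 1, 1, 1, 0]
print(second_pass)  # []
```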

FrederikVds · Answer 3 · 2012-06-06T20:58:56.227

Let's consider a simpler input:

[-3, -4, -5]

First, (0, -3) is taken from the enumerator, and the generator yields the index 0. The for loop receives that index and removes -3:

[-4, -5]

Then a new element is taken from the enumerator. The enumerator remembers that it already took the first element, so it now takes the second, which at this point is -5. -5 is removed from the list in the same way, and -4 remains.
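The walk-through above can be reproduced with a few lines. This is only a sketch: data and popped are names made up for the example, and popped just records which index and value each pop removed.

```python
data = [-3, -4, -5]
gen = (i for i, x in enumerate(data) if x < 0)

popped = []  # (index, value) for each removal, in order
for i in gen:
    popped.append((i, data.pop(i)))

print(popped)  # [(0, -3), (1, -5)]
print(data)    # [-4]
```

After the second pop the list has one item left, so the enumerator asks for index 2, finds nothing there, and the loop ends without ever looking at -4.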

By the way, an easier way to do what you're trying is the following:

testlist = filter(lambda x: x >= 0 and x <= 2.5, testlist)
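For anyone reading this on Python 3: there filter returns a lazy iterator rather than a list, so the line above would leave testlist bound to a filter object. A sketch that behaves the same on 2.7 and 3, using a chained comparison to shorten the lambda:

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# On Python 2 filter already returns a list here, so list() just copies it;
# on Python 3 list() materializes the lazy filter object.
testlist = list(filter(lambda x: 0 <= x <= 2.5, testlist))

print(testlist)  # [2, 1, 1, 1, 0]
```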

FrederikVds · answered Jun 06 '12 at 20:41 · edited Jun 06 '12 at 20:58

The pop that is supposed to remove the first -3 does remove the first one, but then the second one is skipped. This can be seen by changing one of the -3s to a -4. – Mark Byers Jun 06 '12 at 20:45
Nice use of filter - your lambda could simplify to `lambda x: 0 <= x <= 2.5` – PaulMcG Jun 06 '12 at 20:53
@Paul: Thanks, I didn't know Python accepts that style of comparison. – FrederikVds Jun 06 '12 at 21:00

score 1 · Answer 4 · answered Jun 06 '12 at 20:51

A better way to do this is to use a list comprehension to build the filtered list and assign it back to the existing list:

testlist = [12,2,1,1,1,0,-3,-3,-1]  

testlist[:] = [x for x in testlist if 0 <= x <= 2.5]

giving:

[2, 1, 1, 1, 0]
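The difference between assigning to testlist[:] and assigning to testlist only shows up when something else holds a reference to the same list. A small sketch of that, where original and alias are names made up for the example:

```python
original = [12, 2, 1, 1, 1, 0, -3, -3, -1]
alias = original  # a second name for the same list object

# Slice assignment replaces the contents in place, so alias sees the change.
original[:] = [x for x in original if 0 <= x <= 2.5]
print(alias)              # [2, 1, 1, 1, 0]
print(alias is original)  # True

# Plain assignment rebinds the name to a new list; alias keeps the old one.
original = [x for x in original if x > 0]
print(original)           # [2, 1, 1, 1]
print(alias)              # [2, 1, 1, 1, 0]
```

So if the list was passed in from a caller or is stored somewhere else, slice assignment keeps every reference in sync, while plain rebinding only affects the local name.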

PaulMcG · answered Jun 06 '12 at 20:51

+1 I like your idea of assigning to `testlist[:]` to avoid changing the reference. – Mark Byers Jun 06 '12 at 21:11
I got it from Alex Martelli, here http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python. – PaulMcG Jun 06 '12 at 23:25
