I am trying to filter some measurement data to remove artifacts such as negative numbers and errors from my measuring devices. I have been experimenting with a generator to do this. I am using Python 2.7.2.

testlist = [12,2,1,1,1,0,-3,-3,-1]  

gen = (i for i, x in enumerate(testlist) if x < 0 or x > 2.5)

for i in gen: testlist.pop(i)

print testlist

This returns:

[2, 1, 1, 1, 0, -3]

Why is the -3 value still showing up in the updated testlist?

Chris Pfohl
Jason
  • Just curious: why use a generator rather than a list comprehension? Is the real list you're using this on particularly large? – Chris Pfohl Jun 06 '12 at 20:43
  • The actual list is several thousand values, so I don't think it is particularly large. As far as generator vs. list comprehension, the only reason I started with the less efficient method is due to inexperience. – Jason Jun 06 '12 at 21:01
  • Some light background reading on the difference ;) Have fun: http://stackoverflow.com/q/47789/456188 – Chris Pfohl Jun 07 '12 at 14:57

4 Answers

When you remove an item from the list, the indexes of all the items after it change (they shift down by one). As a result, the generator skips over some items. Adding a few print statements shows what is going on:

for i in gen:
    print i
    print testlist
    testlist.pop(i)

Output:

0
[12, 2, 1, 1, 1, 0, -3, -3, -1]
5
[2, 1, 1, 1, 0, -3, -3, -1]
6
[2, 1, 1, 1, 0, -3, -1]

To remove every unwanted item you would have needed to delete at indexes 0, 5, 5, 5, but the generator produces the indexes 0, 5, 6. That makes sense because enumerate yields 0, 1, 2, ... and never returns the same index twice in a row.
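If deleting by index in place is really what you want, one possible fix is sketched below: collect all the indexes before changing the list, then delete from the highest index down so that earlier deletions cannot shift later targets. The name `bad_indexes` is illustrative and not from the question.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# Build the full list of indexes before mutating anything.
bad_indexes = [i for i, x in enumerate(testlist) if x < 0 or x > 2.5]

# Delete from the end so the indexes still to be processed stay valid.
for i in reversed(bad_indexes):
    del testlist[i]

print(testlist)
```

Each `del` still shifts the elements after it, so this keeps the correctness but not the speed of building a new list.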

It's also very inefficient to remove the elements one at a time. Each pop shifts all the later elements, so the data is moved around multiple times, with a worst-case performance of O(n^2). You can instead use a list comprehension:

testlist = [x for x in testlist if 0 <= x <= 2.5]
Mark Byers

You are modifying the list while iterating over it, which is somewhat analogous to changing the loop index from inside a for loop in some other languages. Consider this approach as an alternative:

testlist = [x for x in testlist if x >= 0 and x <= 2.5]

Using a list comprehension should work more directly. It is not a generator expression, but it could trivially be changed to one:

testlist = (x for x in testlist if x >= 0 and x <= 2.5)
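One thing to keep in mind with the generator form is that the result is a lazy, one-shot iterator rather than a list. A small sketch of that behaviour follows; the names `filtered_gen`, `filtered`, and `leftover` are made up for illustration.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# A generator expression is lazy: nothing is filtered until it is consumed.
filtered_gen = (x for x in testlist if x >= 0 and x <= 2.5)

# Materialize it into a list; this consumes the generator.
filtered = list(filtered_gen)

# A second pass finds nothing left, because the generator is exhausted.
leftover = list(filtered_gen)

print(filtered)
print(leftover)
```

If you need to index the result or loop over it more than once, the list comprehension is probably the more convenient choice.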
Levon

Let's consider a simpler input:

[-3, -4, -5]

First, (0, -3) is taken from the enumerator, and the generator yields the index 0. The for loop receives that index and pops the element at it, removing -3:

[-4, -5]

The generator then takes a new element from the enumerator. The enumerator remembers that it already produced index 0, so it now produces index 1, which in the shortened list is (1, -5). -5 is removed from the list in the same way, and -4 remains.
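The walk-through above can be reproduced with a short script. This is only an illustrative sketch; the names `simple` and `popped` are not from the question.

```python
simple = [-3, -4, -5]
popped = []  # (index, value) pairs, recorded for illustration only

# Same pattern as in the question: yield the index of each negative value.
gen = (i for i, x in enumerate(simple) if x < 0)

for i in gen:
    # pop() shrinks the list while enumerate keeps counting upwards.
    popped.append((i, simple.pop(i)))

print(popped)
print(simple)
```

The recorded pairs show index 0 followed by index 1, so -4 is never examined.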

By the way, an easier way to do what you're trying to do is the following:

testlist = filter(lambda x: x >= 0 and x <= 2.5, testlist)
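A variant of the same idea, sketched under the assumption that the code may also need to run on Python 3 (the question uses 2.7.2): there `filter` returns a lazy iterator instead of a list, so wrapping it in `list()` gives the same result on both versions. The helper name `in_range` is hypothetical.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

def in_range(x):
    # Keep only values inside the accepted range from the question.
    return 0 <= x <= 2.5

# list(...) just copies on Python 2 and materializes the iterator on Python 3.
testlist = list(filter(in_range, testlist))

print(testlist)
```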
FrederikVds

A better way to do this is to use a list comprehension to build the filtered list and assign it back into the existing list with a slice:

testlist = [12,2,1,1,1,0,-3,-3,-1]  

testlist[:] = [x for x in testlist if 0 <= x <= 2.5]

giving:

[2, 1, 1, 1, 0]
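A small sketch of what the slice assignment changes compared with plain rebinding, assuming some other name also refers to the list. The names `alias`, `same_after_slice`, and `same_after_rebind` are illustrative only.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]
alias = testlist  # a second name bound to the same list object

# Slice assignment replaces the contents of the existing list object.
testlist[:] = [x for x in testlist if 0 <= x <= 2.5]
same_after_slice = alias is testlist

# Plain assignment binds the name to a brand-new list; alias keeps the old one.
testlist = [x for x in testlist if x >= 1]
same_after_rebind = alias is testlist

print(same_after_slice)
print(same_after_rebind)
print(alias)
print(testlist)
```

With slice assignment, every holder of a reference to the list sees the filtered data; with rebinding, only the name `testlist` does.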
PaulMcG
  • +1 I like your idea of assigning to `testlist[:]` to avoid changing the reference. – Mark Byers Jun 06 '12 at 21:11
  • I got it from Alex Martelli, here: http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python – PaulMcG Jun 06 '12 at 23:25