7

(In Python 3.1) (Somewhat related to another question I asked, but this question is about iterators being exhausted.)

# trying to see the ratio of the max and min element in a container c
filtered = filter(lambda x : x is not None and x != 0, c)
ratio = max(filtered) / min(filtered)

It took me half hour to realize what the problem is (the iterator returned by filter is exhausted by the time it gets to the second function call). How do I rewrite it in the most Pythonic / canonical way?

Also, what can I do to avoid bugs of this sort, besides getting more experience? (Frankly, I don't like this language feature, since these types of bugs are easy to make and hard to catch.)

Community
  • 1
  • 1
max
  • 49,282
  • 56
  • 208
  • 355
  • It isn't hard to catch when you know what you are doing. Try the following code using python2: `python -m timeit "r = xrange(1000000000)"` and `python -m timeit "r = range(1000000000)"`. As you can see iterators aren't unnecessary after all and in python3 `range()` behaves like `xrange()` indeed. – mg. Oct 15 '10 at 07:32

4 Answers4

10

The itertools.tee function can help here:

import itertools

f1, f2 = itertools.tee(filtered, 2)
ratio = max(f1) / min(f2)
Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
  • 2
    +1 - never heard of tee until now... Neat, though it's tricky to use (have to make sure I don't reuse filtered after that call, and also that it's not worse performance-wise than building the list). – max Oct 15 '10 at 07:02
  • 1
    tee will effectively make a complete copy of the filtered list internally, so it is pretty much equivalent to `filtered_list = list(filtered_iter)`. And your concern of trickiness is not an issue with this specific solution, but is inherent in using iterators - *whenever* you have an iterator, you have to make sure not to try consuming it twice. @THC4k's posted answer is more what you should be using - no additional copies of the iterated data, and no need to try "tee"ing iterators, etc. – PaulMcG Oct 15 '10 at 21:37
  • If only there was an easy way to interleave the `min` and `max` calls, then the tee would not have to store a buffer internally. – David Callanan Dec 16 '19 at 16:26
7

you can convert an iterator to a tuple simply by calling tuple(iterator)

however I'd rewrite that filter as a list comprehension, which would look something like this

# original
filtered = filter(lambda x : x is not None and x != 0, c)

# list comp
filtered = [x for x in c if x is not None and x != 0]
Neuron
  • 5,141
  • 5
  • 38
  • 59
lunixbochs
  • 21,757
  • 2
  • 39
  • 47
  • +1 Yup - just what I was thinking myself. Makes it trivial to check if `filtered` is empty too: `len(filtered) != 0` – max Oct 15 '10 at 07:04
  • You don't need to check the length of a list to see if it is empty - it will directly evaluate to False if it is empty, and True otherwise. – lunixbochs Oct 15 '10 at 07:08
  • See this for clarification: http://docs.python.org/library/stdtypes.html#truth-value-testing – lunixbochs Oct 15 '10 at 07:10
5

Actually your code raises an exception that would prevent this problem! So I guess the problem was that you masked the exception?

>>> min([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence
>>> min(x for x in ())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: min() arg is an empty sequence

Anyways, you can also write a new function to give you the min and max at the same time:

def minmax( seq ):
    " returns the `(min, max)` of sequence `seq`"
    it = iter(seq)
    try:
        min = max = next(it)
    except StopIteration:
        raise ValueError('arg is an empty sequence')
    for item in it:
        if item < min:
            min = item
        elif item > max:
            max = item
    return min, max
Jochen Ritzel
  • 104,512
  • 31
  • 200
  • 194
  • 1
    Given an iterator, the 2nd best solution is to store the contents in a list, and then read the list as many times as you like. This solution, adjusting your algorithm to do all it needs in just a single scan of the iterator is better, if you can do it. – PaulMcG Oct 15 '10 at 21:37
  • well the bug was that I was catching ValueError thinking it was due to the iterator being emtpy to start with; not due to the iterator being exhausted by my own code! – max Oct 18 '10 at 08:28
3

The entity filtered is essentially an object with state. Of course, now it's obivous that running max or min on it will change that state. In order to stop tripping over that, I like to make it absolutely clear (to myself, really) that I am constructing something rather than just transforming something:

Adding an extra step can really help:

def filtered(container):
    return filter(lambda x : x is not None and x != 0, container)

ratio = max(filtered(c)) / min(filtered(c))

Whether you put filtered(...) inside some function (maybe it's not really needed for anything else) or define it as a module-level function is up to you, but in this case I'd suggest that if filtered (the iterator) was only needed in the function, leave it there until you need it elsewhere.

The other thing you can do is to construct a list from it, which will evaluate the iterator:

filtered_iter = filter(lambda x : x is not None and x != 0, container)
filtered = list(filtered_iter)

ratio = max(filtered) / min(filtered)

(Of course, you can just say filtered = list(filter(...)).)

detly
  • 29,332
  • 18
  • 93
  • 152
  • +1 : I'll use this technique, though I still dislike the whole "iterators are iterable" idea... feels like a bug waiting to happen. – max Oct 15 '10 at 07:03
  • @max - you can always go `list(...)` on an iterable to force it to be evaluated. I'll add something to my answer. – detly Oct 15 '10 at 07:15
  • @max - more generally though... what else would an iterator do, than iterate? Storing the whole thing in memory might be impractical or impossible (eg. indefinite prime generator), and rewinding is also only possible in all cases if you store iterator results somehow. Read the docs for the function you're using, and convert to a list if you need to. Or maybe I've missed your point? – detly Oct 15 '10 at 07:25