14

In Python the interface of an iterable is a subset of the iterator interface. This has the advantage that in many cases they can be treated in the same way. However, there is an important semantic difference between the two, since for an iterable __iter__ returns a new iterator object and not just self. How can I test that an iterable is really an iterable and not an iterator? Conceptually I understand iterables to be collections, while an iterator only manages the iteration (i.e. keeps track of the position) but is not a collection itself.

The difference is for example important when one wants to loop multiple times. If an iterator is given then the second loop will not work since the iterator was already used up and directly raises StopIteration.

It is tempting to test for a next method, but this seems dangerous and somehow wrong. Should I just check that the second loop was empty?

Is there any way to do such a test in a more pythonic way? I know that this sounds like a classic case of LBYL versus EAFP, so maybe I should just give up? Or am I missing something?

Edit: S.Lott says in his answer below that this is primarily a problem of wanting to do multiple passes over the iterator, and that one should not do this in the first place. However, in my case the data is very large and depending on the situation has to be passed over multiple times for data processing (there is absolutely no way around this).

The iterable is also provided by the user, and for situations where a single pass is enough it will work with an iterator (e.g. created by a generator for simplicity). But it would be nice to safeguard against the case where a user provides only an iterator when multiple passes are needed.

Edit 2: Actually this is a very nice example for Abstract Base Classes. The __iter__ methods in an iterator and an iterable have the same name but are semantically different! So hasattr is useless, but isinstance provides a clean solution.
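A minimal sketch of the isinstance approach mentioned above, assuming Python 3, where the ABCs live in collections.abc (in Python 2.6 they were in collections). The helper name is_non_iterator_iterable is hypothetical and only for illustration:

```python
from collections.abc import Iterable, Iterator


def is_non_iterator_iterable(obj):
    """Return True if obj is iterable but is not itself an iterator.

    Hypothetical helper: it only inspects the interface, so it cannot
    prove that obj really supports several independent passes.
    """
    return isinstance(obj, Iterable) and not isinstance(obj, Iterator)


print(is_non_iterator_iterable([1, 2, 3]))            # a list is a collection
print(is_non_iterator_iterable(iter([1, 2, 3])))      # a list iterator is not
print(is_non_iterator_iterable(x for x in range(3)))  # neither is a generator
```

Note that this is still a structural check: a class whose __iter__ returns a fresh but one-shot object would pass it.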

nikow

4 Answers

12
'iterator' if obj is iter(obj) else 'iterable'
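As a sketch of how this identity test could serve as a safeguard in a function that needs several passes (the function name ensure_not_iterator is an assumption made for illustration, and the code is written for Python 3):

```python
def ensure_not_iterator(obj):
    """Return obj unchanged, or raise TypeError if obj is its own iterator.

    Hypothetical guard: relies on the convention from PEP 234 that an
    iterator's __iter__ returns the iterator itself.
    """
    if iter(obj) is obj:
        raise TypeError("expected a re-iterable collection, got an iterator")
    return obj


data = ensure_not_iterator([1, 2, 3])    # accepted: a list hands out new iterators
try:
    ensure_not_iterator(iter([1, 2, 3]))  # rejected: already an iterator
except TypeError as exc:
    print("rejected:", exc)
```

Objects that are deliberately their own iterator, such as the file objects discussed in the comments, would be rejected by this guard as well.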
vartec
  • Wow, this seems to be the answer that I have been looking for, thanks! I will wait a little before accepting it, in case somebody can point out a problem with this. – nikow Apr 02 '09 at 10:24
  • Well, the problem is one "wasted" call to obj.__iter__(), but I don't see other reliable way to do it. – vartec Apr 02 '09 at 10:28
  • 1
    Although I don't know a counter-example, this is not *guaranteed* to work. – tzot Apr 02 '09 at 23:26
  • @ΤΖΩΤΖΙΟΥ: well, you could imagine an object that doesn't have .next() but has __iter__(self) = lambda x: x – vartec Apr 03 '09 at 07:37
  • 1
    @ΤΖΩΤΖΙΟΥ: but then again, what would be point of such object? – vartec Apr 03 '09 at 07:39
  • 1
    I never said anything about objects not having .next. Your premise is `iter(obj) is obj`, which AFAIK is true, but it's not guaranteed. – tzot Apr 03 '09 at 19:45
  • 2
    @tzot It's guaranteed according to [PEP 234](http://www.python.org/dev/peps/pep-0234/) - *A class that wants to be an iterator should implement two methods: a `next()` method that behaves as described above, and an `__iter__()` method that returns self.* – Piotr Dobrogost May 01 '12 at 18:48
  • 1
    The answer makes an assumption that sets of iterators and iterables are disjoint which is not the case. – Piotr Dobrogost May 01 '12 at 19:48
  • -1 *A file object is its own iterator, for example `iter(f)` returns `f` (unless `f` is closed).* - [5.9. File Objects](http://docs.python.org/2/library/stdtypes.html#file-objects) – Piotr Dobrogost Dec 03 '12 at 21:47
  • @PiotrDobrogost: YPB? *"if it quacks like a duck"*... file object implements iterator interface. – vartec Dec 05 '12 at 12:49
  • My point being if it quacks like an iterator and it quacks like iterable it's both. Like I said in my earlier comment your test makes an assumption that some object can't be iterator and iterable at the same time which is false and file object is an example of such an object. – Piotr Dobrogost Dec 05 '12 at 19:13
3

However, there is an important semantic difference between the two...

Not really semantic or important. They're both iterable -- they both work with a for statement.

The difference is for example important when one wants to loop multiple times.

When does this ever come up? You'll have to be more specific. In the rare cases when you need to make two passes through an iterable collection, there are often better algorithms.

For example, let's say you're processing a list. You can iterate through a list all you want. Why did you get tangled up with an iterator instead of the iterable? Okay that didn't work.

Okay, here's one. You're reading a file in two passes, and you need to know how to reset the iterable. In this case, it's a file, and seek is required; or a close and a reopen. That feels icky. You can readlines to get a list which allows two passes with no complexity. So that's not necessary.

Wait, what if we have a file so big we can't read it all into memory? And, for obscure reasons, we can't seek, either. What then?

Now, we're down to the nitty-gritty of two passes. On the first pass, we accumulated something. An index or a summary or something. An index has all the file's data. A summary, often, is a restructuring of the data. With a small change from "summary" to "restructure", we've preserved the file's data in the new structure. In both cases, we don't need the file -- we can use the index or the summary.

All "two-pass" algorithms can be changed to one pass of the original iterator or iterable and a second pass of a different data structure.
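A small illustration of that idea, assuming Python 3 and a made-up task (relative word frequencies): the input is read exactly once into a summary, and the "second pass" runs over the summary rather than the original iterator.

```python
from collections import Counter


def relative_frequencies(words):
    """Compute each word's share of the total using one pass over words."""
    counts = Counter(words)        # the only pass over the (possibly one-shot) input
    total = sum(counts.values())   # works on the summary, not on the input
    # "Second pass": iterate over the much smaller summary structure.
    return {word: n / total for word, n in counts.items()}


# Works even when the caller passes an iterator instead of a collection.
print(relative_frequencies(iter(["a", "b", "a", "a"])))
```

This only helps when the first pass really reduces the data; it is an example of the pattern, not a claim that every algorithm can be reshaped this way.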

This is neither LBYL nor EAFP. This is algorithm design. You don't need to reset an iterator -- YAGNI.


Edit

Here's an example of an iterator/iterable issue. It's simply a poorly-designed algorithm.

it = iter(xrange(3))
for i in it: print i,  # prints 0 1 2
for i in it: print i,  # prints nothing

This is trivially fixed.

it = range(3)
for i in it: print i
for i in it: print i

The "multiple times in parallel" is trivially fixed. Write an API that requires an iterable. And when someone refuses to read the API documentation or refuses to follow it after having read it, their stuff breaks. As it should.

The "multiple times in parallel" case and the "nice to safeguard against the case where a user provides only an iterator when multiple passes are needed" case are both examples of insane people writing code that breaks our simple API.

If someone is insane enough to read most (but not all) of the API doc and provide an iterator when an iterable was required, you need to find this person and teach them how to (1) read all the API documentation and (2) follow the API documentation.

The "safeguard" issue isn't very realistic. These crazy programmers are remarkably rare. And in the few cases when it does arise, you know who they are and can help them.


Edit 2

The "we have to read the same structure multiple times" algorithms are a fundamental problem.

Do not do this.

for element in someBigIterable:
    function1( element )
for element in someBigIterable:
    function2( element )
...

Do this, instead.

for element in someBigIterable:
    function1( element )
    function2( element )
    ...

Or, consider something like this.

for element in someBigIterable:
    for f in ( function1, function2, function3, ... ):
        f( element )

In most cases, this kind of "pivot" of your algorithms results in a program that might be easier to optimize and might be a net improvement in performance.
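A runnable version of this "pivot", sketched in Python 3. The helper name single_pass and the two example consumers are assumptions for illustration only:

```python
def single_pass(iterable, *consumers):
    """Feed every element to each consumer while iterating only once."""
    for element in iterable:
        for consume in consumers:
            consume(element)


seen = []
squares = []
# A one-shot iterator is enough here, because it is traversed a single time.
single_pass(iter(range(4)), seen.append, lambda x: squares.append(x * x))
print(seen, squares)
```

This works when the consumers are independent of each other; if the second function needs a result computed over the whole first pass, the pivot does not apply directly.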

Lesmana
S.Lott
  • What about multiple times in parallel? E.g. several threads iterating over the same collection? Or even one thread, such as an easily-imagined naive implementation of "does this collection have the same element twice?". – Edmund Apr 02 '09 at 10:16
  • Thanks, I added an explanation to the question. You have a valid point, but in my case I believe this does not work. – nikow Apr 02 '09 at 10:22
  • "remarkably rare". I'd disagree, programmers that can't tell iterable from iterator are not by any means rare. "you know who they are and can help them." That's usually not your job, and in corporation "helping them" would not be very well perceived, especially if it's another department. – vartec Apr 02 '09 at 11:10
  • @vartec: it's your application/library/framework, you need to support it. Helping the crazy programmers who refuse to read the API and can't figure out why it broke when they didn't follow the rules is support as I understand it. It *is* well perceived in my experience. – S.Lott Apr 02 '09 at 11:15
  • @S.Lott: In a perfect world you're right. In corporate politics, saying that the other department's code isn't correct results in conflict. If the other dept. has more political influence, your help will be perceived as "trying to cover incompetence". And it doesn't matter if you're right or wrong. – vartec Apr 02 '09 at 11:22
  • +1 for your edit. I was taught that this is programming by contract. If one party doesn't comply, the other party doesn't need to comply, either. –  Apr 02 '09 at 11:29
  • @S.Lott: In this context, may I call upon your attention to this question: http://stackoverflow.com/questions/701088/py3k-memory-conservation-by-returning-iterators-rather-than-lists Thanks! – lprsd Apr 02 '09 at 11:53
  • @becomingGuru: preoccupation with memory management can become silly. My point is that many "2-pass" algorithms do significant data reduction on the first pass; the second pass is not necessary because it's working on a smaller data structure. – S.Lott Apr 02 '09 at 12:01
  • @vartec: if your organization's corporate politics are so dysfunctional that help == conflict, you should find a better organization to work for. Writing useless code to work around organizational problems is an epic fail waiting to happen. – S.Lott Apr 02 '09 at 12:07
  • @heikogerlach: more importantly, if one party won't comply, the other party can't coerce compliance. If they won't comply, they wrote the bug; you can't fix their refusal to comply. – S.Lott Apr 02 '09 at 12:43
  • @S.Lott: I already did. But as far as I know it's pretty much the same in most big corporations. Big bureaucracies are always inefficient. – vartec Apr 02 '09 at 21:07
  • @vartec: Over the last 30 years, I've never worked at a place where helping someone became a conflict or was perceived badly. An API contract has never been a problem in 100's of locations. Convoluted code to help the crazy programmers who can't follow the API is -- simply -- bad. – S.Lott Apr 02 '09 at 21:15
  • +1 for addressing the question of 2 passes over a large data at such length. I agree - if it seems like you need to iterate over the exact same, unchanged, entire data twice, there's a design issue that needs to be addressed. – Jarret Hardie Apr 02 '09 at 22:44
  • 4
    @S.Lott That is quite an obnoxious attitude to assume that if you cannot conceive of a case where multiple passes are required then the need for it is not worthy of consideration, or it cannot be the basis of a worthwhile question. I am sure you have heard of Gauss-Seidel or other iterative algorithms for solving linear equations. Can you do those in a single pass? A system of linear equations isn't rocket science either. All that was required here was a direct answer to the question, not a smug lecture on bad algorithms that the OP did not even ask about. – srean Jun 24 '11 at 04:55
  • @srean: That is quite an obnoxious attitude to assume that everyone else knows of some algorithms which **require** multiple passes through the data. Sadly, few people have the same depth of insight. Since I did not know, I could not provide meaningful comments. You could provide helpful information (like a specific example, viz. Gauss-Seidel). Or you could complain because some of us don't have as rich a background in mathematics. – S.Lott Jun 24 '11 at 10:02
  • 2
    @S.Lott @nikow The notion of Gauss-Seidel is peripheral as far as the OP is concerned and there is of course wikipedia. The original question was not about existence of any multipass algorithm till your ego led you to believe that it was and there are no such algorithm worth considering. It would have been more helpful if you could answer OP's direct question, if not, then losing the attitude of a smug know it all and refraining from lecturing the OP (that too incorrectly) about how meaningless his question was, would have helped too. – srean Jun 24 '11 at 17:45
  • @srean: Ignorance of an example is not "ego". It's a lack of a good example. "attitude of a smug know it all"? "lecturing the OP"? Wow. You are certainly angry about something. I'm sorry my answer upset you so much. I thought they are all good examples of how to design algorithms so that the one-time-through part of Python was a non-issue. Since you have a counter-example, you're free to write your own answer. That seems somehow more productive than name-calling. – S.Lott Jun 24 '11 at 19:31
  • 2
    @S.Lott "you're free to write your own answer" I prefer not to pollute SO with irrelevant answers. Ignorance is certainly not ego, but asserting or implying that OP is possibly at fault for just conceiving of (not even using) multiple passes is. It is also presumptive, and in this case also incorrect. As you say, your examples might well be good, but has nothing to do with OP's question, which was about the difference between an iterable and iterator. Lecture people without solicitation if you must, but losing the attitude will go a long way, regardless of correctness. – srean Jun 24 '11 at 19:59
  • @srean: The question indicates a design problem. Solving the design problem by prevention and avoidance seems far simpler than hacking around trying to add complexity where none is needed. Again. I can only apologize so much for causing such unwarranted anger. The question indicated a design problem that was better solved by avoidance. Please post your own answer. I cannot see changing this because I still believe it's the correct approach except in the case you identified. – S.Lott Jun 24 '11 at 20:01
  • +1 — *" Write an API that requires an iterable. And when someone refuses to read the API documentation or refuses to follow it after having read it, their stuff breaks. As it should."* — this made me realise where I was thinking wrong. Iterable vs. iterator is part of the API and the design of the function in question (whether it's what you take or what you return). – detly Nov 30 '11 at 02:00
  • As much as this is valuable answer, do you know the answer to the question asked? – Piotr Dobrogost Dec 04 '12 at 22:04
2
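import itertools

def process(iterable):
    # tee() turns one iterator into two independent ones; items consumed by
    # work_iter are buffered until backup_iter has consumed them as well.
    work_iter, backup_iter = itertools.tee(iterable)

    for item in work_iter:
        # bla bla
        if need_to_startover():  # placeholder from the answer, not defined here
            for another_item in backup_iter:
                pass  # the second pass over the same items goes here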

That damn time machine that Raymond borrowed from Guido…
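To make the behaviour of itertools.tee concrete, here is a small self-contained sketch in Python 3 (the function two_passes and its normalisation task are made up for illustration). Keep in mind that tee has to buffer every item one iterator has produced and the other has not yet consumed, so finishing one pass before starting the other holds the whole input in memory, which matters for the very large data mentioned in the question.

```python
import itertools


def two_passes(source):
    """Normalise numbers by their sum, reading a one-shot source via tee()."""
    first, second = itertools.tee(source)
    total = sum(first)                  # first pass; tee buffers items for `second`
    return [x / total for x in second]  # second pass replays the buffered items


# A generator can normally be consumed only once; tee makes two passes possible.
print(two_passes(x for x in [1, 1, 2]))
```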

tzot
0

Because of Python's duck typing,

Any object is iterable if it defines a next() method and its __iter__() method returns the object itself.

If the object itself doesn't have a next() method, its __iter__() can return any object that has a next() method.

You could refer to this question on iterability in Python.
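The two cases can be sketched with a pair of toy classes. This assumes Python 3, where the method is spelled __next__ rather than next(); the class names Countdown and CountdownIterator are hypothetical:

```python
class Countdown:
    """Iterable: has no __next__, and __iter__ returns a new iterator each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: defines __next__, and __iter__ returns the object itself."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


c = Countdown(3)
print(list(c), list(c))    # the iterable can be looped over twice
it = iter(c)
print(list(it), list(it))  # the iterator is exhausted after the first loop
```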

Community
lprsd
  • Try this: class A(object): def __iter__(self): return iter([1,2,3]) def next(self): yield 7 – vartec Apr 02 '09 at 12:04
  • Actually this is a problem of duck typing: it can hide a semantic / conceptual difference. It allows us to write for i in range(3) instead of for i in iter(range(3)), but can cause subtle problems. – nikow Apr 02 '09 at 12:05
  • @vartec What was the code in your comment supposed to demonstrate? – Piotr Dobrogost May 01 '12 at 20:03
  • @PiotrDobrogost: I don't remember, that was 3 years ago ;-) – vartec May 01 '12 at 21:01