2

In the following code, I have run into a RecursionError: maximum recursion depth exceeded.

def unpack(given):
    for i in given:
        if hasattr(i, '__iter__'):
            yield from unpack(i)
        else:
            yield i

some_list = ['a', ['b', 'c'], 'd']

unpacked = list(unpack(some_list))

This works fine if I use some_list = [1, [2, [3]]], but not when I try it with strings.

I suspect my lack of knowledge in python. Any guidance appreciated.

user7865286
  • 344
  • 2
  • 11
  • 2
    Strings are themselves iterable, and the things they iterate over are strings. `for x in 'a'` yields `'a'` itself. – BallpointBen Apr 04 '18 at 02:47
  • 1
    Try `some_list = []; some_list.append(some_list); unpacked = list(unpack(some_list))` to see that this can happen with anything with depth>1000. So the remaining question is why every string has depth>1000, which wim's answer (and BallpointBen's comment) explains. – abarnert Apr 04 '18 at 03:01
  • @abarnert For your case, is it that `__iter__` for `list` returns itself, and naturally it's unending recursion? – user7865286 Apr 04 '18 at 03:10
  • @user7865286 Yes—or, maybe more simply, that the list contains itself: `some_list[0] is some_list`. I thought this would be less surprising than the fact that if `s = 'a'`, `s[0] is s`, so it would help illuminate the problem, but now that I think about it, how many people actually know about recursive lists in Python? The only really obvious example would be a class that explicitly iterates itself, which is too big and distracting to be worth commenting about; better to just go straight to `s[0] is s` for strings as BallpointBen did. – abarnert Apr 04 '18 at 03:22

1 Answers1

7

Strings are infinitely iterable. Even a one-character string is iterable.

Therefore, you will always get a stack overflow unless you add special handling for strings:

def flatten(x):
    try:
        it = iter(x)
    except TypeError:
        yield x
        return
    if isinstance(x, (str, bytes)):
        yield x
        return
    for elem in it:
        yield from flatten(elem)

Note: using hasattr(i, '__iter__') is not sufficient to check if i is iterable, because there are other ways to satisfy the iterator protocol. The only reliable way to determine whether an object is iterable is to call iter(obj).

wim
  • 338,267
  • 99
  • 616
  • 750
  • It could be just a comment. – Sraw Apr 04 '18 at 02:47
  • 2
    "using `hasattr(i, '__iter__')` is not sufficient to check if `i` is iterable" - true, but `isinstance(x, Iterable)` fails for many of the same cases. The most reliable check is to just call `iter`. – user2357112 Apr 04 '18 at 02:53
  • @user2357112 is right. In particular, the "old-style sequence iteration protocol" (which is actually a pair of related incomplete protocols, but never mind that) isn't captured by either test. Just define a class with a `__len__` that returns an `int` (and make sure it's in `range(0, 1<<32)`), and a `__getitem__` that doesn't raise `IndexError` for any value in `range(0, __len__())` but does for `__len__()`, and you've got an iterable but not an `Iterable`. – abarnert Apr 04 '18 at 03:00
  • What do you think is a more Pythonic way to implement `flatten`? I almost always prefer EAFP, but somehow I always come back to LBYL for this one. For example - which exception(s) to catch, is just `TypeError` enough? It's hard to say. – wim Apr 04 '18 at 03:02
  • `isinstance(x, Iterable)` is definitely an improvement over `hasattr(x, '__iter__')`, at least. It'll pick up "anti-registration" through `__iter__ = None`, and it'll detect `isinstance("asdf", Iterable)` in Python 2 (though only due to a `register` call). – user2357112 Apr 04 '18 at 03:04
  • I've tried something like [this](https://pastebin.com/5nhVVy98) but somehow it feels *less* Pythonic. You already need (?) the type-check to handle strings, so why not just use the `Iterable` abc while you're at it. Absolutely don't care about Python 2 these days unless there's a good reason. – wim Apr 04 '18 at 03:09
  • The only way I can think of to avoid the type check on strings is to check that the object doesn't iterate itself. Which has the advantage of catching, say, a third-party `bytes`-like object that iterates bytes (I've seen people use a trivial one for porting Python 2 network protocol code to 3, and Nick Coghlan always threatens to write one that carries its encoding with it but never does). But it has the disadvantage of also catching cases where you self-contain explicitly—you might actually want an infinite iterator for those cases (although you probably wouldn't want the `RecursionError`). – abarnert Apr 04 '18 at 03:27
  • Anyway, I don't think a perfect `flatten` that DWIM in every possible case is possible; otherwise it would at least be in the `itertools` recipes, if not somewhere in the stdlib. I think your version (original or updated), maybe with a comment or docstring listing which choices it made for the two issues, is fine for most real-life projects. – abarnert Apr 04 '18 at 03:30