3

I've written the following code to do a pre-order traversal of a Python dict which may contain other dicts:

def preorder_traversal(obj):
    yield obj
    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for o in preorder_traversal(v):
                yield o

Here are a few examples of its behavior:

> list(preorder_traversal({}))
[{}]
> list(preorder_traversal({'a': 1, 'b': 2}))
[{'a': 1, 'b': 2}, 1, 2]
> list(preorder_traversal({'a': {'c': 1}, 'b': 2}))
[{'a': {'c': 1}, 'b': 2}, {'c': 1}, 1, 2]

It's possible that the tree could get very large, so I'd like to add a mechanism whereby the consumer of the pre-order traversal can abort the search on an entire subtree.

Here's the code I came up with, including a nose test case. The test fails, as described below.

class IgnoreSubtree(Exception):
    pass    

def preorder_traversal(obj):
    try:
        yield obj
    except IgnoreSubtree:
        return

    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            iterator = preorder_traversal(v)
            for o in iterator:
                try:
                    yield o
                except IgnoreSubtree as e:
                    try:
                        iterator.throw(e)
                    except StopIteration:  # WHY?
                        pass

def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2}
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == {'c': 3}:
            iterator.throw(IgnoreSubtree)

    eq_([obj, {'c': 3}, 2], out)

The test fails with the following error:

AssertionError: [{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2] !=
                [{'a': {'c': 3}, 'b': 2}, {'c': 3}]

i.e. the IgnoreSubtree has also aborted the iteration over k/v pairs in the top-level object, it hasn't just pruned out the {'c': 3} subtree.

What's wrong with this code? And why is StopIteration being thrown in the commented location above? Is this a reasonable way to implement subtree pruning for this function, or is there a better way to go about it?

danvk
  • 15,863
  • 5
  • 72
  • 116

6 Answers6

1

As audiodude mentioned, your iterator.throw(IgnoreSubtree) returns iterator's next value (glossing over the complicated exception handling for a moment), so it is consuming the 2 that you were expecting to see appended to out on the next iteration of the loop in test_ignore_subtree.

You also asked about why the StopIteration is being thrown; the sequence of Exceptions being thrown/caught is:

  • iterator.throw(IgnoreSubtree) throws an IgnoreSubtree that is caught in the inner loop of preorder_traversal
  • that IgnoreSubtree is routed into the inner iterator with iterator.throw(e)
  • the IgnoreSubtree is caught at except IgnoreSubtree: and return is called; however, iterator.throw(e) wants to get the next value from that inner iterator, which has just returned, hence a StopIteration is raised.
  • before the original iterator.throw(IgnoreSubtree) returns, it goes once more through the outer loop in preorder_traversal, because it wants to return the next value from iterator.

I hope this helps!

Update

Here is an implementation of this basic scheme that I would use, along with passing nosetest:

from nose.tools import eq_

def preorder_traversal(obj, ignore_only_descendents_of=None, ignore_subtrees=None):
    if ignore_subtrees and obj in ignore_subtrees:
        return

    yield obj

    if ignore_only_descendents_of and obj in ignore_only_descendents_of:
        return

    if isinstance(obj, dict):
        for k, v in iter(sorted(obj.iteritems())):
            iterator = preorder_traversal(v, ignore_only_descendents_of, ignore_subtrees)
            for o in iterator:
                yield o


def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2, 'd': {'e': {'f': 4}}, 'g': 5, 'h': 6}
    ignore_only_descendents_of = [{'e': {'f': 4}}]
    ignore_subtrees = [{'c': 3}, 5]

    iterator = preorder_traversal(obj, ignore_only_descendents_of, ignore_subtrees)
    out = []

    for o in iterator:
        out.append(o)

    expected = [obj, 2, {'e': {'f': 4}}, 6]
    eq_(expected, out)

Things to note:

  • your example allows for excluding the descendents of {'c':3} while including {'c':3} itself; I found this a little confusing, as I would expect you to typically want to exclude a whole subtree including its root, so I've changed preorder_traversal to take two optional lists of things to exclude in each manner.
  • moving the subtrees to skip into the iterator itself seems cleaner; you can avoid using Exceptions to control flow altogether.
  • more complicated example demonstrating the two types of subtree-exclusions.
Ryan Williams
  • 173
  • 1
  • 2
  • 13
1

First of all, why is StopIteration raised. Your definition of preorder_traversal starts with:

try:
    yield obj
except IgnoreSubtree:
    return

In a generator a plain return statement is equivalent to raise StopIteration. In python3.3+ you can actually use return value and it is equivalent to raise StopIteration(value).

So, you are throwing in a certain exception, it is being caught by the generator which executes the return and hence raises a StopIteration. Whenever you call send, next or throw a StopIteration may be raised if the generator ends its execution without finding a yield, so the code you are using in the test is doomed to raise a StopIteration whenever skipping a subtree will end the iteration.

In other words your test is flawed because the throw call can raise an exception even if you have a correct implementation of your generator. So you should either wrap that call in a try statement:

try:
    iterator.throw(IgnoreSubtree)
except StopIteration:
    break

Alternatively you can use the suppress context manager to suppress the StopIteration:

with suppress(StopIteration):
    for o in iterator:
        ...
        iterator.throw(IgnoreSubtree)

If you aren't using python3.4 you can easily reimplement this context manager using the @contextmanager decorator (it's available since python 2.6):

def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

Your code is basically correct. If you were using python3.3+ you could simplify it to:

def preorder_traversal(obj):
    try:
        yield obj
    except IgnoreSubtree:
        return
    else:
        if isinstance(obj, dict):
            for k, v in obj.items():
                yield from preorder_traversal(v)

Your implementation doesn't raise any error for me, once the StopIteration is suppressed for the outer throw. Also the result is what you expect. Unfortunately without the yield from I don't see any way to simplify the control flow.

Bakuriu
  • 98,325
  • 22
  • 197
  • 231
  • Thanks for the answer! I'm using Python 2.x, so I can't use `yield from`. Here's a working version with tests that follows your suggestions: https://gist.github.com/danvk/ec02c91d43c8f2edbb54. In practice, the fact that `.throw()` returns the next value makes this an awkward interface to work with. I wound up passing around a callback function instead. – danvk Oct 06 '14 at 20:55
  • I'm confused by this answer because the original code already suppresses the `StopIteration` exception. You change `pass` to `break`, but they do the same thing; pass allows the `for` loop to continue, but the iterator is already exhausted, so it stops on its own. I think the actual answer to this problem is [much simpler](http://stackoverflow.com/a/26225138/577088). – senderle Oct 06 '14 at 21:37
1

I think the other answers are overcomplicating things. The generator is correct! The problem is this line in the test:

iterator.throw(IgnoreSubtree)

Instead, it should be this:

out.append(iterator.throw(IgnoreSubtree))

The iterator is behaving just as expected. But as others have noted, it.throw returns the next value. You're throwing out the value that follows the subtree pruning because you aren't saving the result of throw in your test! You'd also actually need to catch StopIteration in case sending IgnoreSubtree ends the iterator completely. But it looks to me like no other changes are required.

Here's the code that shows the difference:

def test_ignore_subtree(obj, halt, expected):
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == halt:
            out.append(iterator.throw(IgnoreSubtree))

    print expected
    print out

def test_ignore_subtree_wrong(obj, halt, expected):
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == halt:
            iterator.throw(IgnoreSubtree)

    print expected
    print out

print "Test 1"
obj = {'a': {'c': 3}, 'b': 2}
halt = {'c': 3}
expected = [obj, {'c': 3}, 2]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 2"
obj = {'a': {'c': 3, 'd': 4}, 'b': 6, 'c': 5, 'd': 7}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 5, 6, 7]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 3"
obj = {'a': {'c': 3, 'd': 4}, 'b': 2}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 2]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 4"
obj = {'a': {'c': 3, 'd': 4}, 'b': 2, 'c': 5, 'd': 7}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 5, 2, 7]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)

And the output (note how the first value after the pruned part of the tree is missing for all the "wrong" outputs:

Test 1
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}]
Test 2
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 6, 7]
Test 3
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3]
Test 4
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 2, 7]

I had to massage the expected values to get the sequence right because dictionaries iterate in an unpredictable order. But I think the results are sound.

senderle
  • 145,869
  • 36
  • 209
  • 233
  • @ryan-williams pointed out a bug in this approach: what if the value returned by `iterator.throw(IgnoreSubtree)` is also `halt`? Then you'd want to `throw` down another `IgnoreSubtree`. But what if the result of that is also `halt`? It gets complicated. The messiness of this makes me feel increasingly strongly that a filter callback function is the better way to do this. – danvk Oct 06 '14 at 21:43
  • Aha, you're right. But I don't see that it's that much more complicated. The generator itself is still fundamentally correct. – senderle Oct 06 '14 at 21:45
  • I'd like to see the consumer that's not "much more complicated". Unless you can somehow stuff the value returned by `throw` back into the iterator, I don't see how you can correctly consume the output of this generator. – danvk Oct 06 '14 at 21:48
  • Then I think I don't understand what you're saying. `halt` here is just an arbitrary value passed for the purpose of testing. This isn't how it would actually be used. It just stands in for whatever condition the consumer decides on as the halting condition. Say the iteration is going on for too long; the consumer keeps a time count and when it exceeds a certain value, the consumer sends `IgnoreSubtree`. – senderle Oct 06 '14 at 21:53
  • @danvk, I misunderstood your first question to be about what happens if `IgnoreSubtree` exhausts the whole iterator, leading to a `StopIteration` exception. That's what my "aha" was about, and indeed it does require a bit more logic to handle. But you're saying something else that I haven't understood I think. – senderle Oct 06 '14 at 21:56
  • I'm not looking for a way to halt the entire traversal—you can do that via `iterator.close()` with no additional code required. I want to prune the search tree in a precise, semantically-meaningful way. – danvk Oct 06 '14 at 21:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/62547/discussion-between-senderle-and-danvk). – senderle Oct 06 '14 at 21:56
1

Implementing subtree pruning by throwing an exception to the iterator results in messy, error-prone code, both in the generator and in the function that consumes it. Looking at some of the answers here makes me think that a filter callback function is a saner approach.

This would be a generalization of Ryan's answer:

def preorder_traversal(obj, bailout_fn=None):
    yield obj
    if bailout_fn and bailout_fn(obj):
        return

    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for o in preorder_traversal(v, bailout_fn):
                yield o

And here are some nose tests that demonstrate how it's used:

def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2}
    eq_([obj, {'c': 3}, 3, 2], list(preorder_traversal(obj)))

    iterator = preorder_traversal(obj, lambda o: o == {'c': 3})
    out = list(iterator)
    eq_([obj, {'c': 3}, 2], out)


def test_ignore_subtree2():
    obj = {'a': {'c': 3, 'd': 4}, 'b': 2}
    eq_([obj, {'c': 3, 'd': 4}, 3, 4, 2],
            list(preorder_traversal(obj)))

    iterator = preorder_traversal(obj, lambda o: o == {'c': 3, 'd': 4})
    out = list(iterator)

    eq_([obj, {'c': 3, 'd': 4}, 2], out)
Community
  • 1
  • 1
danvk
  • 15,863
  • 5
  • 72
  • 116
1

I've never posted a second answer before, but I think in this case it's appropriate. The first section of this answer discusses a way to cleaning up the generator interface. The second section discusses when it's most appropriate to use this fix, and when it's more appropriate to replace throw with another construct.

Cleaning the Interface

There are two key problems with the generator as it stands. Neither of them have to do with correctness -- it behaves as expected. They have to do with interface. So fix the interface problems with a wrapper function.

The first problem is that throw returns an important value that the current test discards. So write a wrapper that returns an unimportant value when IgnoreSubtree is called. And the second problem is that when IgnoreSubtree is thrown, sometimes it exhausts the iterator completely. So write a wrapper that catches StopIteration and handles it gracefully. This does both:

def ptv_wrapper(obj):
    pt = preorder_traversal(obj)
    while True:
        try:
            o = pt.next()
            while True:
                try:
                    yield o
                except IgnoreSubtree as e:
                    yield
                    o = pt.throw(e)
                else:
                    break

        except StopIteration:
            return

Your above code will work as-is if you use the above as a wrapper around preorder_traversal.

When to Use throw; When to Use Callbacks; When to Use send

The question of whether to use throw in this case is a difficult one. As danvk has pointed out, this exception-based method uses some pretty complex (and exotic) techniques, and the extra complexity may not be worth it. Additionally, there's something a little fishy about using exceptions for control flow. Generators already do it internally (using StopIteration) so there must be some justification, but it's worth thinking about what that justification is.

The first question is whether using throw increases or decreases the complexity of the code that already exists. If your use-case does not involve tight coupling between the generator and the consumer, then you're probably better off using callbacks. (And if your code is tightly coupled but doesn't have to be, you should refactor!) However, in some cases, tight coupling is unavoidable. In those cases, using throw (or send -- see below) probably doesn't increase complexity, and might decrease it. Indeed, if you use callbacks in a situation where those callbacks depend on a lot of external state to do what they need to do, then you are probably writing code that has tight coupling and low cohesion -- the worst of both worlds! By using throw or send, you're ensuring that the generator and the state that controls it are close together; coupling will be high, but so will cohesion, which will probably result in less-complex code.

The second question is whether to use throw or send. The second option ought to be under consideration here, because it's another way for the consumer to signal the generator that something's up. You might think of send as the LBYL to throw's EAFP. Doing so helps provide some intuitions about when to use one or the other. It depends on whether you anticipate passing signals back and forth between generator and consumer frequently. EAFP code that rarely throws exceptions will generally be faster than corresponding LBYL code. But EAFP code that frequently throws exceptions will be much slower than the corresponding LBYL code. This helps to explain why Python iterators use StopIterator instead of tests: in the overwhelming majority of cases, StopIterator will only ever be thrown once! So the cost of that catch and throw becomes fixed overhead that is quickly overwhelmed by other performance bottlenecks.

This means that if your use of IgnoreSubtree is infrequent (more like Python's use of StopIterator) then you're probably justified in using throw. Otherwise, consider using send.

Community
  • 1
  • 1
senderle
  • 145,869
  • 36
  • 209
  • 233
0

I would direct you to the docs for generator.throw: https://docs.python.org/2/reference/expressions.html#generator.throw

To quote:

Raises an exception of type type at the point where generator was paused, and returns the next value yielded by the generator function. If the generator exits without yielding another value, a StopIteration exception is raised.

There's no way to "prune out" the {'c': 3} subtree using generator.throw, because the value has already been generated by the time you can do a comparison with it. Additionally, the documentation of generator.throw tells you that it tries to yield a "final" value, so to speak, or else it raises StopIteration if there is no final value that is yielded.

audiodude
  • 1,865
  • 16
  • 22
  • It's fine that `{'c': 3}` is generated—I'd just like to send a message that says "don't look below this node". Hence the importance of this being pre-order traversal. – danvk Oct 06 '14 at 20:35
  • I've written a test which illustrates that this solution does not work here: https://gist.github.com/danvk/925f6cc25df1c9583157 – danvk Oct 06 '14 at 20:36
  • Okay good to know, like I said it was just a hunch, I will edit the answer to remove it. – audiodude Oct 06 '14 at 20:55