0

I am following the BinarySearchTree code in the book Data Structure and Algorithms. Would you like to read the full code in this link?

And I am not clear how this method works

def __iter__(self):
   if self.left != None:
      for elem in self.left:
         yield elem
                    
      yield self.val
            
   if self.right != None:
      for elem in self.right:
         yield elem
  1. Is the elem variable an instance of the Node class or is it a float number (from inputs)? In debug it is both, I guess this value is changed because of line yield elem but I do not understand it.

  2. What are the differences between yield elem and yield self.val? How many generator objects are there in this situation?

  3. In addition, would you like to share some experience in debugging generator functions? I am confused by yield when debugging.

Tan Phan
  • 337
  • 1
  • 4
  • 14

2 Answers2

2

1. elem is a Node instance. From the for loops, we know that elem is always either self.left or self.right. You can see in the example usage that float values are inserted into the binary tree with tree.insert(float(x)) and the BinarySearchTree.insert() method ultimately calls BinarySearchTree.Node(val) where val is float(x) in this case. Therefore self.left and self.right are always Node instances.

  1. As mentioned by don't talk just code in the comments, elem is a float. I did not see this before because I assumed that iterating over self.left would product a list of Node elements. However this is not correct. In fact, iterating over self.left works in this case by calling self.left.__iter__(). I break down this __iter__() function into 3 cases, almost like a recursive function. (It is not in fact recursive because it is calling the __iter__() method of different instances of the Node class, but its behavior is similar.)

    • First, the Node has no left or right children. This is straightforward: the iter will just yield self.val, which is a float.
    • Second, the Node has left children. In this case, the for loop will traverse down all the left children in an almost recursive fashion until it reaches a Node that has no left children. Then we are back at the first case.
    • Third, the Node has right children. In this case, after the own nodes self.val is return, the iterator will continue to the first right node, and repeat.
  2. There is only one generator, namely Node.__iter__(), because generators are functions. It uses multiple yield statements to return different values depending on the situation. yield elem and yield self.val just return either a Node if the current Node has left or right branches or the current Node's value.

  3. I do not have specific tips for debugging yield statements in particular. In general I use IPython for interactive work when building code and use its built-in %debug magic operator. You might also find rubber duck debugging useful.

Using IPython you can run the following in a cell to debug interactively.

In [37]: %%debug
    ...: for x in tree.root:
    ...:     print(x)
    ...:
NOTE: Enter 'c' at the ipdb>  prompt to continue execution.

You can then use the s command at the debugger prompt, ipdb> , to step through the code, jumping into a function calls.

ipdb> s
--Call--
> <ipython-input-1-c4e297595467>(30)__iter__()
     28         # of the nodes of the tree yielding all the values. In this way, we get
     29         # the values in ascending order.
---> 30         def __iter__(self):
     31             if self.left != None:
     32                 for elem in self.left:

While debugging, you can evaluate expressions by preceding them with an exclamation point, !.

ipdb> !self
BinarySearchTree.Node(5.5,BinarySearchTree.Node(4.4,BinarySearchTree.Node(3.3,BinarySearchTree.Node(2.2,BinarySearchTree
.Node(1.1,None,None),None),None),None),None)
ogdenkev
  • 2,264
  • 1
  • 10
  • 19
  • "`elem` is a `Node` instance." - It clearly isn't. – no comment Oct 01 '21 at 11:11
  • They ask "How many generator **objects**", rather likely meaning the iterators, not the function (They also do say "generator functions" later on). – no comment Oct 01 '21 at 11:17
  • @don'ttalkjustcode would you elaborate on why *elem* is not a *Node* instance? – ogdenkev Oct 01 '21 at 11:27
  • Because it's a `self.val` from a deeper iteration. – no comment Oct 01 '21 at 11:29
  • @don'ttalkjustcode there is imprecision in the language used here. "Objects are Python’s abstraction for data" (from [here](https://docs.python.org/3/reference/datamodel.html)). On that page it lists [generator functions](https://docs.python.org/3/library/stdtypes.html#generator-types). And the return value of generator functions is a [generator iterator](https://docs.python.org/3/glossary.html#term-generator-iterator). Thus I interpreted "generator object" to be the "generator function" – ogdenkev Oct 01 '21 at 11:32
  • @don'ttalkjustcode No, the `elem` object is only used in the `for` loops, which iterate over elements in `self.left` or `self.right`. Yes, the generator iterator will yield `self.val` after going through every element in `self.left`, but `self.val` is never assigned to `elem`. – ogdenkev Oct 01 '21 at 11:34
  • If you don't see it by reading the code, then just execute it with an added `print(type(elem))` and you'll see it's not a `Node` instance. – no comment Oct 01 '21 at 11:51
  • @don'ttalkjustcode, you were right. I executed the code as you suggested which revealed I was thinking about this too superficially. I have edited my answer to strike out my incorrect answer. – ogdenkev Oct 01 '21 at 12:31
  • Interesting thought about recursiveness. I still think the function is recursive, though. As [the documentation](https://docs.python.org/3/reference/datamodel.html) says: *"When an instance method object is called, the underlying function (__func__) is called, inserting the class instance (__self__) in front of the argument list."*. – no comment Oct 01 '21 at 13:34
  • Thanks for that quote from the documentation. I agree with you that the function looks recursive. Indeed, if I bind a new function (see [here](https://stackoverflow.com/questions/8980676/dynamically-bind-method-to-class-instance-in-python) or [here](https://stackoverflow.com/questions/972/adding-a-method-to-an-existing-object-instance)) to the `__iter__()` method of an instance of Node, it looks like the class's `__iter__()` is still called with the instance, e.g. `Node.__iter__(mynode)` – ogdenkev Oct 01 '21 at 14:46
2

First, there is an indentation issue in the code you shared: yield self.val should not be in the if block:

def __iter__(self):
   if self.left != None:
      for elem in self.left:
         yield elem
                    
   yield self.val  #  Unconditional. This is also the base case
            
   if self.right != None:
      for elem in self.right:
         yield elem

To understand this code, first start imagining a tree with just one node. Let's for a moment ignore the BinarySearchTree class and say we have direct access to the Node class. We can create a node and then iterate it:

node = Node(1)
for value in node:
    print(value)

This loop will call the __iter__ method, which in this case will not execute any of the if blocks, as it has no children, and only execute yield self.val. And that is what value in the above loop will get as value, and which gets printed.

Now extend this little exercise with 2 more nodes:

node = Node(1, 
    Node(0),
    Node(2)
)
for value in node:
    print(value)

Here we have created this tree, and node refers to its root

     1   <-- node
    / \
   0   2

When the for..in loop will call __iter__ now, it will first enter the first if block, where we get a form of recursion. With the for statement there, we again execute __iter__, but this time on the left child of node, i.e. the node with value 0. But that is a case we already know: this node has no children, and we know from the first example above, that this results in one iteration where the loop variable will be the value of that node, i.e. 0, and that value is yielded. That means the main program gets an iteration with value equal to 0, which gets printed.

So elem is numeric. It would better have been called value or val to take away any confusion.

After that if block has executed we get to yield self.val. self is here node, and so we yield 1. That means the main program gets to execute a second iteration, this time with value equal to 1.

Finally the second if block is executed, and now the right child of node is the subject of a recursive __iter__ call. It is the same principle as with the left child. This yields value 2, and the main program prints 2.

We could again extend the tree with more nodes, but the principle is the same: by recursive calls of __iter__ all the values of the tree are yielded.

yield from

There is a syntax that allows simplification of the code, and also it is more common practice to use the is operator when comparing with None:

def __iter__(self):
   if self.left is not None:
      yield from self.left
                    
   yield self.val
            
   if self.right is not None:
      yield from self.right

This results in the same behavior. yield from will yield all values that come from the iterable. And since node instances are iterable as they have the __iter__ method, this works as intended.

trincot
  • 317,000
  • 35
  • 244
  • 286
  • Thank you for your comment. Why is elem **val** attribute? the *Node* Class has other as **left** and **right** – Tan Phan Oct 02 '21 at 05:08
  • Do you know any special way to debug for **yield statement** in code? – Tan Phan Oct 02 '21 at 05:10
  • It yields `val` because it is literally there in the code. It is the base case. All yields translate to that base case. I thought my answer took it step by step to explain how it comes to yield the values. – trincot Oct 02 '21 at 06:03
  • To debug, just `print(elem)` before the `yield elem`. For a big tree, you will then see the same value being printed multiple times, because one `yield` will cause `yield` to happen in the whole path of the recursion tree. – trincot Oct 02 '21 at 07:00