0

this is a possible way to implement add and delete functions for a BST in Python. It somewhat resembles my ideas of BST in C++. As evident by the code for deleting, I want to delete a node, which I cannot do due to lack of pass by reference in Python as it is present in C++. What is a good alternative way to delete a node apart from del currNode(this does not work). I have one more question to clarify my idea about pass by reference in Python, when a node is being using added using add function, it remains "attached" to the root node, when add is being called recursively. However, when a node is being deleted, it is not being "detached" from the root node. Why is this so?

class node(object):
    def __init__(self, data = None):
        self.data = data
        self.left = None
        self.right = None

class bst(object):
    def __init__(self):
        self.root = None
    
    def add(self, value):
        
        def _add(self, currNode, value):
            if value < currNode.data:
                if currNode.left == None:
                    currNode.left = node(value)
                else:
                    _add(self, currNode.left, value)

            elif value > currNode.data:
                if currNode.right == None:
                    currNode.right = node(value)
                else:
                    _add(self, currNode.right, value)
        
            else:
                print("Duplicate found")

        
        if self.root == None:
            self.root = node(value)
        else:
            _add(self, self.root, value)


    def printBST(self):
        def _printBST(self, currNode):
            if currNode!= None:
                _printBST(self, currNode.left)
                print(currNode.data, end = " ")
                _printBST(self, currNode.right)
        
        if self.root != None:
            _printBST(self,self.root)
    

    def minBST(self,currNode):
        def _minBST(self, currNode):
            if currNode.left == None:
                return currNode.data
            else: return _minBST(self, currNode.left)
        
        if currNode != None:
            return _minBST(self, currNode)
        else:
            return -10000
    
    def deleteValue(self, val):
        
        def deleteNode(self, currNode, value):
            if currNode == None: 
                return
            elif value > currNode.data:
                return deleteNode(self, currNode.right, value)
            elif value < currNode.data:
                return deleteNode(self, currNode.left, value)
            else:
                if currNode.left == None and currNode.right == None:
                    #del currNode
                    currNode.data = None

                elif currNode.right == None:
                    currNode.data = None
                    #The address of currNode does not change
                    #as it happens in C++
                    #currNode = currNode.left

                elif currNode.left == None:
                    currNode.data = None
                    #The address of currNode does not change
                    #currNode = currNode.right
            
                else:
                    minV = self.minBST(currNode.right)
                    currNode.data = minV
                    return deleteNode(self, currNode.right, minV)

        deleteNode(self, self.root, val)
    


if __name__ == '__main__':
    b = bst()
    b.add(50)
    b.add(60)
    b.add(40)
    b.add(30)
    b.add(45)
    b.add(55)
    b.add(100)
    b.printBST()
    b.deleteValue(100)
    print("")
    b.printBST()
 
motiur
  • 1,640
  • 9
  • 33
  • 61

2 Answers2

3

node structure and insertion

We start with a simple node structure, but notice the left and right properties can be set at the time of construction -

# btree.py

class node:
  def __init__(self, data, left=None, right=None):
    self.data = data
    self.left = left
    self.right = right

Recursion is a functional heritage and so using it with functional style yields the best results. This means avoiding things like mutation, variable reassignment, and other side effects. Notice how add always constructs a new node rather than mutating an old one. This is the reason we designed node to accept all properties at time of construction -

# btree.py (continued)

def add(t, q):
  if not t:
    return node(q)
  elif q < t.data:
    return node(t.data, add(t.left, q), t.right)
  elif q > t.data:
    return node(t.data, t.left, add(t.right, q))
  else:
    return node(q, t.left, t.right)

inorder traversal and string conversion

After we add some nodes, we need a way to visualize the tree. Below we write an inorder traversal and a to_str function -

# btree.py (continued)

def inorder(t):
  if not t: return
  yield from inorder(t.left)
  yield t.data
  yield from inorder(t.right)


def to_str(t):
  return "->".join(map(str,inorder(t)))

btree object interface

Notice we did not over-complicate our plain functions above by entangling them with a class. We can now define a btree object-oriented interface which simply wraps the plain functions -

# btree.py (continued)

class btree:
  def __init__(self, t=None): self.t = t
  def __str__(self): return to_str(self.t)
  def add(self, q): return btree(add(self.t, q))
  def inorder(self): return inorder(self.t)

Notice we also wrote btree.py as its own module. This defines a barrier of abstraction and allows us to expand, modify, and reuse features without tangling them with other areas of your program. Let's see how our tree works so far -

# main.py

from btree import btree

t = btree().add(50).add(60).add(40).add(30).add(45).add(55).add(100)

print(str(t))
# 30->40->45->50->55->60->100

minimum and maximum

We'll continue working like this, defining plain function that work directly on node objects. Next up, minimum and maximum -

# btree.py (continued)

from math import inf

def minimum(t, r=inf):
  if not t:
    return r
  elif t.data < r:
    return min(minimum(t.left, t.data), minimum(t.right, t.data))
  else:
    return min(minimum(t.left, r), minimum(t.right, r))

def maximum(t, r=-inf):
  if not t:
    return r
  elif t.data > r:
    return max(maximum(t.left, t.data), maximum(t.right, t.data))
  else:
    return max(maximum(t.left, r), maximum(t.right, r))

The btree interface provides only a wrapper of our plain functions -

# btree.py (continued)

class btree:
  def __init__():       # ...
  def __str__():        # ...
  def add():            # ...
  def inorder():        # ...
  def maximum(self): return maximum(self.t)
  def minimum(self): return minimum(self.t)

We can test minimum and maximum now -

# main.py

from btree import btree

t = btree().add(50).add(60).add(40).add(30).add(45).add(55).add(100)

print(str(t))
# 30->40->45->50->55->60->100

print(t.minimum(), t.maximum())     # <-
# 30 100

insert from iterable

Chaining .add().add().add() is a bit verbose. Providing an add_iter function allows us to insert any number of values from another iterable. We introduce it now because we're about to reuse it in the upcoming remove function too -

def add_iter(t, it):
  for q in it:
    t = add(t, q)
  return t
#main.py

from btree import btree

t = btree().add_iter([50, 60, 40, 30, 45, 55, 100])   # <-

print(str(t))
# 30->40->45->50->55->60->100

print(t.minimum(), t.maximum())
# 30 100

node removal and preorder traversal

We now move onto the remove function. It works similarly to the add function, performing a t.data comparison with the value to remove, q. You'll notice we use add_iter here to combine the left and right branches of the node to be deleted. We could reuse inorder iterator for our tree here, but preorder will keep the tree a bit more balanced. That's a different topic entirely, so we won't get into that now -

# btree.py (continued)

def remove(t, q):
  if not t:
    return t
  elif q < t.data:
    return node(t.data, remove(t.left, q), t.right)
  elif q > t.data:
    return node(t.data, t.left, remove(t.right, q))
  else:
    return add_iter(t.left, preorder(t.right))

def preorder(t):
  if not t: return
  yield t.data
  yield from preorder(t.left)
  yield from preorder(t.right)

Don't forget to extend the btree interface -

# btree.py (continued)

class btree:
  def __init__():       # ...
  def __str__():        # ...
  def add():            # ...
  def inorder():        # ...
  def maximum():        # ...
  def minimum():        # ...
  def add_iter(self, it): return btree(add_iter(self.t, it))
  def remove(self, q): return btree(remove(self.t, q))
  def preorder(self): return preorder(self.t)

Let's see remove in action now -

# main.py

from btree import btree

t = btree().add_iter([50, 60, 40, 30, 45, 55, 100])

print(str(t))
# 30->40->45->50->55->60->100

print(t.minimum(), t.maximum())
# 30 100

t = t.remove(30).remove(50).remove(100)      # <-

print(str(t))
# 40->45->55->60

print(t.minimum(), t.maximum())
# 40 60

completed btree module

Here's the completed module we built over the course of this answer -

from math import inf

class node:
  def __init__(self, data, left=None, right=None):
    self.data = data
    self.left = left
    self.right = right

def add(t, q):
  if not t:
    return node(q)
  elif q < t.data:
    return node(t.data, add(t.left, q), t.right)
  elif q > t.data:
    return node(t.data, t.left, add(t.right, q))
  else:
    return node(q, t.left, t.right)

def add_iter(t, it):
  for q in it:
    t = add(t, q)
  return t

def maximum(t, r=-inf):
  if not t:
    return r
  elif t.data > r:
    return max(maximum(t.left, t.data), maximum(t.right, t.data))
  else:
    return max(maximum(t.left, r), maximum(t.right, r))

def minimum(t, r=inf):
  if not t:
    return r
  elif t.data < r:
    return min(minimum(t.left, t.data), minimum(t.right, t.data))
  else:
    return min(minimum(t.left, r), minimum(t.right, r))

def inorder(t):
  if not t: return
  yield from inorder(t.left)
  yield t.data
  yield from inorder(t.right)

def preorder(t):
  if not t: return
  yield t.data
  yield from preorder(t.left)
  yield from preorder(t.right)

def remove(t, q):
  if not t:
    return t
  elif q < t.data:
    return node(t.data, remove(t.left, q), t.right)
  elif q > t.data:
    return node(t.data, t.left, remove(t.right, q))
  else:
    return add_iter(t.left, preorder(t.right))

def to_str(t):
  return "->".join(map(str,inorder(t)))

class btree:
  def __init__(self, t=None): self.t = t
  def __str__(self): return to_str(self.t)
  def add(self, q): return btree(add(self.t, q))
  def add_iter(self, it): return btree(add_iter(self.t, it))
  def maximum(self): return maximum(self.t)
  def minimum(self): return minimum(self.t)
  def inorder(self): return inorder(self.t)
  def preorder(self): return preorder(self.t)
  def remove(self, q): return btree(remove(self.t, q))

have your cake and eat it too

One understated advantage of the approach above is that we have a dual interface for our btree module. We can use it in the traditional object-oriented way as demonstrated, or we can use it using a more functional approach -

# main.py

from btree import add_iter, remove, to_str, minimum, maximum

t = add_iter(None, [50, 60, 40, 30, 45, 55, 100])

print(to_str(t))
# 30->40->45->50->55->60->100

print(minimum(t), maximum(t))
# 30 100

t = remove(remove(remove(t, 30), 50), 100)

print(to_str(t))
# 40->45->55->60

print(minimum(t), maximum(t))
# 40 60

additional reading

I've written extensively about the techniques used in this answer. Follow the links to see them used in other contexts with additional explanation provided -

Mulan
  • 129,518
  • 31
  • 228
  • 259
  • Thanks for the answer. I am question though, how is yield responsible for deleting a node from the tree. I am referring to the last else clause of the remove function, where preorder is being called. My initial hunch, is that the data stored in the memory is being returned and being deleted at the same time - due to yield. Please do correct me. – motiur Feb 25 '21 at 16:37
  • @motiur, the last clause of the `remove` function will delete the node, `t`, where `t.data` is equal to `q`, but the branches of that node, `t.left` and `t.right` must be preserved. To do this, `add_iter(t.left, preorder(t.right))` is effectively `merge` and creates a new tree. `preorder(t.right)` is used to insert values into `t.left` one at a time. This is similar to how we create a tree in the example, `t = add_iter(None, [50, 60, 40, 30, 45, 55, 100])` but this inserts values from the array `[50, 60, ...]` into the empty tree, `None`. Does that make sense? – Mulan Feb 25 '21 at 16:43
  • So, correct me if I am wrong, so you are merging the left subtree and the right subtree when you called `add_iter` in the last clause of `remove` -- but what happened to the node containing the data -- did it get destroyed/deleted, or you just simply ignored it - and did not return it. – motiur Feb 25 '21 at 17:43
  • this is a key difference with functional techniques: instead of manipulating old (existing) values, a _new_ value is created. In the previous branches, we re-create a new vein down the tree, `node(t.data, remove(t.left, q), t.right)` or `node(t.data, t.left, remove(t.right, q))`. This is not a copy of the entire tree, rather only the _path_ to `t.data == q` is recreated. So when `t.data == q`, node `t` is ignored and garbage collected, and `t.left` and `t.right` are merged to form the new subtree. – Mulan Feb 25 '21 at 17:56
  • If we write `t0 = add_iter(None, [1,2,3])` and then `print(t1)` I will see `1->2->3`. Next we define `t1 = remove(t0, 2)` and `print(t1)` we will see `1->3`. The call to `remove` does *not* change `t0`, so if we write `print(t0)` again, we still see `1->2->3`. This is what it means to be *persistent*. If other parts of your program hold a reference to `t0`, integrity of the tree will always be in tact. Only after all references to a node are cleaned up will the nodes actually be destroyed. – Mulan Feb 25 '21 at 18:09
  • You are already familiar with this concept. For example `x = 10`, and `y = x-1`. If I `print(x,y)` you would expect to see `10,9`. That is because `x-1` does not mutate `x`'s value, 10. Instead a _new_ value, 9, is returned. Likewise `print(x-1, x-1, x-1)` will yield `9,9,9`, _not_ `9,8,7`. – Mulan Feb 25 '21 at 18:30
  • I like the way, he proceed to solve the problem. Just to clarify one more thing, so that we are on the same page - say `b = btree()` and `b.add_iter([1,2,3])` - and after doing `b1=b.remove(3)` , `b1` does not contain `3` but `b` contains `3` along with `1` and `2`. Is it possible to remove `3` from `b` or just having a new tree `b1` is the best thing that can be done in Python - since "pass by reference" is not allowed. – motiur Feb 25 '21 at 18:49
  • The easiest way to overwrite `b` is to write `b = b.remove(3)` just like you would write `x = x - 1`. Since the old reference to `b` is overwritten, the removed node will no longer have a reference to it and it will be garbage collected. Unless you make other references of course, such as `z = b` then `b = b.remove(3)` where `z` still contains a reference to `3` :D – Mulan Feb 25 '21 at 18:57
  • Well , hmm, that would work as well. Odd, in C++, there is so hammering over pass by reference and memory management. Its good to know that automatic garbage collection in Python does a quite bit of work. Thanks. – motiur Feb 25 '21 at 19:03
  • motiur, I will add that Python uses a [call by sharing](https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing) evaluation strategy which is similar to call by reference. When I write `somefunc(someobj)` the input object is not copied. if `somefunc` mutates `someobj` the caller will be able to observe this side effect. – Mulan Feb 25 '21 at 20:07
  • Hi, i have realized that `del` deletes a node in the current context, it does not have a "global" effect like the append function. However, assignment of values do have a global affect, as illustrated by add function that I have written. Do you want to comment on that. – motiur Feb 27 '21 at 03:17
  • `del x` statement deletes and removes the name binding of `x`. If `x` is compound data such as a list or dict, child values are recursively deleted as well. `del` is a side effect and can be been as manual memory management from a functional style point of view, and is therefore avoided. – Mulan Feb 27 '21 at 16:10
  • The question about `add` is not clear to me. Assignment of variables in python have a lexical scope. Assignment of a _property_, like `current_node.left = node(value)` is mutation of an object, and can be seen as `current_node.set_property("left", node(value))`; see [@property](https://docs.python.org/3/library/functions.html#property). As noted, Python uses call by sharing evaluation strategy, so any mutations to a shared object can be observed by the caller. This is not the same as a global variable, but similar. Mutation is a side effect and this is also avoided in functional style. – Mulan Feb 27 '21 at 16:15
1

Instead of deleting the node using del you can reassign the parent node and let the child node be collected by garbage collector.

In deleteNode return the new child instead of deleting the node. Assign the returned value to the parent.

def deleteNode(self, currNode, value):
    if not currNode: 
        return currNode
    elif value < currNode.data:
        return deleteNode(self, currNode.left, value)
    elif value > currNode.data:
        return deleteNode(self, currNode.right, value)

    else: 
        if not currNode.right:
            return currNode.left

        if not currNode.left:
            return currNode.right

        temp_val = currNode.right
        mini_val = temp_val.val
        while temp_val.left:
            temp_val = temp_val.left
            mini_val = temp_val.val
        currNode.right = deleteNode(currNode.right,currNode.val)
    return currNode
Prithvi Raj
  • 1,611
  • 1
  • 14
  • 33