0

I am trying to implement a kind of sorted dictionary as a Binary Search Tree. The idea is that no matter what operation I do on this new dictionary-like object, the elements of the dictionary are always sorted with respect to the keys. My implementation is not complete - there are some issues of performance that I would like to fix before completing the code.

The idea is to create two classes, Nodeand SortedDict().

The class Node has the __getitem__ and __setitem__ methods to insert and get elements in the tree (for the moment I am not implementing a delete operation).

The class SortedDict() takes a sorted dictionary (for the moment I am not implementing cases where the argument is not a sorted dictionary), and it starts inserting the elements of the dictionary starting from the leaves of the tree. First, it processes the left child until it reaches the median of the keys of the dictionary. Then, it processes the right child, and then it glues the two children trees to the root. The root is the median of the sorted dictionary' keys.

If the dictionary is very large, I get a RecursionError: maximum recursion depth exceeded in comparison error if I try to access an element that is very far from the root. Why do I get this error? How can I avoid it?

The traceback is:

Traceback (most recent call last):
  File "main.py", line 96, in <module>
    print(dic_obj[5])
  File "main.py", line 91, in __getitem__
    return self.root.__getitem__(item)
  File "main.py", line 26, in __getitem__
    return self.left.__getitem__(item)
  File "main.py", line 26, in __getitem__
    return self.left.__getitem__(item)
  File "main.py", line 26, in __getitem__
    return self.left.__getitem__(item)
  [Previous line repeated 994 more times]
  File "main.py", line 18, in __getitem__
    if item > self.root:
RecursionError: maximum recursion depth exceeded in comparison

To reproduce this error you can use the following code:

dic = {k:k for k in range(1,10000)}
dic_obj = SortedDict(dic)
print(dic_obj[5])    

where the definition of SortedDict is given as follow:

class Node():
    def __init__(self, root=None, value=None, left=None, right=None, parent=None):
        self.parent = parent
        self.root, self.value = root, value
        self.left = left
        self.right = right

    def __str__(self):
        return f'<Node Object - Root: {self.root}, Value: {self.value}, Parent: {self.parent}>'

    def __getitem__(self, item):

        if self.root is None:
            raise KeyError(item)

        if item > self.root:
            if self.right is None:
                raise KeyError(item)
            return self.right.__getitem__(item)

        if item < self.root:
            if self.left is None:
                raise KeyError(item)
            return self.left.__getitem__(item)

        if item == self.root:
            return self.value

    def __setitem__(self, key, value):
        if self.root is None:
            self.root, self.value = key, value
        else:
            if key > self.root:
                if self.right is not None:
                    self.right.__setitem__(key,value)
                else:
                    self.right = Node(root=key, value=value)
                    self.right.parent = self
            elif key < self.root:
                if self.left is not None:
                    self.left.__setitem__(key,value)
                else:
                    self.left = Node(root=key, value=value)
                    self.left.parent = self
            elif key == self.root:
                self.root = value


class SortedDict():
    def __init__(self, array: dict):
        self.root = Node()
        if array:
            keys = list(array.keys())
            for key in range(len(keys)//2):
                self.__setitem__(keys[key],array[keys[key]])
            root = Node(root=keys[key+1],value=array[keys[key+1]])
            self.root.parent = root
            root.left = self.root
            self.root = Node()
            for key in range(len(keys)//2+1,len(keys)):
                self.__setitem__(keys[key],array[keys[key]])
            self.root.parent = root
            root.right = self.root
            self.root = root


    def __setitem__(self, key, value):
        try:
            if key > self.root.root:
                if self.root.right is None and self.root.left is None:
                    node = Node(root=key, value=value)
                    self.root.parent = node
                    node.left = self.root
                    self.root = node
                else:
                    if self.root.right is None:
                        self.root.right = Node(root=key, value=value, parent=self.root)
                    else:
                        node = Node(root=key, value=value)
                        self.root.parent = node
                        node.left = self.root
                        self.root = node
        except:
            self.root.root = key
            self.root.value = value


    def __getitem__(self, item):
        return self.root.__getitem__(item)
mkrieger1
  • 19,194
  • 5
  • 54
  • 65
apt45
  • 412
  • 1
  • 6
  • 14
  • (1) dict is compiled code and therefore naturally faster than Python code. (2) The root node is the median but its left and right subtrees aren't balanced but for a sorted dict as input they are fully degenerated (graphically the tree looks like a "V" upside down). – Michael Butscher Jan 15 '23 at 22:09
  • @MichaelButscher I edited the code to implement a balanced binary search tree from a sorted dictionary. I do get an error when I try to access one of the very first (or very last) elements of a very large dictionary. Can you have a look at it? – apt45 Jan 16 '23 at 14:54
  • Please read https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ and try to trace through the execution of the code. My initial guess was wrong, but I still don't think that the tree creation actually works the way you think it does. Try making some smaller trees and see if you can add code that actually shows you the tree structure, then see what they look like. – Karl Knechtel Jan 16 '23 at 15:14
  • But in any event, we don't debug code here - it's your responsibility, before posting, to figure out **where** a problem occurs, in order to create a [mre] - emphasis on minimal. – Karl Knechtel Jan 16 '23 at 15:15
  • @KarlKnechtel I am not asking you to debug my code. The code works fine for small dictionaries (all the methods work fine, and the binary tree gets created correctly). The problem and the question here are about the algorithm. I thought the example I provided was minimal... very minimal. – apt45 Jan 16 '23 at 15:20
  • @KarlKnechtel why do you think that the tree creation doesn't work the way I think it does? – apt45 Jan 16 '23 at 15:21
  • Because you seem to expect a balanced tree as a result, and I do not. I think you think this because you wrote code to split the input for the constructor in half, but **that is not enough**. "The code works fine for small dictionaries (all the methods work fine, and the binary tree gets created correctly)" Write the code that I described that will show you the structure of the trees. I am confident you will notice that there is a problem exactly like what @MichaelButscher described. – Karl Knechtel Jan 16 '23 at 15:23
  • 1
    For that matter, just try adding something to report the *depth* of the tree. You would agree with me that the depth of a *balanced* BST with, say, 100 nodes, should be, say, closer to 10 than to 50, yes? – Karl Knechtel Jan 16 '23 at 15:24
  • @KarlKnechtel I agree my implementation does not generate a balanced tree. I am going to fix it now. Thanks for the hint. So, if I understand right, if I create a correct balanced tree I should not get a `RecursionError`error. Is that correct? – apt45 Jan 16 '23 at 16:02
  • The error occurs because of the depth of recursion - it's meant to catch unbounded recursion, but also protects against blowing up the stack in general (Python doesn't implement tail call optimization). The default maximum depth is something like 500 or 1000 depending on platform; a balanced binary tree shouldn't reach that depth (2^500 nodes will definitely not fit in memory). – Karl Knechtel Jan 17 '23 at 01:17
  • See: [What is the maximum recursion depth in Python, and how to increase it?](https://stackoverflow.com/questions/3323001/) – Karl Knechtel Jan 17 '23 at 01:18

0 Answers0