0

I am implementing an AVLTree in C++ as an exercise as preparation for future projects. An AVLTree is basically a BSTTree but with the extra detail that the tree is balanced, i.e., for a node x with left child y and right child z, the number of children on the left of x ( child y and children of y) cannot differ from the number of children on the right of x by more than 1 node.

Each node will represent an int value, and has a reference to the parent node, to a possibly existing left child node and to a possibly existing right child node.

class BSTNode {
protected:
    int value;
    BSTNode* parent;
    BSTNode* left;
    BSTNode* right;
    ...
public:
    virtual void setLeftChild(BSTNode* left); 
    ...
}

To track the number of children of a node, i have AVLNode extending BSTNode with two integers, leftAffinity and rightAffinity, where leftAffinity tells me the number of nodes on the left (similarly for the right with rightAffinity). Since i do not ensure that values i'm adding are always unique, i cannot update affinities before finding a spot to place the new node.

class AVLNode : public BSTNode {
private:
    int leftAffinity;
    int rightAffinity;
    ...
public:
    void setLeftChild(BSTNode* left) override;
    ...
}

Once i successfully set the left child of a node to left, in AVLNode i also update leftAffinity of the current node (and recursively to parent of parent until the root of the tree is reached) to left->getLeftAffinity() + left->getRightAffinity() .

The problem here is that functions on affinity are defined in AVLNode, thus i cannot immediately call left->getLeftAffinity() without a cast, as here i don't know whether left is a BSTNode or an AVLNode. I know that the idea is for any child of an AVLNode to also only be an AVLNode, which could be enforced by ensuring that any BSTNode that is not an AVLNode is transformed into an AVLNode.

  • I do not want to change the function argument to receive AVLNode* instead as this forces me to declare left,right and parent as AVLNode* in the AVLNode class, and thus i get duplicate variables, one of each for BSTNode class and one of each for AVLNode class, even if left,right and parent are private in BSTNode class.
  • I do not want to use dynamic casting, as the point of the exercise was to create an efficient tree data structure with O(log n) complexity on insert and get operations. I've read dynamic casting is both expensive and should also be avoided as it usually hints at a design flaw.

Possible approaches:

  • Create a "no-op" function in the BSTNode class that is overriden in AVLNode class to manage node affinities. This function should not be there as it is not related to BSTNode.
  • Dynamic casting on each add operation, and when succesfull, once for each parent until the root of the tree, which is too expensive.
  • Not use virtual at all for these functions, which would force me to overload many functions in AVLNode.
  • Change all functions to receive AVLNode* instead of BSTNode*, while forcing conversion of BSTNode* into AVLNode* with a constructor in AVLNode, but it this would require dynamic casting as otherwise i would lose leftAffinity and rightAffinity of a node being added, and cause other problems with function inheritance.

The choice that looks best to me is to take the "no-op" approach as it would work if performance is key.

I'm new to C++ so i appreciate any thought on the approaches i've written and for example if there is a trivial solution i overlooked. What would be the best choice in this situation?

Kevin de Vos
  • 31
  • 1
  • 9
  • 2
    The question is whether 1) the child nodes of a `AVLNode` are supposed to be allowed to *not* be `AVLNode`s and 2) whether runtime polymorphism is required at all. Should a user be able to use a `AVLNode` through a `BSTNode` pointer/reference? If you just want compile-time polymorphism and/or to avoid code duplication there are other approaches. – walnut Jan 06 '20 at 22:26
  • The crux of @walnut 's point is formalized as [The Liskov Substitution Principle](https://stackoverflow.com/questions/56860/what-is-an-example-of-the-liskov-substitution-principle). Violating it can lead to very unfortunate behaviour. – user4581301 Jan 06 '20 at 22:33

1 Answers1

1

Problems like this arise when you are using the base class just to share code among different implementations, instead of to constraint method arguments and return values.

BSTNode probably doesn't add enough value in this role to justify the complications it produces, and you should probably just get rid of it.

But occasionally things like that are useful to provide skeleton base implementations for self-referential classes that are hard to implement. In C++, the type-correct way to do it is with the oddly-named curiously recurring template pattern: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern

Using this pattern, the base class is defined as a template that takes the derived class as a parameter:

template <class NODE> class BSTNode {
    int value;
    NODE *left;
    NODE *right;
    ...
}

class AvlNode : public BSTNode<AvlNode> {
   ...
}

Now all the pointers in AvlNode have the correct type, and no casting is required.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87