1

When allow duplicates, BST nodes typically has the following property: left <= parent < right, or: left < parent <= right.

What is wrong with left <= parent <= right?

Helin Wang
  • 4,002
  • 1
  • 30
  • 34
  • Related: [Are duplicate keys allowed in the definition of binary search trees?](https://stackoverflow.com/questions/300935/are-duplicate-keys-allowed-in-the-definition-of-binary-search-trees) – Bernhard Barker Aug 10 '18 at 11:27

3 Answers3

4

Your premise is incorrect. In a BST that allows duplicates, it's always left <= parent <= right. The code that picks the place to insert a new node will just pick one side or the other, but that is not a rule about how nodes must be linked and it is not an invariant that will be maintained.

That's because, for trees with duplicate values, the condition that the left or right branches contain only strictly larger elements is not compatible with balancing operations. Try this: link 20 copies of the same value into a tree. If you can only link equal values on the left or the right, then you have to make a singly-linked list. Your tree will be 20 levels deep and completely unbalanced.

The way to think about duplicate values in the tree is that there really aren't any duplicate values in the tree :-) A BST defines a total ordering, and valid rebalancing operations like rotations preserve this total ordering.

When you insert a duplicate into the tree, you'll put it either to the left or right of the existing match. If you put it on the left, then it will be smaller according to the ordering of the tree, and it will always be smaller after any rebalancing. If you put it on the right, then it will be larger, and will stay larger according to the tree's ordering.

If you think about it, it has to be this way because the balancing operations and in-order traversals don't even look at the values in the nodes. The way they're linked together determines the ordering, the ordering doesn't change as the tree is rebalanced, and the ordering is total -- every node is either before or after every other node in an inorder traversal.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • Thanks for the answer. This was not taught to us in the DS course. I love SO! – saketk21 Aug 10 '18 at 16:22
  • I see what you're saying, but the definition of "binary search tree" does not guarantee that rebalancing is a thing, or that any particular kind of rebalancing operations will necessarily be considered "valid". So your answer is very practical -- it applies to most real-world BSTs (which tend to be red-black trees or AVL trees or whatnot) -- but is not quite as categorical as it claims to be. – ruakh Aug 10 '18 at 17:49
  • 1
    @ruakh, 'always' here means 'in every realistic case'. I think that's reasonable since this is a site about programming (a practical art) and not math. – Matt Timmermans Aug 10 '18 at 17:55
  • @MattTimmermans: But your answer isn't correct for *every* realistic case, only for *most* real cases. (Non-rebalanced BSTs do exist in the wild, and work well enough for many purposes.) "Always" is a wild exaggeration. – ruakh Aug 10 '18 at 18:00
2

Because you need to maintain the O(log n) complexity for search. Consider you are searching for a Node, then you will have to check it in both the Left and Right subtree to check for its existence. However, the correct condition enforces the constraint that the Node will exist in only one of the subtrees.

Consider a scenario where the BST Node contains an Integer and a String, and the key for building the BST is the Integer.

If you need all the Strings for an integer a, you will need to check for both the subtrees, which will lead to a worse time-complexity of O(n) rather than the O(log n) if you implement it according to the correct condition.

saketk21
  • 435
  • 4
  • 13
  • Please comment if any further clarification is required. – saketk21 Aug 10 '18 at 06:52
  • 1
    Note: this doesn't **guarantee** O(log n) complexity for search - if all nodes (or some constant fraction of them) are equal, you'll have O(n) complexity either way. It also makes the tree less balanced, since equal nodes would essentially be a linked-list, which would also affect the complexity of searching for other nodes. – Bernhard Barker Aug 10 '18 at 09:18
  • The complexity of allowing equal nodes on both sides is probably closer to O(min(n, k log n)) (or something like that), where k is the number of equal nodes. – Bernhard Barker Aug 10 '18 at 09:24
  • In a BST with duplicates, finding either the first or last value for any key takes time proportional to the height -- there is no checking of both sides. Walking through all **m** values for a key in a tree of height **h** takes **O(h+m)**. These are true regardless of how you link together equal nodes, as long as the BST property is maintained. I don't think there is any part of this answer that is correct – Matt Timmermans Aug 10 '18 at 15:42
  • @Dukeling I agree that this leads to an inefficient tree structure and does not guarantee O(log n) search complexity. However, what I tried to share was the intuition behind the condition to the best of my knowledge. Both your and Matt's comments were insightful to broaden my spectrum. Thanks. – saketk21 Aug 10 '18 at 16:20
0

If left <= parent <= right then in case of equality where would you go? Left or right? You need to be deterministic, not choose randomly. So let's say you decide to always use left, then there you go: left <= parent < right

memo
  • 1,868
  • 11
  • 19
  • It looks like this as elements are initially inserted, but when the tree is rebalanced (and it certainly will be if you insert enough of the same element), you will end up with equal nodes linked on both sides of parents. – Matt Timmermans Aug 10 '18 at 15:29