48

I have been implementing an LLRB package that should be able to operate in either of the two modes, Bottom-Up 2-3 or Top-Down 2-3-4 described by Sedgewick (code - improved code, though dealing only with 2-3 trees here, thanks to RS for pointer).

Sedgewick provides a very clear description of tree operations for the 2-3 mode, although he spends a lot of time talking about the 2-3-4 mode. He also shows how a simple alteration of the order of color flipping during insertion can alter the behaviour of the tree (either split on the way down for 2-3-4 or split on the way up for 2-3):

private Node insert(Node h, Key key, Value value)
{
    if (h == null)
        return new Node(key, value);

    // Include this for 2-3-4 trees
    if (isRed(h.left) && isRed(h.right)) colorFlip(h);

    int cmp = key.compareTo(h.key);

    if (cmp == 0)     h.val = value;
    else if (cmp < 0) h.left = insert(h.left, key, value);
    else              h.right = insert(h.right, key, value);

    if (isRed(h.right) && !isRed(h.left))    h = rotateLeft(h);
    if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);

    // Include this for 2-3 trees
    if (isRed(h.left) && isRed(h.right)) colorFlip(h);

    return h;
}

However, he glosses over deletion in 2-3-4 LLRBs with the following:

The code on the next page is a full implementation of delete() for LLRB 2-3 trees. It is based on the reverse of the approach used for insert in top-down 2-3-4 trees: we perform rotations and color flips on the way down the search path to ensure that the search does not end on a 2-node, so that we can just delete the node at the bottom. We use the method fixUp() to share the code for the color flip and rotations following the recursive calls in the insert() code. With fixUp(), we can leave right-leaning red links and unbalanced 4-nodes along the search path, secure that these conditions will be fixed on the way up the tree. (The approach is also effective 2-3-4 trees, but requires an extra rotation when the right node off the search path is a 4-node.)

His delete() function:

private Node delete(Node h, Key key)
{
    if (key.compareTo(h.key) < 0)
        {
            if (!isRed(h.left) && !isRed(h.left.left))
            h = moveRedLeft(h);
            h.left = delete(h.left, key);
        }
    else
        {
            if (isRed(h.left))
                h = rotateRight(h);
            if (key.compareTo(h.key) == 0 && (h.right == null))
                return null;
            if (!isRed(h.right) && !isRed(h.right.left))
                h = moveRedRight(h);
            if (key.compareTo(h.key) == 0)
                {
                    h.val = get(h.right, min(h.right).key);
                    h.key = min(h.right).key;
                    h.right = deleteMin(h.right);
                }
            else h.right = delete(h.right, key);
        }
    return fixUp(h);
}

My implementation correctly maintains LLRB 2-3 invariants for all tree operations on 2-3 trees, but fails for a subclass of right-sided deletions on 2-3-4 trees (these failing deletions result in right leaning red nodes, but snowball to tree imbalance and finally null pointer dereferencing). From a survey of example code that discusses LLRB trees and includes options for construction of trees in either mode, it seems that none correctly implements the deletion from a 2-3-4 LLRB (i.e. none has the extra rotation alluded to, e.g. Sedgewick's java above and here).

I'm having trouble figuring out exactly what he means by "extra rotation when the right node off the search path is a 4-node"; presumably this is a rotate left, but where and when?

If I rotate left passing upwards past a 4-node equivalent (i.e. RR node) or a right leaning 3-node equivalent (BR node) either before calling fixUp() or at the end of the fixUp function I still get the same invariant contradiction.

Here are the tree states of the smallest failing examples I have found (generated by sequential insertion of elements from 0 to the respective maximum value).

The first pair of trees shows the transition from invariant-conforming state prior to deletion of element 15 to obviously broken state after.

A 2-3-4 LLRB containing 0..15

State after deleting item 15

The second is essentially the same as above, but with deletion of 16 of 0..16 (deletion of 15 results in the same topology). Note that the invariant contradiction manages to cross the root node.

A 2-3-4 LLRB containing 0..16

State after deleting item 16

The key is going to be understanding how to revert the violations generated during the walk down the tree to the target node. The following two trees show how the first tree above looks after a walk down the left and right respectively (without deletion and before walking back up with fixUp()).

After attempt to delete '-1' without fixUp: After attempt to delete '-1' without fixUp

After attempt to delete '16' without fixUp: After attempt to delete '16' without fixUp

Trying rotate left on the walk back up when the node has only a red right child seems to be part of the solution, but it does not deal correctly with two red right children in a row, preceding this with a flipColor when both children are red seems to improve the situation further, but still leaves some invariants violated.

If I further check whether the right child of a right child is red when its sibling is black and rotate left if this is true I only fail once, but at this point I feel like I'm needing a new theory rather than a new epicycle.

Any ideas?

For reference, my implementation is available here (No, it's not Java).

Follow-up:

Part of the reason I was interested in this was to confirm the claim by many that 2-3 LLRB trees are more efficient than 2-3-4 LLRB trees. My benchmarking has confirmed this for insertion and deletion (2-3 are about 9% faster), but I find that retrieval is very slightly faster for 2-3-4 trees.

The following times are representative and consistent across runs:

BU23:
BenchmarkInsert        1000000        1546 ns/op
BenchmarkDelete        1000000        1974 ns/op
BenchmarkGet           5000000         770 ns/op

TD234:
BenchmarkInsert        1000000        1689 ns/op
BenchmarkDelete        1000000        2133 ns/op
BenchmarkGet           5000000         753 ns/op

First column is bench name, second is number of operations, third is result. Benchmark on i5M 2.27.

I have had a look at branch lengths for 2-3 tree and 2-3-4 trees and there is little in that to explain the retrieval difference (mean distance from root to node and S.D. of 1000 trees each with 10000 random inserts):

Means:
TD234 leafs  BU23 leafs 
 12.88940     12.84681 
TD234 all    BU23 all 
 11.79274     11.79163 

StdDev:
TD234 leafs  BU23 leafs 
 1.222458     1.257344 
TD234 all    BU23 all 
 1.874335     1.885204 
kortschak
  • 755
  • 5
  • 21
  • 16
    "This question shows research effort." - hmm... check... – Mysticial Jul 08 '12 at 01:56
  • 1
    I've found multiple bugs in various RBT implementations that can be found online (e.g. in "Sorting and Searching Algorithms" by Thomas Niemann, which is an intro text + C code). Sadly, I don't remember all the details, except the bad reference pseudo-code offered in the famous book "Introduction To Algorithms" by Cormen/Leiserson/Rivest/Stein (also used in the Niemann's code). See [this answer](http://stackoverflow.com/a/11328289/968261) for details. – Alexey Frunze Jul 08 '12 at 03:59
  • I agree, there is a lot of bad/sloppy code describing this. – kortschak Jul 08 '12 at 04:04
  • It seems that my old RBT implementation does have an extra chunk in the deletion path. I even have a comment there that goes something like: "fix up for the case when the child of the deleted node is NIL (this was missing in several other RB tree docs & source codes and the net effect of this bug was that the red-black height wasn't the same for all terminal nodes in the tree, IOW the logarithmic height feature was broken". Can't say much more than that, need to refresh my RBT knowledge and study the code. – Alexey Frunze Jul 08 '12 at 04:04
  • Sedgewick's deleteMin implementation is wrong. Try inserting 3 nodes, so that you get a tree of height 1. After you do that, try calling deleteMin twice, after that you will get an empty tree, but what about the third element? – Pavel Aug 04 '15 at 08:37
  • paulpaul1076, I can't reproduce that with my code which is a pretty faithful representation of Sedgewick's java. – kortschak Aug 05 '15 at 09:56

1 Answers1

10

Updated and verified

Of key importance to testing this is that the implementation doesn't support deleting a nonexistent or previously deleted node! I spent way too long trying to figure out why my working solution was "broken". This can be fixed by doing a preliminary search for the key and returning false if it's not in the tree at all, and that solution was employed in the linked code at the bottom.

It doesn't appear Sedgewick wrote a deletion for 2-3-4 deletion that is publicly available. His results specifically deal with 2-3 trees (he only makes cursory mention of 2-3-4 trees in that their average path length (and thus search cost), as well as that of other red-black trees, is indistinguishable from the 2-3 case). Nobody else seems to have one easily found, either, so here's what I found after debugging the problem:

To begin, take Sedgewick's code and fix the out of date bits. In the slides here (pg 31) you can see that his code still uses the old representation of 4 nodes where it was done by having two left reds in a row, rather than balance. The first bit to write a 2-3-4 deletion routine, then, is to fix this so that we can do a sanity check which will help us verify our fixes later:

private boolean is234(Node x)
{         
   if (x == null)
      return true;
   // Note the TD234 check is here because we also want this method to verify 2-3 trees
   if (isRed(x.right))
      return species == TD234 && isRed(x.left);

   if (!isRed(x.right))
      return true;

   return is234(x.left) && is234(x.right);
} 

Once we have this, we know a couple things. One, from the paper we see that 4 nodes should not be broken on the way up when using a 2-3-4 tree. Two, there's a special case for a right 4-node on the search path. There's a third special case that isn't mentioned, and that is when you are going back up a tree, you may end up where you have h.right.left be red, which would leave you invalid with just a rotate left. This is the mirror of the case described for insert on page 4 of the paper.

The rotation fix for a 4-node you need is as follows:

    private Node moveRedLeft(Node h)
    {  // Assuming that h is red and both h.left and h.left.left
       // are black, make h.left or one of its children red.
       colorFlip(h);
       if (isRed(h.right.left))
       { 
          h.right = rotateRight(h.right);

          h = rotateLeft(h);
          colorFlip(h);

          if (isRed(h.right.right) )
             h.right = rotateLeft(h.right);
       }
      return h;
    }

And this removes the splitting on 2-3-4, as well as adds the fix for the third special case

private Node fixUp(Node h)
{
   if (isRed(h.right))
   {      
      if (species == TD234 && isRed(h.right.left))
         h.right = rotateRight(h.right);
      h = rotateLeft(h);
   }

   if (isRed(h.left) && isRed(h.left.left))
      h = rotateRight(h);

   if (species == BU23 && isRed(h.left) && isRed(h.right))
      colorFlip(h);

   return setN(h);
}

Finally, we need to test this and make sure it works. They don't have to be the most efficient, but as I found during the debugging of this, they have to actually work with the expected tree behavior (i.e. not insert/delete duplicate data)! I did this with a test helper methods. The commented lines were there for when I was debugging, I'd break and check the tree for obvious imbalance. I've tried this method with 100000 nodes, and it performed flawlessly:

   public static boolean Test()
   {
      return Test(System.nanoTime());
   }
   public static boolean Test(long seed)
   {
      StdOut.println("Seeding test with: " + seed);
      Random r = new Random(seed);
      RedBlackBST<Integer, Integer> llrb = new RedBlackBST<Integer,Integer>(TD234);
      ArrayList<Integer> treeValues = new ArrayList<Integer>();
      for (int i = 0; i < 1000; i++)
      {
         int val = r.nextInt();
         if (!treeValues.contains(val))
         {
            treeValues.add(val);
            llrb.put(val, val);
         }
         else
            i--;
      }
      for (int i = 0; i < treeValues.size(); i++)
      {
         llrb.delete(treeValues.get(i));
         if (!llrb.check())
         {
            return false;
         }
//         StdDraw.clear(Color.GRAY);
//         llrb.draw(.95, .0025, .008);
      }
      return true;
   }

The complete source can be found here.

Tawnos
  • 1,877
  • 1
  • 17
  • 26
  • I'm afraid that doesn't work as a fix and actually breaks the 2-3 mode (I suspect you have the mode tests inverted, but changing that does not fix the problem). – kortschak Jul 09 '12 at 10:17
  • Edited, yes, I'd flipped the modes. What case are you seeing not working, as the image was me stepping down through the recursions and back up (albeit through `deleteMax()` since that is what `delete(15)` is equivalent to. – Tawnos Jul 09 '12 at 15:11
  • Setting up a JDK now, so I should be able to run a quick test suite and report back. – Tawnos Jul 09 '12 at 15:29
  • Funny thing, the author contradicts himself in the is234 check. His code claims that a 234 tree allows two red links in a row along the leftward tree, rather than be balanced. I fixed that by changing the check to `return x == null || (((isRed(x.left) && isRed(x.right)) || !isRed(x.right)) && is234(x.left) && is234(x.right));` Writing a test generation suite now. – Tawnos Jul 09 '12 at 16:14
  • Yes, the check code in tne original jave is incorrect. The newly linked code is better, but only deals with 2-3 trees. I'm fairly sure my tests are correct, so you could ork from them with an eye on the papers and his tree figures. i'll re-run tests when I wake up and let you know what fails. – kortschak Jul 09 '12 at 21:15
  • OK. The tests that fail are TestDelete{MinMax,Right}, Test{,Random{,Insertion}}Deletion, all with nil pointer dereference panics due to broken assumptions about children at the first right rotation. – kortschak Jul 09 '12 at 23:34
  • I'm still looking at this. It appears that Sedgewick may have only tried it for the trivial case, and fell into the same trap I did. Hopefully a bit more playing and I'll have a working delete. – Tawnos Jul 10 '12 at 20:50
  • Oh geez, I had solved this yesterday, and just realized why I was still getting a null pointer exception: this implementation doesn't handle deleting the same value twice. Updating answer. – Tawnos Jul 10 '12 at 22:54
  • I'm a bit troubled that your is234 cannot return false. I have an almost complete solution that depends on treating the left and right branches separately. Once tested, I'll post. – kortschak Jul 11 '12 at 03:20
  • Yeah, it would help if I had an up to date copy paste :) I wrote a lot of that last night when I got stuck on the NullPointer error (and didn't realize that until after already pasting here - previously I had the version in my comments, but I was simplifying). So that's fixed here as well now, code still works as expected. – Tawnos Jul 11 '12 at 03:28
  • Sorry, not quite. Failing case added to your answer. – kortschak Jul 11 '12 at 04:14
  • Rejected because it was a comment (also, I couldn't actually see the links until re-revisiting. Made no sense when I first read it. Also, I would need to know _how_ you generated that tree, so I could reproduce the issue. Just adding 1->10,000 then deleting 1->10,000 doesn't show it, so I'm trying to figure out what you were testing. – Tawnos Jul 11 '12 at 05:19
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13707/discussion-between-kortschak-and-tawnos) – kortschak Jul 11 '12 at 05:30