how to rebalance a random binary search tree

Question

Here is the situation: There's a balanced binary search tree which may be access by tens of threads. So when I need to insert or delete a node, I don't want to lock the whole tree due to the concurrency. as time goes it becomes not balanced again. When the tree is not so busy being used, I finally get chance to lock and rebalance it. How can I do this?

or is there a better data struct I can use?

How did you come across this situation? A better data structure can be decided only when we know the use cases. Is the tree only being read from or is it being updated too frequently? — displayName, Sep 19 '17 at 03:48
When you delete a node, just mark it as deleted, but keep it in the tree. When you have time to lock and rebalance, do an in-order traversal of the old tree, and build a new tree from all the nodes that aren't marked as deleted. — Jim Mischel, Sep 19 '17 at 14:03
@displayName most time of the day, the tree is being read in high concurrency and meanwhile it is being updated frequently. but every couple hours, the read concurrency cools down, and it will be ok to lock. — Knova Chan, Sep 20 '17 at 11:37
I have a solution but it may not be a very good one - Use two copies of the tree. Allow reads from both. But when updating, lock one tree and update it while letting the reads go to another. Then update the second one and let reads go to the first one. — displayName, Sep 20 '17 at 13:48

score 3 · Accepted Answer · answered Sep 20 '17 at 20:34

You can actually rebalance it using the Day-Stout-Warren algorithm. It's linear in the number of nodes, so might take a while. Moreover, this approach raises a question: what if during the interval when you don't rebalance the tree that's being read it quickly becomes severely unbalanced, and all consequent reads are done in, say, O(N) instead of O(logN)? Is it OK to have this loss of performance for hours in order to not lock things? Are you sure there will be a performance win?

If you can tolerate lack of linearizability (i.e. you write a value but when you search for it immediately after it's not found; it will be there eventually, but 100ms-10s might pass), you can implement a "copy on write" tree: all writes are done by one thread (with rebalancing), and you periodically clone the tree into a read-only copy that can be used by reading threads without any concurrency control, you just need to publish it atomically. Can be done especially fast if the tree is implemented on top of a continuous memory chunk that can be copied as a whole and freed/garbage-collected as a whole.

Another option is to use a concurrent skip list: it gives logarithmic average case search/delete/insert time and is more easily parallelizable. There is a standard lock-free implementation for Java if you happen to use it. You can find more information about concurrent skip lists and balanced search trees here. Particularly, you can find there mentions of a chromatic tree, a binary search tree that is optimized for concurrent rebalancing.

I did some test ,and yes the performance is not so good if I don't rebalance the tree. It's ok if the data is not found several minutes after insertion, so 'copy on write' is a every good solution that I didn't think of. thank u every much! — Knova Chan, Sep 22 '17 at 01:17

how to rebalance a random binary search tree

1 Answers1