The solution could be to use atomic operations. No locking, no context switching, no sleeping, and much much faster than mutexes or condition variables. Atomic ops are not the end all solution to everything, but we have created a lot of thread safe versions of common data structures using just atomic ops. They are very fast.
Atomic ops are a series of simple operations like increment, or decrement or assignment that are guaranteed to execute atomically in a multi threaded environment. If two threads hit the op at the same time, the cpu makes sure one thread executes the op at a time. Atomic ops are hardware instructions, so they are fast. "Compare and swap" is very useful for thread safe data structures. in our testing atomic compare and swap is about as fast as 32 bit integer assignment. Maybe 2x as slow. When you consider how much cpu is consumed with mutexes, atomic ops are infinitely faster.
Its not trivial to do rotations to balance your tree with atomic operations, but not impossible. I ran into this requirement in the past and cheated by making a thread safe skiplist since a skiplist can be done real easy with atomic operations. Sorry I can't give you a copy of our code...my company would fire me, but its easy enough to do yourself.
How atomic ops work to make lock free data structures can be visualized by the simple threadsafe linked list example. To add an item to a global linked list (_pHead) without using locks. First save a copy of _pHead, pOld. I think of these copies as "the state of the world" when executing concurrent ops. Next create a new node, pNew, and set its pNext to the COPY. Then use atomic "compare and swap" to change _pHead to pNew ONLY IF pHead IS STILL pOld. The atomic op will succeed only if _pHead hasn't changed. If it fails, loop back to get a copy of the new _pHead and repeat.
If the op succeeds, the rest of the world will now see a new head. If a thread got the old head a nanosecond before, that thread wont see the new item, but the list will still be safe to iterate through. Since we preset the pNext to the old head BEFORE we added our new item to the list, if a thread sees the new head a nanosecond after we added it, the list is safe to traverse.
Global stuff:
typedef struct _TList {
int data;
struct _TList *pNext;
} TList;
TList *_pHead;
Add to list:
TList *pOld, *pNew;
...
// allocate/fill/whatever to make pNew
...
while (1) { // concurrency loop
pOld = _pHead; // copy the state of the world. We operate on the copy
pNew->pNext = pOld; // chain the new node to the current head of recycled items
if (CAS(&_pHead, pOld, pNew)) // switch head of recycled items to new node
break; // success
}
CAS is shorthand for __sync_bool_compare_and_swap or the like. See how easy? No Mutexes...no locks! In the rare event that 2 threads hit that code at the same time, one simply loops a second time. We only see the second loop because the scheduler swaps a thread out while in the concurrency loop. so it is rare and inconsequential.
Things can be pulled off the head of a linked list in a similar way. You can atomically change more than one value if you use unions and you can use uup to 128 bit atomic ops. We have tested 128 bit on 32 bit redhat linux and they are ~same speed as the 32, 64 bit atomic ops.
You will have to figure out how to use this type of technique with your tree. A b tree node will have two ptrs to child nodes. You can CAS them to change them. The balancing problem is tough. I can see how you could analyze a tree branch before you add something and make a copy of the branch from a certain point. when you finish changing the branch, you CAS the new one in. This would be a problem for large branches. Maybe balancing can be done "later" when the threads are not fighting over the tree. Maybe you can make it so the tree is still searchable even though you haven't cascaded the rotation all the way...in other words if thread A added a node and is recursively rotating nodes, thread b can still read or add nodes. Just some ideas. In some cases, we make a structure that has version numbers or lock flags in the 32 bits after the 32 bits of pNext. We use 64 bit CAS then. Maybe you could make the tree safe to read at all times without locks, but you might have to use the versioning technique on a branch that is being modified.
Here are a bunch of posts I have made talking about the advantages of atomic ops:
Pthreads and mutexes; locking part of an array
Efficient and fast way for thread argument
Configuration auto reloading with pthreads
Advantages of using condition variables over mutex
single bit manipulation
Is memory allocation in linux non-blocking?