16

So I have a problem which I'm pretty sure is solvable, but after many, many hours of thinking and discussion, only partial progress has been made.

The issue is as follows. I'm building a BTree of, potentially, a few million keys. When searching the BTree, it is paged on demand from disk into memory, and each page-in operation is relatively expensive. This effectively means that we want to traverse as few nodes as possible (once a node has been paged in, the cost of traversing within that node is 0). As a result, we don't want to waste space by having lots of nodes near minimum capacity. In theory, this should be preventable (within reason), as the structure of the tree depends on the order in which the keys were inserted.

So, the question is how to reorder the keys such that after the BTree is built the fewest number of nodes are used. Here's an example:

Example

I did stumble on the question In what order should you insert a set of known keys into a B-Tree to get minimal height?, which unfortunately asks a slightly different question. The answers also don't seem to solve my problem. It is also worth adding that we want the mathematical guarantees that come from not building the tree manually, and only using the insert operation. We don't want to build a tree manually, make a mistake, and then find it is unsearchable!

I've also stumbled upon two research papers which are so close to answering my question but aren't quite there! Time- and Space-Optimality in B-Trees and Optimal 2,3-Trees (where I took the above image from, in fact) discuss and quantify the differences between space-optimal and space-pessimal BTrees, but as far as I can see they don't go so far as to describe how to design an insert order.

Any help on this would be greatly, greatly appreciated.

Thanks

Research papers can be found at:

http://www.uqac.ca/rebaine/8INF805/Automne2007/Sujets2007Automne/p174-rosenberg.pdf

http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1143&context=hmc_fac_pub

EDIT: I ended up filling a BTree skeleton, constructed as described in the above papers, using the FILLORDER algorithm. As previously mentioned, I was hoping to avoid this; however, I ended up implementing it before the two excellent answers were posted!

Squimmy
  • You can pretty easily batch-build a b-tree by sorting the items and constructing the tree with that sorted list. You don't even need to insert them one by one. – usr Jul 28 '14 at 14:34
  • I'm afraid I don't quite follow. Inserting the items in sorted order definitely doesn't result in a space optimal BTree as far as I can see? – Squimmy Jul 28 '14 at 15:50
  • The algorithm doesn't work by inserting the items. You sort the list, then build the leaf nodes from that list. Now you have perfect leaf nodes. Next you build the additional layers on top of those leaf nodes.; What RDBMSs do is they have a special rule for inserts at the end. They don't page split evenly but create a fresh empty page. Don't take the b-tree algorithms (from wikipedia) too literally. You can play with them. – usr Jul 28 '14 at 15:57
  • Oh, I see. I'm very reluctant to build a tree from the leaves up, though, as there's the possibility of some peculiar one-in-a-million bug being introduced that we didn't foresee, leaving us with an invalid, unsearchable B-Tree, which would be super, super problematic. I'm not actually worried about inserts afterwards; after it's built, we don't really need to do any inserts on it. – Squimmy Jul 28 '14 at 16:10
  • What about using a b-tree library? Insert in sequential order. I'd say that a good library optimizes for that case because it is so common. – usr Jul 28 '14 at 16:18
  • If optimal space is your primary driver, are you sure a b-tree is the most appropriate data structure? A simple sorted array (vector) gives better cache locality and retains the ability to search with log(n) complexity. – Chad Jul 30 '14 at 14:56
  • Is this somehow different from a standard self-balancing binary tree such as a red-black tree? You are just looking for a guarantee that the height is log2(n) right? – Moby Disk Jul 30 '14 at 15:14
  • red-black trees don't have a log2(n) guarantee. The guarantee is only O(log2(n)) and in fact, can be as large as 2log2(n). And I think that is the point of the question, to some extent. Can you insert in a particular order that produces better height characteristics than the algorithm guarantees on its own. See my next comment (too long)! – Jeremy West Jul 30 '14 at 23:39
  • This will be hard to answer without knowing exactly how the insert operation works. I'm assuming you have some black-box implementation that allows you to insert an item and then places it in the tree somehow? Then the answer will depend on when the algorithm decides to split blocks and how it splits them when it does. – Jeremy West Jul 30 '14 at 23:41

5 Answers

4

The algorithm below should work for B-Trees where the minimum number of keys per node is d and the maximum is 2*d. I suppose it can be generalized to 2*d + 1 max keys if the way of selecting the median is known.

The algorithm below is designed to minimize the number of nodes, not just the height of the tree.

The method is based on the idea of putting keys into any non-full leaf or, if all leaves are full, putting the key under the lowest non-full node.

More precisely, the tree generated by the proposed algorithm meets the following requirements: it has the minimum possible height, and it has no more than two non-full nodes on each level (always the two rightmost nodes).

Since we know that the number of nodes on any level except the root is strictly equal to the sum of the node count and the total key count on the level above, we can prove that there is no valid rearrangement of nodes between levels which decreases the total number of nodes. For example, increasing the number of keys inserted above any given level will increase the number of nodes on that level, and consequently the total number of nodes; while any attempt to decrease the number of keys above a given level will decrease the node count on that level and fail to fit all keys on that level without increasing the tree height. It is also clear that the arrangement of keys on any given level is one of the optimal ones. Using the reasoning above, a more formal proof by mathematical induction may also be constructed.

The idea is to hold a list of counters (the size of the list is no bigger than the height of the tree) to track how many keys have been added on each level. Once d keys have been added to some level, it means a node on that level has been half-filled, and if there are enough keys to fill the other half of this node, we should skip those keys and add a root for the higher level instead. This way, the root will be placed exactly between the first half of the previous subtree and the first half of the next subtree; it will cause a split, after which the root takes its place and the two half-subtrees become separated. The places for the skipped keys will remain safe while we go through bigger keys, and can be filled later.

Here is nearly-working (pseudo)code; the array needs to be sorted:

PushArray(BTree bTree, int d, key[] Array)
{
    List<int> counters = new List<int>{0};
    //skip[order] holds the number of keys to skip
    //after filling a node of the given order in half
    List<int> skip = new List<int>();
    List<Pair<int,int>> skipList = new List<Pair<int,int>>();

    int i = -1;
    while(true)
    {     
       int order = 0;
       while(counters[order] == d) order += 1;
       for(int j = order - 1; j >= 0; j--) counters[j] = 0;

       if (counters.Count <= order + 1) counters.Add(0);
       counters[order] += 1;

       if (skip.Count <= order)
              skip.Add(i + 2);    
       if (order > 0)
           skipList.Add({i, order}); //list of skipped parts that will be needed later
       i += skip[order];

       if (i > N) break;

       bTree.Push(Array[i]);
    }

    //now we need to add all skipped keys in correct order
    foreach(Pair<int,int> p in skipList)
    {
        for(int i = p.2; i > 0; i--)
            PushArray(bTree, d, Array.SubArray(p.1 + skip[i - 1], skip[i] - 1));
    }
}

Example:

Here is how the keys and the corresponding counters should be arranged for d = 2 during the first pass through the array. I marked keys pushed into the B-Tree during the first pass (before the loop with recursion) with 'o' and skipped keys with 'x'.

                                                              24
        4         9             14             19                            29 
0 1 2 3   5 6 7 8   10 11 12 13    15 16 17 18    20 21 22 23    25 26 27 28    30 ...
o o x x o o o x x o  o  o  x  x  x  x  x  x  x  x  x  x  x  x  o  o  o  x  x  o  o ...
1 2     0 1 2     0  1  2                                      0  1  2        0  1 ...
0 0     1 1 1     2  2  2                                      0  0  0        1  1 ...
0 0     0 0 0     0  0  0                                      1  1  1        1  1 ...
skip[0] = 1 
skip[1] = 3 
skip[2] = 13

Since we don't iterate through skipped keys, we have O(n) time complexity (not counting the B-Tree insertions themselves) for a sorted array.
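For reference, the first pass of this counter/skip scheme can be transliterated into Python. This is only a sketch of the index-generation loop, not the full algorithm (no recursion over skipped blocks); `first_pass` is a made-up name, and `last_index` plays the role of N above:

```python
def first_pass(d, last_index):
    """Return the array indices pushed during the first pass
    of the counter/skip scheme (before the recursive phase)."""
    counters = [0]   # how many keys have been added on each level
    skip = []        # skip[order]: how far to jump after half-filling a node
    pushed = []
    i = -1
    while True:
        # find the lowest level whose current node is not yet half-full
        order = 0
        while order < len(counters) and counters[order] == d:
            order += 1
        for j in range(order):      # lower levels start fresh nodes
            counters[j] = 0
        if len(counters) <= order + 1:
            counters.append(0)
        counters[order] += 1
        if len(skip) <= order:      # first time this level is reached
            skip.append(i + 2)
        i += skip[order]
        if i > last_index:
            break
        pushed.append(i)
    return pushed

print(first_pass(2, 29))
# → [0, 1, 4, 5, 6, 9, 10, 11, 24, 25, 26, 29]
```

The output matches the 'o'-marked positions in the d = 2 example above.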

In this form it may be unclear how it works when there are not enough keys to fill the second half of a node after a skipped block. However, we can avoid skipping all skip[order] keys when the total length of the array is less than ~ i + 2 * skip[order], and skip skip[order - 1] keys instead. A line such as the following may be added after updating the counters but before updating the variable i:

while(order > 0 && i + 2*skip[order] > N) --order;

This is correct because if the total count of keys on the current level is less than or equal to 3*d, they are still split correctly when added in the original order. It will lead to a slightly different arrangement of keys between the two last nodes on some levels, but it will not break any of the described requirements, and it may make the behavior easier to understand.

Maybe it's reasonable to find some animation and watch how it works; here is the sequence which should be generated on the 0..29 range: 0 1 4 5 6 9 10 11 24 25 26 29 /end of first pass/ 2 3 7 8 14 15 16 19 20 21 12 13 17 18 22 23 27 28

Lurr
2

The algorithm below attempts to prepare the order of the keys so that you don't need to have power over, or even knowledge of, the insertion procedure. The only assumption is that overfilled tree nodes are split either at the middle or at the position of the last inserted element; otherwise the B-tree can be treated as a black box.

The trick is to trigger node splits in a controlled way. First you fill a node exactly: the left half with keys that belong together and the right half with another range of keys that belong together. Then you insert a key that falls between those two ranges but belongs with neither; the two subranges are split into separate nodes and the last inserted key ends up in the parent node. After splitting off in this fashion you can fill the remainder of both child nodes to make the tree as compact as possible. This also works for parent nodes with more than two child nodes; just repeat the trick with one of the children until the desired number of child nodes is created. Below, I use what is conceptually the rightmost child node as the "splitting ground" (steps 5 and 6.1).

Apply the splitting trick recursively, and all elements should end up in their ideal place (which depends on the number of elements). I believe the algorithm below guarantees that the height of the tree is always minimal and that all nodes except for the root are as full as possible. However, as you can probably imagine it is hard to be completely sure without actually implementing and testing it thoroughly. I have tried this on paper and I do feel confident that this algorithm, or something extremely similar, should do the job.

Implied: a tree T with maximum branching factor M.

Top procedure with keys of length N:

  1. Sort the keys.
  2. Set minimal-tree-height to ceil(log(N+1)/log(M)).
  3. Call insert-chunk with chunk = keys and H = minimal-tree-height.

Procedure insert-chunk with chunk of length L, subtree height H:

  1. If H is equal to 1:
    1. Insert all keys from the chunk into T
    2. Return immediately.
  2. Set the ideal subchunk size S to pow(M, H - 1).
  3. Set the number of subtrees K to ceil((L + 1) / S).
  4. Set the actual subchunk size S' to ceil((L + 1) / K).
  5. Recursively call insert-chunk with chunk' = the last floor((S - 1) / 2) keys of chunk and H' = H - 1.
  6. For each of the ceil(L / S') subchunks (of size S') except for the last with index I:
    1. Recursively call insert-chunk with chunk' = the first ceil((S - 1) / 2) keys of subchunk I and H' = H - 1.
    2. Insert the last key of subchunk I into T (this insertion purposefully triggers a split).
    3. Recursively call insert-chunk with chunk' = the remaining keys of subchunk I (if any) and H' = H - 1.
  7. Recursively call insert-chunk with chunk' = the remaining keys of the last subchunk and H' = H - 1.

Note that the recursive procedure is called twice for each subtree; that is fine, because the first call always creates a perfectly filled half subtree.
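The size arithmetic in steps 2-4 of insert-chunk, together with the minimal-height formula from the top procedure, can be sketched in Python. `minimal_height` and `chunk_params` are hypothetical helper names; the height formula relies on a full tree of branching M and height H holding M^H - 1 keys:

```python
import math

def minimal_height(n, m):
    """Smallest H such that a full tree of branching m and height H
    can hold n keys (capacity m**H - 1), i.e. ceil(log_m(n + 1))."""
    return math.ceil(math.log(n + 1) / math.log(m))

def chunk_params(length, m, h):
    """Steps 2-4 of insert-chunk: ideal subchunk size, number of
    subtrees, and actual subchunk size."""
    s = m ** (h - 1)                      # ideal subchunk size S
    k = math.ceil((length + 1) / s)       # number of subtrees
    s_actual = math.ceil((length + 1) / k)  # actual subchunk size S'
    return s, k, s_actual

h = minimal_height(10, 3)        # 10 sorted keys, branching factor 3
print(h, chunk_params(10, 3, h))
# → 3 (9, 2, 6)
```

So with N = 10 and M = 3 the tree needs height 3, and the top-level chunk is divided into two subchunks of (at most) six keys each.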

Julian
0

Here is a way which leads to minimum height in any search tree (including a b tree):

  1. Sort the array.
  2. Say you can have at most m keys per node in the b tree.
  3. Recursively divide the array into m+1 roughly equal parts, using m keys as separators in the parent.
  4. Construct each child subtree from its n/(m+1) sorted keys by recursion.

Example:

m = 2 array = [1 2 3 4 5 6 7 8 9 10]

Divide the array into three parts:

root = [4,8]

Recursively solve:

child1 = [1 2 3]

root1 = [2]

left1 = [1]

right1 = [3]

Similarly, solve recursively for all children.
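This recursive division can be sketched in Python. `build` is a made-up name; it returns nested dicts rather than a real B-tree, uses up to m separators while keeping every child non-empty, and does not enforce the usual minimum-fill invariant, so it is only an illustration of the equal-parts idea:

```python
def build(keys, m):
    """Recursively divide a sorted list into a node of up to m keys
    plus roughly equal child subtrees."""
    n = len(keys)
    if n <= m:
        return {"keys": list(keys), "children": []}
    # use up to m separator keys, but never create an empty child
    k = min(m, (n - 1) // 2)
    base, extra = divmod(n - k, k + 1)   # child sizes differ by at most 1
    node_keys, children, pos = [], [], 0
    for j in range(k + 1):
        size = base + (1 if j < extra else 0)
        children.append(build(keys[pos:pos + size], m))
        pos += size
        if j < k:                        # separator key between children
            node_keys.append(keys[pos])
            pos += 1
    return {"keys": node_keys, "children": children}

root = build(list(range(1, 11)), 2)
print(root["keys"])                      # → [4, 8]
print(root["children"][0]["keys"])       # → [2]
```

For m = 2 and keys 1..10 this reproduces the example above: root [4, 8], first child rooted at [2] with leaves [1] and [3].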
Vikram Bhat
0

So is this about optimising the creation procedure, or optimising the tree?

You can clearly create a maximally efficient B-Tree by first creating a full Balanced Binary Tree, and then contracting nodes.

At any level in a binary tree, the gap in numbers between two nodes contains all the numbers between those two values, by the definition of a binary tree, and this is more or less the definition of a B-Tree. You simply start contracting the binary-tree divisions into B-Tree nodes. Since the binary tree is balanced by construction, the gaps between nodes on the same level always contain the same number of nodes (assuming the tree is filled), so the BTree constructed this way is guaranteed balanced.

In practice this is probably quite a slow way to create a BTree, but it certainly meets your criteria for constructing the optimal B-Tree, and the literature on creating balanced binary trees is comprehensive.

=====================================

In your case, where you might take an off-the-shelf "better" solution over a constructed optimal version, have you considered simply changing the number of children nodes can have? Your diagram looks like a classic 2-3 tree, but it's perfectly possible to have a 3-4 tree or a 3-5 tree, which means that every node will have at least three children.

phil_20686
  • I have no idea whether a 3-3 tree is possible. Anyone know? – phil_20686 Aug 06 '14 at 10:22
  • You can support neither 3-3 nor 3-4 nor 3-5 using the classic B-tree algorithm, only 3-6 or 3-7. I'm not sure whether a 3-3 tree can't be invented at all, but obviously you can't keep n keys in a tree where every node contains exactly 3 keys if n is not divisible by 3. So you will need some additional buffer to keep at least n%3 keys. – Lurr Aug 06 '14 at 12:22
  • This isn't a hard constraint, as you can have some "null-equivalent" leaf nodes, e.g. add some Long.MinValue entries to the list; since they are always the lowest nodes in the tree, you can easily keep track of them and delete them if you need to add nodes. 3-6 and 3-7 are more convenient though, as you avoid that complication. The problem with 3-3 is that rebalancing must happen whenever a node is removed. – phil_20686 Aug 06 '14 at 13:34
-1

Your question is about btree optimization. It is unlikely that you are doing this just for fun, so I can only assume that you would like to optimize data access, maybe as part of database programming or something like that. You wrote: "When searching the BTree, it is paged on demand from disk into memory", which means that you either do not have enough memory to do any sort of caching, or you have a policy to use as little memory as possible. Either way, this may be the root cause of why any answer to your question will not be satisfying. Let me explain why.

When it comes to data access optimization, memory is your friend. It does not matter whether you do read or write optimization: you need memory. Any sort of write optimization always works on the assumption that it can read information quickly (from memory) - sorting needs data. If you do not have enough memory for read optimization, you will not have it for write optimization either.

As soon as you are willing to accept at least some memory utilization, you can rethink your statement "When searching the BTree, it is paged on demand from disk into memory", which makes room for balancing between read and write optimization. A BTree optimized to the maximum is maximal write optimization. In most data access scenarios I know, you get one write per 10-100 reads, which means that maximal write optimization is likely to give poor performance in terms of overall data access. That is why databases accept restructuring cycles, key space waste, unbalanced btrees and things like that...

Quicker