1

Assume that we are implementing a B+ tree in memory, keys are at the internal nodes and key-data pairs are in the leaf nodes. If B+tree with a fan-out f, this means that B+ tree will have a height of log_f N where N is the number of keys, whereas the corresponding BST will have height of log_2 N. If we are not doing any disk reads and writes, can B+tree search performance be better than Binary Search Tree search performance? How? Since for B+tree at each internal node we have make a decision on F many choices instead if 1 for BST?

burcak
  • 1,009
  • 10
  • 34
  • 2
    Pretty much no. The whole appeal of B+ trees is to reduce disk seeks since disk access is sooooo slow. Only way I can ever see it outperforming a naive BST is due to its cache-friendliness, but it's unlikely, and in that case the BST could probably use further optimization with a better allocation strategy. –  Dec 11 '17 at 23:52
  • If B+ tree is totally in memory implemented then I could not see a reason for it to perform better than BST. But why do think that B+tree has cache-friendliness and BST does not? – burcak Dec 12 '17 at 06:28
  • 1
    Because it can put it's internal F keys in vector or something contagious, that depending on your implementation of, might not be the case for BST – Shihab Shahriar Khan Dec 12 '17 at 17:15

1 Answers1

0

At least when compared to cache, main memory has many of the same characteristics as a disk drive--it has fairly high bandwidth, but much higher latency than cache. It has a fairly large minimum read size, and gives substantially higher bandwidth when reads are predictable (e.g., when you read a number a number of cache lines at contiguous addresses). As such, it benefits from the same general kinds of optimizations (though the details often vary a bit).

B-trees (and variants like B* and B+ trees) were explicitly designed to work well with the access patterns supported well by disk drives. Since you have to read a fairly substantial amount of data anyway, you might as well pack the data to maximize the amount you accomplish from the memory you have to read. In both cases, you also frequently get a substantial bandwidth gain by reading some multiple of the minimum read in a predictable pattern (especially, a number of successive reads at successive addresses). As such, it often makes sense to increase the size of a single page to something even larger than the minimum you can read at once.

Likewise, in both cases we can plan on descending through a number of layers of nodes in the tree before we find the data we really care about. Much like when reading from disk, we benefit from maximizing the density of keys in the data we read, until we've actually found the data we care about. With a typical binary tree:

template <class T, class U>
struct node {
    T key;
    U data;
    node *left;
    node *right;
};

...we end up reading a number of data items for which we have no real use. It's only when we've found the right key that we need/want to get the associated data. In fairness, we can do that with a binary tree as well, with only a fairly minor modification to the node structure:

template <class T, class U>
struct node {
    T key;
    U    *data;
    node *left;
    node *right;
};

Now the node contains only a pointer to the data rather than the data itself. This won't accomplish anything if data is small, but can accomplish a great deal if it's large.

Summary: from the viewpoint of the CPU, reads from main memory have the same basic characteristics as reads from disk; a disk just shows a more extreme version of those same characteristics. As such, most of the design considerations that led to the design of B-trees (and variants) now apply similarly to data stored in main memory.

B-trees work well and often provide substantial benefits when used for in-memory storage.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • Thanks for the reply. But in B+ tree except the root the internal nodes can have more than two children and assume that each node already keeps it data. So we do not need a pointer to a data. Data is also in memory. In that case I wonder whether B+tree can be still better than Binary search tree? – burcak Dec 21 '17 at 05:17