3

What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)?

Is it O(log N)? O(N*Log N)? Something else?

How about for a multiple query, like searching for multiple WHERE clauses in a key-value database?

The wikipedia page on LSM trees is currently lacking this info.

And I'm trying to make sense of the original paper.

andandandand
  • 21,946
  • 60
  • 170
  • 271
  • I think it is hard to estimate the search complexity of a complex data structure like LSM tree, because it has different components and complexity depends on how you manage different operations within this components. For example, as LSM has multiple layers and size of each layers vary. Now depending on whether you like to check for the existence of the key in each layer or using another data structure (like bloom filter) for the membership testing may makes the overall complexity different. – biqarboy Feb 15 '21 at 23:38
  • LevelDB (a LSM based lightweight key-value store) have O(lg N) search performance with a very large branching factor. But, I believe they achieved it through different optimizations. – biqarboy Feb 15 '21 at 23:42

2 Answers2

1

I have been wondering the same.

If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).

Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).

However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).

AndyF
  • 183
  • 1
  • 9
-1

For a simple search indexed by a LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B tree, which is O(log n), and the other trees are subsets of B trees or in the case of in memory trees, more efficient trees, which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.

Warren Dew
  • 8,790
  • 3
  • 30
  • 44