Questions tagged [lsm-tree]

In computer science, the Log-Structured Merge-Tree (or LSM-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data.

In computer science, the Log-Structured Merge-Tree (or LSM-tree) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data.

LSM-trees maintain data in two separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.

The LSM-tree is a hybrid data structure. It is composed of two tree-like structures, known as the C0 and C1 components. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If the insertion causes the C0 component to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk.

The performance characteristics of LSM-trees stem for the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is efficiently migrated across media in rolling batches, using an algorithm reminiscent of merge sort.

LSM-trees are used in database managements systems such as LevelDB, SQLite4 and Apache Cassandra.

Source:http://en.wikipedia.org/wiki/Log-structured_merge-tree

17 questions
31
votes
4 answers

What is the differences between the term SSTable and LSM Tree

Are these two terms used interchangeably? I have read about how SSTable works, and usually, articles just start mentioning LSM Tree. However, they seem to be the same thing. When should I use one term over the other?
Kakayou
  • 598
  • 1
  • 8
  • 16
3
votes
1 answer

How to maintain the sparse index in a LSM-tree?

In Designing Data Intensive Applications, Martin introduces a data structure called LSM-trees. There are mainly 3 parts: an in-memory memtable (usually a red-black tree), an in-memory sparse index, and on-disk SSTables (aka segments). They work…
yiksanchan
  • 1,890
  • 1
  • 13
  • 37
3
votes
1 answer

Why Bloom filters cannot handle range queries?

Context: I'm reading about RocksDB and LSM trees, from my understanding Bloom filter is used to avoid multiple I/Os for item retrieval in all the storage levels. And I'm ok with that. Apparently, one of the challenges is that Bloom filter cannot be…
Murray
  • 97
  • 8
3
votes
2 answers

LSM Tree lookup time

What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)? Is it O(log N)? O(N*Log N)? Something else? How about for a multiple query, like searching for multiple WHERE…
andandandand
  • 21,946
  • 60
  • 170
  • 271
2
votes
1 answer

Why rocksDB needs multiple levels?

All of the keys in rocksDB's level 1 are already sorted. Therefore we can get key quickly in this level. Why does rocksDB still need to compact the files in level 1 to level 2? I find an explanation on LevelDB's doc: Open file in one directory is…
YjyJeff
  • 833
  • 1
  • 6
  • 14
2
votes
1 answer

MongoDB: How can I change engine type (from B-Tree to LSM-Tree) of _id_ index?

We can create a collection with WiredTiger engine and type=lsm, but this feature is not mentioned in MongoDB documents: db.createCollection( "test", { storageEngine: { wiredTiger: {configString: "type=lsm"}}} ) Once insert some documents…
auntyellow
  • 2,423
  • 2
  • 20
  • 47
2
votes
1 answer

how do range queries work on a LSM (log structure merge tree)?

Recently I've been studying common indexing structures in databases, such as B+-trees and LSM. I have a solid handle on how point reads/writes/deletes/compaction would work in an LSM. For example (in RocksDB/levelDB), on a point query read we…
shinjin
  • 354
  • 4
  • 17
1
vote
0 answers

How does MongoDB WiredTiger store files

MongoDB WiredTiger offers LSMT for storage. Great, so in memory it maintains a balanced search tree which is flushed to disk depending on configuration (time or size). But q is, how is the data stored on disk? LSMT in Cassandra/HBase are stored as…
Apurva Singh
  • 4,534
  • 4
  • 33
  • 42
1
vote
2 answers

How exactly does the memtable flush to an SSTable on disk in LSM-trees?

Implementation wise, how exactly does the memtable (in Cassandra, RocksDB, LevelDB, or any LSM-tree) flush to an SSTable? I get that a memtable is some sorted data structured, like a red-black tree, but how do we turn that into a file of sorted…
squidword
  • 437
  • 3
  • 11
1
vote
1 answer

How HBase performs updates at disk transfer rate instead of disk seek rate?

I am reading the book HBase: The Definitive Guide and it is mentioned that while traditional relational databases perform updates/deletes at seek rate (B-trees), HBase performs updates/deletes at transfer rate (LSM trees). I am aware how LSM trees…
davuinci
  • 60
  • 10
1
vote
1 answer

How to skip some keys by using iterator?

As an example, I have add several keys to DB just like , <1 + 2> <1 + 3> <2 + 1> <2 + 4> <3 + 2> First, Seek() to <1, 2> and then Next() to <1, 3> After that, I want to skip the key <2, 1> and <2, 4> (whose prefix are all 2) and move the iterator…
stmatengss
  • 1,098
  • 7
  • 8
1
vote
1 answer

Does BigTable use Tiered or Leveled LSM-tree compaction?

Google BigTable is a system that uses LSM-tree as its core data structure for storage. LSM-tree can use different merge strategies. The two most common ones are (1) leveled merging which is more read-optimized, and (2) tiered merging which is more…
nday
  • 229
  • 1
  • 8
1
vote
1 answer

Why does LevelDB make its lower level 10 times bigger than upper one?

According to the official document, there is no doubt that lower level is 10 times bigger than upper one in LevelDB. The question is why 10? not 2? not 20? It is due to some rigorous math calculations or it just works? I have read the original LSMT…
LeeLee
  • 158
  • 1
  • 6
1
vote
0 answers

Why is LSM tree called an index structure?

Why are LSM trees called index structure? My understanding is that they are a "storage technique" i.e., how you store the data underneath. In fact, for performance reasons you would need an index on top of this data --in this case of LSM an…
1
vote
0 answers

compaction not triggered in rocksdb

I have trouble triggering compactions during a bulk insert with HASHSKIPLIST memtable in rocksdb. I use PlainTable SST file format. Memtable size is set to 64MB and number of write buffers is 6. While inserting 200Million data, the number of level0…
1
2