In Designing Data Intensive Applications, Martin introduces a data structure called LSM-trees.
There are mainly 3 parts: an in-memory memtable (usually a red-black tree), an in-memory sparse index, and on-disk SSTables (aka segments). They work together like this:
When a write happens, it first goes to the memtable, and when it turns full, all the data are flushed into a new segment (with all the keys sorted).
When a read happens, it first looks up the memtable. If the key doesn't exist there, it looks up the sparse index, to learn which segment the key may reside. See figure 1.
Periodically, compaction happens that merges multiple segments into one. See figure 2.
As you can tell from figure 2, keys are sorted within a segment, however keys are NOT sorted between segments. This make me wonder: how do we maintain the sparse index s.t. keys in the index have increasing offset?