
I am implementing a simple B-tree in Rust (as hobby project).

The basic implementation works well. As long as everything fits in memory, nodes can store direct references to their children and it is easy to traverse and use them.

However, to store more data than fits in RAM, it should be possible to keep tree nodes on disk and load parts of the tree only when they are needed.

But now it becomes incredibly hard to return a reference to a part of the tree. For instance, when implementing Node::lookup_value:

extern crate arrayvec;
use arrayvec::ArrayVec;

use std::collections::HashMap;

type NodeId = usize;


#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
  Leaf(ArrayVec<Value, 15>),
  Internal(ArrayVec<NodeId, 16>)
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
  pub keys: ArrayVec<Key, 15>,
  pub contents: NodeContents<Value>,
}

pub trait Pool<T> {
    type PoolId;

    /// Invariants:
    /// - Only call `lookup` with a key which you *know* exists. This should always be possible, as entries are never 'simply' removed.
    /// - Do not remove or alter the things backing the pool in the meantime (Adding more is obviously allowed!).
    fn lookup(&self, key: &Self::PoolId) -> T;
}

#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Debug)]
pub struct EphemeralPoolId(usize);

// An example of a pool that keeps all nodes in memory
// Similar (more complicated) implementations can be made for pools storing nodes on disk, remotely, etc.
pub struct EphemeralPool<T> {
    next_id: usize,
    contents: HashMap<usize, T>,
}

impl<T: Clone> Pool<T> for EphemeralPool<T> {
    type PoolId = EphemeralPoolId;

    fn lookup(&self, key: &Self::PoolId) -> T {
        self.contents
            .get(&key.0)
            .expect("Tried to look up a value in the pool which does not exist. This breaks the invariant that you should only use it with a key which you received earlier from writing to it.")
            .clone()
    }
}

impl<Key: Ord, Value> Node<Key, Value> {
  pub fn lookup_value<'pool>(&self, key: &Key, pool: &'pool impl Pool<Node<Key, Value>, PoolId=NodeId>) -> Option<&Value> {
    match &self.contents {
      NodeContents::Leaf(values) => { // Simple base case. No problems here.
        self.keys.binary_search(key).map(|index| &values[index]).ok()
      },
      NodeContents::Internal(children) => {
        let index = self.keys.binary_search(key).map(|index| index + 1).unwrap_or_else(|index| index);
        let child_id = &children[index];

        // Here the borrow checker gets mad:
        let child = pool.lookup(child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
        child.lookup_value(key, pool)
      }
    }
  }
}

Rust Playground

The borrow checker gets mad. I understand why, but not how to solve it.

It gets mad, because NodePool::lookup returns a node by value. (And it has to do so, because it reads it from disk. In the future there may be a cache in the middle, but elements from this cache might be removed at any time, so returning references to elements in this cache is also not possible.)

However, lookup_value returns a reference to a tiny part of this value. But the value stops being around once the function returns.

In the Borrow Checker's parlance:

child.lookup_value(key, pool) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

How to solve this problem? The simplest solution would be to change the function to always return by value (Option<Value>). However, these values are often large and I would like to refrain from needless copies. It is quite likely that lookup_value is called in quick succession with closely related keys. (There is also a more complex scenario which the borrow checker similarly complains about, where we look up a range of sequential values with an iterator. The same situation applies there.)

Are there ways in which I can make sure that child lives long enough? I have tried to pass in an extra &mut HashMap<NodeId, Arc<Node<K, V>>> argument to store all children needed in the current traversal to keep their references alive for long enough. (See this variant of the code on the Rust Playground here)

However, then you end up with a recursive function that takes (and modifies) a hashmap as well as an element in the hashmap. Rust (rightfully) blocks this from working: (a) it is not possible to convince Rust that you won't overwrite earlier values in the hashmap, and (b) the hashmap might need to be reallocated when it grows, which would also invalidate the references.

So that does not work.


Are there other solutions? (Even ones possibly involving unsafe?)

Qqwy
  • Does this answer your question? [How to allocate structs on the heap without taking up space on the stack in stable Rust?](https://stackoverflow.com/questions/59232877/how-to-allocate-structs-on-the-heap-without-taking-up-space-on-the-stack-in-stab) – mousetail Jul 21 '22 at 09:50
  • @mousetail No, as far as I can tell that question seems to have very little in common with this question. This question is not related to heap-allocation at all. – Qqwy Jul 21 '22 at 09:53
  • Hmm instead of actual references I'd store my own IDs that are stored in some centralized hashmap – mousetail Jul 21 '22 at 10:01
  • I don't understand, your `lookup_value` doesn't return anything, what's going on? Did you forget a return value annotation? – Finomnis Jul 21 '22 at 10:01
  • Also, a [minimal reproducable example](https://stackoverflow.com/help/minimal-reproducible-example) would be really helpful, with all external references mocked that exactly produces the error message you claim it gives. For example, the `Key` and `Value` generics in `NodeContents` don't do anything, and `V` in it is undefined. This is frustrating for people that try to help you, please respect the rules and provide such a minimal example. – Finomnis Jul 21 '22 at 10:06
  • @Finomnis You are absolutely correct. I minimized my actual code beyond the point of it being runnable which is definitely frustrating. Apologies. The question has been updated with a proper runnable example. – Qqwy Jul 21 '22 at 10:35
  • Saw that too late, created a minimal example myself in my answer. If it doesn't fit your problem, let me know, and I'll take a look at it again. – Finomnis Jul 21 '22 at 10:41
  • Why does `Leaf` have a length of `15`, but `Internal` have a length of `16`? – Finomnis Jul 21 '22 at 10:43
  • @Finomnis Internal nodes always have one key less than they have branches. The branching factor of a B-tree is often expressed as `M`, with `2*M` branches and `2*M-1` keys to choose between them. For `M = 1` you end up with a [binary search tree](https://en.wikipedia.org/wiki/Binary_search_tree). For `M = 2` you have a [2,3-tree](https://en.wikipedia.org/wiki/2–3_tree), etc. The algorithm in the code above models a [B+-tree](https://en.wikipedia.org/wiki/B%2B_tree) where `M = 8`. Some B+-tree impls use the extra slot in the leaf to store a pointer to the next leaf for faster iteration. – Qqwy Jul 21 '22 at 11:05
  • For this kind of flexibility, I think the solution will involve `Rc` or `Arc`. This would allow you to eject values from the cache that are still in use elsewhere, and would fix this issue because you can clone an existing `Rc` in the first case or create a new one in the second. – cdhowie Jul 21 '22 at 13:23

2 Answers


Minimal Reproducible Example

Your example contained a lot of errors and undefined types. Please provide a proper minimal reproducible example next time to avoid frustration and increase the chance of getting a quality answer.

EDIT: I saw your modified question too late with the proper minimal example, I hope this one fits. If not, I'll take another look. Let me know.

Either way, I attempted to reverse-engineer what your minimal example could have been:

use std::marker::PhantomData;

type NodeId = usize;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
    Leaf(Vec<Value>),
    Internal(Vec<NodeId>),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
    pub keys: Vec<Key>,
    pub contents: NodeContents<Value>,
}

struct NodePool<Key, Value> {
    _p: PhantomData<(Key, Value)>,
}

impl<Key, Value> NodePool<Key, Value> {
    fn lookup(&self, _child_id: NodeId) -> Node<Key, Value> {
        todo!()
    }
}

impl<Key, Value> Node<Key, Value>
where
    Key: Ord,
{
    pub fn lookup_value(&self, key: &Key, pool: &NodePool<Key, Value>) -> Option<&Value> {
        match &self.contents {
            NodeContents::Leaf(values) => {
                // Simple base case. No problems here.
                self.keys
                    .binary_search(key)
                    .map(|index| &values[index])
                    .ok()
            }
            NodeContents::Internal(children) => {
                let index = self
                    .keys
                    .binary_search(key)
                    .map(|index| index + 1)
                    .unwrap_or_else(|index| index);
                let child_id = &children[index];

                // Here the borrow checker gets mad:
                let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
                child.lookup_value(key, pool)
            }
        }
    }
}
error[E0515]: cannot return reference to local variable `child`
  --> src/lib.rs:50:17
   |
50 |                 child.lookup_value(key, pool)
   |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

Thoughts and Potential Solutions

The main question you have to ask yourself: Who owns the data once it's loaded from disk?

This is not a simple question with a simple answer. It opens many new questions:

  • If you query the same key twice, should it load from disk twice? Or should it be cached and some other previously cached value should be moved to disk?
  • If it should be loaded from disk twice, should the entire Node that lookup returned be kept alive until the borrow is returned, or should only the Value be moved out and kept alive?

If your answer is "it should be loaded twice" and "only the Value should be kept alive", you could use an enum that can carry either a borrowed or owned Value:

enum ValueHolder<'a, Value> {
    Owned(Value),
    Borrowed(&'a Value),
}

impl<'a, Value> std::ops::Deref for ValueHolder<'a, Value> {
    type Target = Value;

    fn deref(&self) -> &Self::Target {
        match self {
            ValueHolder::Owned(val) => val,
            ValueHolder::Borrowed(val) => val,
        }
    }
}
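Incidentally, when `Value` is `Clone`, this hand-rolled enum is essentially the standard library's `std::borrow::Cow`, which also derefs to the borrowed type. A small sketch (the `shout` function is just an illustration, not part of the question's code):

```rust
use std::borrow::Cow;

// Cow<'_, str> plays the same role as ValueHolder: through Deref,
// the function body works identically whether the value arrived
// borrowed or owned.
fn shout(v: Cow<'_, str>) -> String {
    v.to_uppercase()
}

fn main() {
    let local = String::from("borrowed");
    // Borrowed: no copy is made.
    assert_eq!(shout(Cow::Borrowed(&local)), "BORROWED");
    // Owned: the callee takes ownership of a freshly produced value.
    assert_eq!(shout(Cow::Owned(String::from("owned"))), "OWNED");
}
```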

In your code, it could look like this:

use std::marker::PhantomData;

type NodeId = usize;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
    Leaf(Vec<Value>),
    Internal(Vec<NodeId>),
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
    pub keys: Vec<Key>,
    pub contents: NodeContents<Value>,
}

pub struct NodePool<Key, Value> {
    _p: PhantomData<(Key, Value)>,
}

impl<Key, Value> NodePool<Key, Value> {
    fn lookup(&self, _child_id: NodeId) -> Node<Key, Value> {
        todo!()
    }
}

pub enum ValueHolder<'a, Value> {
    Owned(Value),
    Borrowed(&'a Value),
}

impl<'a, Value> std::ops::Deref for ValueHolder<'a, Value> {
    type Target = Value;

    fn deref(&self) -> &Self::Target {
        match self {
            ValueHolder::Owned(val) => val,
            ValueHolder::Borrowed(val) => val,
        }
    }
}

impl<Key, Value> Node<Key, Value>
where
    Key: Ord,
{
    pub fn lookup_value(
        &self,
        key: &Key,
        pool: &NodePool<Key, Value>,
    ) -> Option<ValueHolder<Value>> {
        match &self.contents {
            NodeContents::Leaf(values) => {
                // Simple base case. No problems here.
                self.keys
                    .binary_search(key)
                    .map(|index| ValueHolder::Borrowed(&values[index]))
                    .ok()
            }
            NodeContents::Internal(children) => {
                let index = self
                    .keys
                    .binary_search(key)
                    .map(|index| index + 1)
                    .unwrap_or_else(|index| index);
                let child_id = &children[index];

                // Here the borrow checker gets mad:
                let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
                child
                    .into_lookup_value(key, pool)
                    .map(|val| ValueHolder::Owned(val))
            }
        }
    }

    pub fn into_lookup_value(self, key: &Key, pool: &NodePool<Key, Value>) -> Option<Value> {
        match self.contents {
            NodeContents::Leaf(mut values) => {
                // Simple base case. No problems here.
                self.keys
                    .binary_search(key)
                    .map(|index| values.swap_remove(index))
                    .ok()
            }
            NodeContents::Internal(children) => {
                let index = self
                    .keys
                    .binary_search(key)
                    .map(|index| index + 1)
                    .unwrap_or_else(|index| index);
                let child_id = &children[index];

                // Here the borrow checker gets mad:
                let child = pool.lookup(*child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
                child.into_lookup_value(key, pool)
            }
        }
    }
}

In all other cases, however, it seems like further thought needs to be put into restructuring the project.

Finomnis
  • Thank you for answering! However, it does not fully fit my problem. Returning a Cow here means that for any tree which does not only consist of a root node (i.e. any time the function recurses) the data has to be owned, which will be the case for any tree that is large enough to require storage on disk in the first place. – Qqwy Jul 21 '22 at 11:21
  • Your question 'Who owns the data once it is loaded from disk?' is a very good one! I want to keep the loaded data around for the duration of one or multiple calls to `lookup_value`. To model this I tried adding a mutable `HashMap` as extra parameter and passing that through the recursion (as mentioned above), which I also could not get the borrow checker happy with. See [this variant of the code](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e0b38bbac9142bef4be41a641d40eddd). – Qqwy Jul 21 '22 at 11:21
  • I think your question is very related to caching in general. Maybe you should look at how other, existing projects implement caching for key value pair databases, which stumble across the exact same problems. – Finomnis Jul 21 '22 at 11:23
  • @Qqwy The problem with your `memento` is that it returns `child`, which is borrowed from `memento`. But `lookup_value` requires `memento` to be modifiable, and there can never be a mutable and immutable reference simultaneously. This is not just an error in Rust, it's an error in every language, other languages just simply don't complain. (For example in C++: holding a reference to a Vector element, and then adding an element to the vector could cause a reallocation to happen and the reference to become dangling) – Finomnis Jul 21 '22 at 11:32
  • I think the main problem comes from that your current data structure layout just isn't feasible and needs a rework. – Finomnis Jul 21 '22 at 11:34
  • 1
    Note that `ValueHolder` might be redundant with the built-in type `Cow`. – cdhowie Jul 21 '22 at 13:12

After a lot of searching and trying, and looking at what other databases are doing (great suggestion, @Finomnis!) the solution presented itself.

Use an Arena

The intuition to use a HashMap was a good one, but it runs into lifetime problems for two reasons:

  • A HashMap might be reallocated when growing, which would invalidate old references.
  • You cannot convince the compiler that new insertions into the HashMap will not overwrite existing values.

However, there is an alternative data structure which avoids both problems: the arena allocator.

With an arena, you essentially 'tie' the lifetime of each value you insert to the lifetime of the allocator as a whole.

Internally (at least conceptually), an arena keeps track of a list of memory regions in which data might be stored, the most recent one of which is not entirely full. Individual memory regions are never reallocated to ensure that old references remain valid; instead, when the current memory region is full, a new (larger) one is allocated and added to the list, becoming the new 'current'.

Everything is destroyed at once at the point the arena itself is deallocated.

There are two prevalent Rust libraries that provide arenas. They are very similar, with slightly different trade-offs:

  • typed-arena allows only a single type T per arena, but runs all Drop implementations when the arena is dropped.
  • bumpalo allows different objects to be stored in the same arena, but their Drop implementations do not run.

In the common situation of having only a single data type with no special Drop implementation, you might want to benchmark which one is faster for your use case.

In the original question, we are dealing with a cache, and it makes sense to reduce copying of large data structures by passing Arcs between the cache and the arena. This means that only typed-arena can be used in this case, as Arc has a nontrivial Drop implementation.
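To make the idea concrete, here is a self-contained sketch. The arena is deliberately naive (one Box per node, where typed-arena would use growing chunks), and `Pool`, `Node`, and the tree shape are simplified stand-ins for the types in the question:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Minimal illustrative arena. Boxing gives every node a stable
// address, so a borrow handed out by `alloc` stays valid for the
// arena's whole lifetime, even while the backing Vec grows.
struct Arena<T> {
    items: RefCell<Vec<Box<T>>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: RefCell::new(Vec::new()) }
    }

    fn alloc(&self, value: T) -> &T {
        let boxed = Box::new(value);
        let ptr: *const T = &*boxed;
        self.items.borrow_mut().push(boxed);
        // SAFETY: the Box (and thus ptr's target) is never moved or
        // dropped until the arena itself is dropped, and the arena
        // never removes entries.
        unsafe { &*ptr }
    }
}

type NodeId = usize;

#[derive(Clone)]
enum Contents<V> {
    Leaf(Vec<V>),
    Internal(Vec<NodeId>),
}

#[derive(Clone)]
struct Node<K, V> {
    keys: Vec<K>,
    contents: Contents<V>,
}

// Stands in for the on-disk pool: lookup returns nodes by value.
struct Pool<K, V> {
    nodes: HashMap<NodeId, Node<K, V>>,
}

impl<K: Clone, V: Clone> Pool<K, V> {
    fn lookup(&self, id: NodeId) -> Node<K, V> {
        self.nodes[&id].clone()
    }
}

impl<K: Ord + Clone, V: Clone> Node<K, V> {
    // Children loaded during the traversal are parked in the arena,
    // so the returned borrow outlives the recursion.
    fn lookup_value<'a>(
        &'a self,
        key: &K,
        pool: &Pool<K, V>,
        arena: &'a Arena<Node<K, V>>,
    ) -> Option<&'a V> {
        match &self.contents {
            Contents::Leaf(values) => {
                self.keys.binary_search(key).ok().map(|i| &values[i])
            }
            Contents::Internal(children) => {
                let i = self
                    .keys
                    .binary_search(key)
                    .map(|i| i + 1)
                    .unwrap_or_else(|i| i);
                let child = arena.alloc(pool.lookup(children[i]));
                child.lookup_value(key, pool, arena)
            }
        }
    }
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1, Node { keys: vec![5], contents: Contents::Leaf(vec!["five"]) });
    nodes.insert(2, Node { keys: vec![10, 20], contents: Contents::Leaf(vec!["ten", "twenty"]) });
    let pool = Pool { nodes };
    let root = Node { keys: vec![10], contents: Contents::Internal(vec![1, 2]) };

    let arena = Arena::new();
    assert_eq!(root.lookup_value(&10, &pool, &arena), Some(&"ten"));
    assert_eq!(root.lookup_value(&5, &pool, &arena), Some(&"five"));
    assert_eq!(root.lookup_value(&7, &pool, &arena), None);
}
```

Because `alloc` takes `&self`, loading more children during a traversal never conflicts with the borrows already handed out, and every returned reference is tied to the arena rather than to a local variable, which is exactly what E0515 was complaining about.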

Qqwy