I am implementing a simple B-tree in Rust (as hobby project).
The basic implementation works well. As long as everything fits in memory, nodes can store direct references to their children and it is easy to traverse and use them.
However, to be able to store more data inside than fits in RAM, it should be possible to store tree nodes on disk and retrieve parts of the tree only when they are needed.
But now it becomes incredibly hard to return a reference to a part of the tree.
For instance, when implementing Node::lookup_value
:
extern crate arrayvec;
use arrayvec::ArrayVec;
use std::collections::HashMap;
type NodeId = usize;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum NodeContents<Value> {
Leaf(ArrayVec<Value, 15>),
Internal(ArrayVec<NodeId, 16>)
}
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Node<Key, Value> {
pub keys: ArrayVec<Key, 15>,
pub contents: NodeContents<Value>,
}
pub trait Pool<T> {
type PoolId;
/// Invariants:
/// - Ony call `lookup` with a key which you *know* exists. This should always be possible, as stuff is never 'simply' removed.
/// - Do not remove or alter the things backing the pool in the meantime (Adding more is obviously allowed!).
fn lookup(&self, key: &Self::PoolId) -> T;
}
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Debug)]
pub struct EphemeralPoolId(usize);
// An example of a pool that keeps all nodes in memory
// Similar (more complicated) implementations can be made for pools storing nodes on disk, remotely, etc.
pub struct EphemeralPool<T> {
next_id: usize,
contents: HashMap<usize, T>,
}
impl<T: Clone> Pool<T> for EphemeralPool<T> {
type PoolId = EphemeralPoolId;
fn lookup(&self, key: &Self::PoolId) -> T {
let val = self.contents.get(&key.0).expect("Tried to look up a value in the pool which does not exist. This breaks the invariant that you should only use it with a key which you received earlier from writing to it.").clone();
val
}
}
impl<Key: Ord, Value> Node<Key, Value> {
pub fn lookup_value<'pool>(&self, key: &Key, pool: &'pool impl Pool<Node<Key, Value>, PoolId=NodeId>) -> Option<&Value> {
match &self.contents {
NodeContents::Leaf(values) => { // Simple base case. No problems here.
self.keys.binary_search(key).map(|index| &values[index]).ok()
},
NodeContents::Internal(children) => {
let index = self.keys.binary_search(key).map(|index| index + 1).unwrap_or_else(|index| index);
let child_id = &children[index];
// Here the borrow checker gets mad:
let child = pool.lookup(child_id); // NodePool::lookup(&self, NodeId) -> Node<K, V>
child.lookup_value(key, pool)
}
}
}
}
The borrow checker gets mad. I understand why, but not why to solve it.
It gets mad, because NodePool::lookup
returns a node by value. (And it has to do so, because it reads it from disk. In the future there may be a cache in the middle, but elements from this cache might be removed at any time, so returning references to elements in this cache is also not possible.)
However, lookup_value
returns a reference to a tiny part of this value. But the value stops being around once the function returns.
In the Borrow Checker's parlance:
child.lookup_value(key, pool)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function
How to solve this problem?
The simplest solution would be to change the function in general to always return by-value (Option<V>
). However, these values are often large and I would like to refrain from needless copies. It is quite likely that lookup_value
is called in quick succession with closely related keys. (And also, there is a more complex scenario which the borrow checker similarly complains about, where we look up a range of sequential values with an interator. The same situation applies there.)
Are there ways in which I can make sure that child
lives long enough?
I have tried to pass in an extra &mut HashMap<NodeId, Arc<Node<K, V>>>
argument to store all children needed in the current traversal to keep their references alive for long enough. (See this variant of the code on the Rust Playground here)
However, then you end up with a recursive function that takes (and modifies) a hashmap as well as an element in the hashmap. Rust (rightfully) blocks this from working: It's (a) not possible to convince Rust that you won't overwrite earlier values in the hashmap, and (b) it might need to be reallocated when it grows which would also invalidate the references.
So that does not work.
Are there other solutions? (Even ones possibly involving unsafe
?)