I'm implementing a bytecode VM and am struggling referencing data stored in a parsed representation of the bytecode. As is the nature of (most) bytecode, it and thus its parsed representation remain unmodified once it's initialized. A separate Vm
contains the mutable parts (stack etc.) along with that module. I made an MCVE with additional explanatory comments to illustrate the problem; it's at the bottom and on the playground. The parsed bytecode may look like this:
Module { struct_types: {"Bar": StructType::Named(["a", "b"])} }
The strings "Bar"
, "a"
, "b"
are references into the bytecode and have lifetime 'b
, so we also have lifetimes in the types Module<'b>
and StructType<'b>
.
After creating this, I will want to create struct instances, think let bar = Bar { a: (), b: () };
. At least currently, each struct instance needs to hold a reference to its type, so that type might look like this:
pub struct Struct<'b> {
struct_type: &'b bytecode::StructType<'b>,
fields: Vec<Value<'b>>,
}
The values of a struct's fields may be constants whose value is stored in the bytecode, so the Value
enum has a lifetime 'b
as well, and that works. The problem is that I have a &'b bytecode::StructType<'b>
in the first field: how do I get a reference that lives long enough? I think the reference would actually be valid long enough.
The part of the code that I suspect to be the critical one is here:
pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
// self.struct_types.get(name)
todo!("fix lifetime problems")
}
With the commented out code, I can't get a 'b
reference because the reference self.struct_types
lives too short; to fix that I'd need to do &'b self
which would spread virally through the code; also, most of the times I need to borrow the Vm
mutably, which doesn't work if all those exclusive self references have to live long.
Introducing a separate lifetime 'm
so that I could return a &'m StructType<'b>
sounds like something that I could try as well, but that sounds just as viral and in addition introduces a separate lifetime I need to keep track of; being able to replace 'b
with 'm
(or at least only having on in each place) would be a bit nicer.
Finally this feels like something that pinning could be helpful with, but I don't understand that topic enough to make an educated guess on how that could be approached.
MCVE
#![allow(dead_code)]
mod bytecode {
use std::collections::BTreeMap;
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StructType<'b> {
/// unit struct type; doesn't have fields
Empty,
/// tuple struct type; fields are positional
Positional(usize),
/// "normal" struct type; fields are named
Named(Vec<&'b str>),
}
impl<'b> StructType<'b> {
pub fn field_count(&self) -> usize {
match self {
Self::Empty => 0,
Self::Positional(field_count) => *field_count,
Self::Named(fields) => fields.len(),
}
}
}
#[derive(Debug, Clone)]
pub struct Module<'b> {
struct_types: BTreeMap<&'b str, StructType<'b>>,
}
impl<'b> Module<'b> {
// here is the problem: I would like to return a reference with lifetime 'b.
// from the point I start executing instructions, I know that I won't modify
// the module (particularly, I won't add entries to the map), so I think that
// lifetime should be possible - pinning? `&'b self` everywhere? idk
pub fn struct_type(&self, _name: &str) -> Option<&'b StructType<'b>> {
// self.struct_types.get(name)
todo!("fix lifetime problems")
}
}
pub fn parse<'b>(bytecode: &'b str) -> Module<'b> {
// this would use nom to parse actual bytecode
assert_eq!(bytecode, "struct Bar { a, b }");
let bar = &bytecode[7..10];
let a = &bytecode[13..14];
let b = &bytecode[16..17];
let fields = vec![a, b];
let bar_struct = StructType::Named(fields);
let struct_types = BTreeMap::from_iter([
(bar, bar_struct),
]);
Module { struct_types }
}
}
mod vm {
use crate::bytecode::{self, StructType};
#[derive(Debug, Clone)]
pub enum Value<'b> {
Unit,
Struct(Struct<'b>),
}
#[derive(Debug, Clone)]
pub struct Struct<'b> {
struct_type: &'b bytecode::StructType<'b>,
fields: Vec<Value<'b>>,
}
impl<'b> Struct<'b> {
pub fn new(struct_type: &'b bytecode::StructType<'b>, fields: Vec<Value<'b>>) -> Self {
Struct { struct_type, fields }
}
}
#[derive(Debug, Clone)]
pub struct Vm<'b> {
module: bytecode::Module<'b>,
}
impl<'b> Vm<'b> {
pub fn new(module: bytecode::Module<'b>) -> Self {
Self { module }
}
pub fn create_struct(&mut self, type_name: &'b str) -> Value<'b> {
let struct_type: &'b StructType<'b> = self.module.struct_type(type_name).unwrap();
// just initialize the fields to something, we don't care
let fields = vec![Value::Unit; struct_type.field_count()];
let value = Value::Struct(Struct::new(struct_type, fields));
value
}
}
}
pub fn main() {
// the bytecode contains all constants needed at runtime;
// we're just interested in how struct types are handled
// obviously the real bytecode is not as human-readable
let bytecode = "struct Bar { a, b }";
// we parse that into a module that, among other things,
// has a map of all struct types
let module = bytecode::parse(bytecode);
println!("{:?}", module);
// we create a Vm that is capable of running commands
// that are stored in the module
let mut vm = vm::Vm::new(module);
// now we try to execute an instruction to create a struct value
// the instruction for this contains a reference to the type name
// stored in the bytecode.
// the struct value contains a reference to its type and holds its field values.
let value = {
let bar = &bytecode[7..10];
vm.create_struct(bar)
};
println!("{:?}", value);
}