2

I am interested in knowing the idiomatic/canonical way of making self-referential structures in Rust. The related question Why can't I store a value and a reference to that value in the same struct explains the problem, but try as I might, I couldn't figure out the answer in the existing question (although there were some useful hints).

I have come up with a solution, but I am unsure of how safe it is, or if it is the idiomatic way to solve this problem; if it isn't, I would very much like to know what the usual solution is.

I have an existing structure in my program that holds a reference to a sequence. Sequences hold information about chromosomes so they can be rather long, and copying them isn't a viable idea.

// My real Foo is more complicated than this and is an existing
// type I'd rather not have to rewrite if I can avoid it...
struct Foo<'a> {
    x: &'a [usize],
    // more here...
}
impl<'a> Foo<'a> {
    pub fn new(x: &'a [usize]) -> Self {
        Foo {
            x, /* more here... */
        }
    }
}

I now need a new structure that reduces the sequence to something smaller and then builds a Foo structure over the reduced string, and since someone has to own both reduced string and Foo object, I would like to put both in a structure.

// My real Bar is slightly more complicated, but it boils down to having
// a vector it owns and a Foo over that vector.
struct Bar<'a> {
    x: Vec<usize>,
    y: Foo<'a>, // has a reference to &x
}


// This doesn't work because x is moved after y has borrowed it
impl<'a> Bar<'a> {
    pub fn new() -> Self {
        let x = vec![1, 2, 3];
        let y = Foo::new(&x);
        Bar { x, y }
    }
}

Now, this doesn't work because the Foo object in a Bar refers into the Bar object

Structure of a Bar

and if the Bar object moves, the reference will point into memory that is no longer occupied by the Bar object

Moving Bar

To avoid this problem, the x element in Bar must sit on the heap and not move around. (I think the data in a Vec already sits happily on the heap, but that doesn't seem to help me here).

A pinned box should do the trick, I belive.

struct Bar<'a> {
    x: Pin<Box<Vec<usize>>>,
    y: Foo<'a>, 
}

Now the structure looks like this

x on the heap

and when I move it, the references point to the same memory.

Move with x on the heap

However, moving x to the heap isn't enough for the type-checker. It still thinks that moving the pinned box will move what it points to.

If I implement Bar's constructor like this:

impl<'a> Bar<'a> {
    pub fn new() -> Self {
        let v: Vec<usize> = vec![1, 2, 3];
        let x = Box::pin(v);
        let y = Foo::new(&x);
        Bar { x, y }
    }
}

I get the error

error[E0515]: cannot return value referencing local variable `x`
  --> src/main.rs:22:9
   |
21 |         let y = Foo::new(&x);
   |                          -- `x` is borrowed here
22 |         Bar { x, y }
   |         ^^^^^^^^^^^^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `x` because it is borrowed
  --> src/main.rs:22:15
   |
17 | impl<'a> Bar<'a> {
   |      -- lifetime `'a` defined here
...
21 |         let y = Foo::new(&x);
   |                          -- borrow of `x` occurs here
22 |         Bar { x, y }
   |         ------^-----
   |         |     |
   |         |     move out of `x` occurs here
   |         returning this value requires that `x` is borrowed for `'a`

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.

(Playground)

Even though the object I take a reference of sits on the heap, and doesn't move, the checker still sees me borrowing from an object that moves, and that, of course, is a no-no.

Here, you might stop and notice that I am trying to make two pointers to the same object, so Rc or Arc is an obvious solution. And it is, but I would have to change the implementation of Foo to have an Rc member instead of a reference. While I do have control of the source code for Foo, and I could update it and all the code that uses it, I am reluctant to make such a major change if I can avoid it. And I could have been in a situation where I am not in control of the Foo, so I couldn't change its implementation, and I would love to know how I would solve that situation then.

The only solution I could get to work was to get a raw pointer to x, so the type-checker doesn't see that I borrow it, and then connect x and y though that.

impl<'a> Bar<'a> {
    pub fn new() -> Self {
        let v: Vec<usize> = vec![1, 2, 3];
        let x = Box::new(v);
        let (x, y) = unsafe {
            let ptr: *mut Vec<usize> = Box::into_raw(x);
            let w: &Vec<usize> = ptr.as_ref().unwrap();
            (Pin::new(Box::from_raw(ptr)), Foo::new(&w))
        };
        Bar { x, y }
    }
}

Playground code here

What I don't know is if this is the right way to do it. It seems rather complicated, but perhaps it is the only way to make a structure like this in Rust? That some sort of unsafe is needed to trick the compiler. So that is the first of my questions.

The second is, if this is safe to do? Of course it is unsafe in the technical sense, but am I risking creating a reference to memory that might not be valid later? It is my impression that Pin should guarantee that the object remains where it is supposed to sit, and that the lifetime of the Bar<'a> and Foo<'a> objects should ensure that the reference doesn't out-live the vector, but once I have gone unsafe, could that promise be broken?

Update

The owning_ref crate has functionality that looks like what I need. You can create owned objects that present their references as well.

There is an OwningRef type that wraps an object and a reference, and it would be wonderful if you could have the slice in that and getting the reference wasn't seen as borrowing from the object, but obviously that isn't the case. Code such as this

use owning_ref::OwningRef;
struct Bar<'a> {
    x: OwningRef<Vec<usize>, [usize]>,
    y: Foo<'a>, // has a reference to &x
}

// This doesn't work because x is moved after y has borrowed it
impl<'a> Bar<'a> {
    pub fn new() -> Self {
        let v: Vec<usize> = vec![1, 2, 3];
        let x = OwningRef::new(v);
        let y = Foo::new(x.as_ref());
        Bar { x, y }
    }
}

you get the error

error[E0515]: cannot return value referencing local variable `x`
  --> src/main.rs:22:9
   |
21 |         let y = Foo::new(x.as_ref());
   |                          ---------- `x` is borrowed here
22 |         Bar { x, y }
   |         ^^^^^^^^^^^^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `x` because it is borrowed
  --> src/main.rs:22:15
   |
17 | impl<'a> Bar<'a> {
   |      -- lifetime `'a` defined here
...
21 |         let y = Foo::new(x.as_ref());
   |                          ---------- borrow of `x` occurs here
22 |         Bar { x, y }
   |         ------^-----
   |         |     |
   |         |     move out of `x` occurs here
   |         returning this value requires that `x` is borrowed for `'a`

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.
error: could not compile `foo` due to 2 previous errors

The reason is the same as before: I borrow a reference to x and then I move it.

There are different wrapper objects in the crate, and in various combinations they will let me get close to a solution and then snatch it away from me, because what I borrow I still cannot move later, e.g.:

use owning_ref::{BoxRef, OwningRef};
struct Bar<'a> {
    x: OwningRef<Box<Vec<usize>>, Vec<usize>>,
    y: Foo<'a>, // has a reference to &x
}

// This doesn't work because x is moved after y has borrowed it
impl<'a> Bar<'a> {
    pub fn new() -> Self {
        let v: Vec<usize> = vec![1, 2, 3];
        let v = Box::new(v); // Vector on the heap
        let x = BoxRef::new(v);
        let y = Foo::new(x.as_ref());
        Bar { x, y }
    }
}
error[E0515]: cannot return value referencing local variable `x`
  --> src/main.rs:23:9
   |
22 |         let y = Foo::new(x.as_ref());
   |                          ---------- `x` is borrowed here
23 |         Bar { x, y }
   |         ^^^^^^^^^^^^ returns a value referencing data owned by the current function

error[E0505]: cannot move out of `x` because it is borrowed
  --> src/main.rs:23:15
   |
17 | impl<'a> Bar<'a> {
   |      -- lifetime `'a` defined here
...
22 |         let y = Foo::new(x.as_ref());
   |                          ---------- borrow of `x` occurs here
23 |         Bar { x, y }
   |         ------^-----
   |         |     |
   |         |     move out of `x` occurs here
   |         returning this value requires that `x` is borrowed for `'a`

Some errors have detailed explanations: E0505, E0515.
For more information about an error, try `rustc --explain E0505`.

I can get around this by going unsafe and work with a pointer, of course, but then I am back to the solution I had with Pin and pointer hacking. I strongly feel that there is a solution here, (especially because having a Box<Vec<...>> and the corresponding Vec<...> isn't adding much to the table so there must be more to the crate), but what it is is eluding me.

Thomas Mailund
  • 1,674
  • 10
  • 16
  • 1
    First, asking a new question is dangerous. I won't downvote, because I'm not sure this is strictly forbidden, but see for example https://meta.stackoverflow.com/q/253521/7884305, https://meta.stackoverflow.com/q/394271/7884305 or https://meta.stackoverflow.com/q/252252/7884305. It is much better to edit the original question and _explain_ why none of the solutions proposed there worked. For example, the accepted answer there suggests using `ourboros` or `owning_ref`. Have you tried doing that? – Chayim Friedman Apr 26 '22 at 06:06
  • `Pin` is indeed for unsafe code. I haven't reviewed your unsafe code for soundness, but this is many times the only way to self-referential structs indeed **if you really need them**. In this case, though, I expect no problems using one of the abovementioned crates (I didn't use them much, though). – Chayim Friedman Apr 26 '22 at 06:07
  • 1
    The previous question is closed so editing it won’t help, I think. The box at the top of the page suggests asking a new question with a link for that. It isn’t ideal, I agree, but got the idea that this would be the approach. I tried some crates including owning_ref (don’t remember if I also tried ourboros), but ran into similar problems. When I’ve borrowed the wrapped reference I can no longer move the wrapper. I’ll give it another go; I might be using it wrong. But my attempts so far would require that I rewrite the Foo code, which I’m reluctant to do. – Thomas Mailund Apr 26 '22 at 06:13
  • 2
    I know the bar suggests asking a new question, and that's why I'm hesitant about downvoting, but the general direction of the community, like noted in the answers I linked, is that people that had their questions closed as duplicates should edit it, because if not we're back at the same place. And a question can (and should, if properly edited) be reopened. You can also ping the original closers if nobody notices (it is not a common practice, but I think it's okay). – Chayim Friedman Apr 26 '22 at 06:19
  • 1
    Anyway, even if _you_ tried and failed, we need to see your tries and your failures, the errors you got, etc.. This is what people (I, at least) expected to see when they voted to close the answer and not reopened it yet. – Chayim Friedman Apr 26 '22 at 06:20
  • I’ll do more experimentation today then and save the attempts. I can update this question and delete the old or vice versa. Don’t know which is best – Thomas Mailund Apr 26 '22 at 06:22
  • A closed question can be reopened if you edit it to explain clearly the differences. – Jmb Apr 26 '22 at 06:23
  • @Jmb And part of being different is "I tried the suggested solution but failed; is it appropriate here and/or can you elaborate on how?" – Chayim Friedman Apr 26 '22 at 06:24
  • In your case, the right answer is probably to use [`Rc`](https://doc.rust-lang.org/1.54.0/std/rc/struct.Rc.html) or [`Arc`](https://doc.rust-lang.org/1.54.0/std/sync/struct.Arc.html) as shown in [this answer](https://stackoverflow.com/a/70776273/5397009). – Jmb Apr 26 '22 at 06:25
  • @ChayimFriedman which would be a good argument to reopen the original question rather than re-asking it. – Jmb Apr 26 '22 at 06:29
  • 3
    @ThomasMailund in the future, if you don't know whether to just edit a question, ask a new one or whatever, you can ask a question on the meta forum! They can also give you more detailed advices about what to do to reopen your question. – jthulhu Apr 26 '22 at 06:29
  • @Jmb Indeed I prefer that, I just said that because the duplicate bar also suggests asking a new one I didn't downvote. – Chayim Friedman Apr 26 '22 at 06:31
  • 1
    @BlackBeans And search for duplicates in the meta forum before asking... :) – Chayim Friedman Apr 26 '22 at 06:32
  • Thanks for all the great comments. I think what I will do is delete the old question and edit this one, and add other attempts. I don't have all of the lying around, but I will need to give them another good try anyway. I have some meetings now but will get to it right after. – Thomas Mailund Apr 26 '22 at 07:00
  • If you've found a solution, please write an answer and accept it. – Chayim Friedman Apr 26 '22 at 08:49
  • Will do. I'll move the last update to an answer. – Thomas Mailund Apr 26 '22 at 09:03
  • Ok, I have to wait two days before I can accept my own answer, but I'll get back to it at the end of the week, then. Cheers – Thomas Mailund Apr 26 '22 at 09:06

2 Answers2

3

(I think the data in a Vec already sits happily on the heap, but that doesn't seem to help me here).

Indeed the data in a Vec does already sit on the heap, and the x: &'a [usize] in Foo is already a reference to that heap allocation; so your problem here is not (as shown in your graphics) that moving Bar would result in (the undefined behaviour of) a dangling reference.

However, what happens if the Vec were to outgrow its current allocation? It would reallocate and be moved from its present heap allocation to another—and this would result in a dangling reference. Hence the borrow checker must enforce that, so long as anything (e.g. a Foo) that borrows from the Vec exists, the Vec cannot be mutated. Yet here we already have an expressivity problem: the Rust language has no way to annotate Bar to indicate this relationship.

Your proposed unsafe solution uses <*mut _>::as_ref, whose safety documentation includes the following requirement (emphasis added):

  • You must enforce Rust’s aliasing rules, since the returned lifetime 'a is arbitrarily chosen and does not necessarily reflect the actual lifetime of the data. In particular, for the duration of this lifetime, the memory the pointer points to must not get mutated (except inside UnsafeCell).

This is the key bit of the compiler's safety checks that you are trying to opt out of—but because accessing Bar now requires that one uphold this requirement, you do not have a completely safe abstraction. In my view, a raw pointer would be a tad safer here because it forces one to consider the safety of every access.

For example, one issue that immediately springs to mind is that x is declared before y in Bar and therefore, upon destruction, it will be dropped first: the Vec's heap allocation will be freed while Foo still holds references into it: undefined behaviour! Simply reordering the fields would avoid this particular problem, but there would have been no such problem with raw pointers (and any attempt to dereference them in Foo's drop handler would have forced one to consider whether they were still dereferenceable at that time).

Personally, I would try to avoid self-referencing here and probably use an arena.

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • The case of moving memory when reallocating a vector wouldn't happen as long as I have an immutable reference to the vector, I believe. That doesn't help, of course, because a borrow is a borrow and the checker doesn't know when it is safe and when it isn't to have a reference into the object after it is moved. But as long as I have a reference to the vector, I think it should be safe to point at its memory. Right? – Thomas Mailund Apr 26 '22 at 09:41
  • I'm not sure I understand what you mean by using an arena, sorry. – Thomas Mailund Apr 26 '22 at 09:42
1

I think I have finally grokked ouroboros and that is an elegant solution.

You use a macro, self_referencing when defining a structure, and inside the structure you can specify that one entry borrows others. For my application, I got it to work like this:

use ouroboros::self_referencing;

#[self_referencing]
struct _Bar {
    x: Vec<usize>,
    #[borrows(x)]
    #[covariant]
    y: Foo<'this>,
}
struct Bar(pub _Bar);

The y element references x, so I specify that. I'm sure why co-/contra-varianse is needed in this particular case where there is only one lifetime, but it specififes whether other references should live longer or can live shorter than the object. I've defined the struct as _Bar and then wrapped it in Bar. This is because macro will create a new method, and I don't want the default one. At the same time I wnat to call my constructor new to stick with tradition. So I wrap the type and write my own constructor:

impl Bar {
    pub fn new() -> Self {
        let x: Vec<usize> = vec![1, 2, 3];
        let _bar = _BarBuilder {
            x,
            y_builder: |x: &Vec<usize>| Foo::new(&x),
        }
        .build();
        Bar(_bar)
    }
}

I don't use the generated _Bar::new but a generated _BarBuilder object where I can specify how to get the y value from the x reference.

I have also written accessors to get the two values. There isn't anything special here.

impl Bar {
    pub fn x(&self) -> &Vec<usize> {
        self.0.borrow_x()
    }
    pub fn y(&self) -> &Foo {
        self.0.borrow_y()
    }
}

and with that my trivial little test case runs...

fn main() {
    let bar = Bar::new();
    let vec = bar.x();
    for &i in vec {
        println!("i == {}", i);
    }

    let vec = bar.y().x;
    for &i in vec {
        println!("i == {}", i);
    }
}

This is probably the best solution so far, assuming that there are no hidden costs that I am currently unaware of.

Thomas Mailund
  • 1,674
  • 10
  • 16