1

I am trying to implement a cyclic linked data structure in Rust. My Nodes are defined as:

#[derive(Debug)]
enum Node<'a> {
    Link(&'a Node<'a>),
    Leaf,
}

I am trying to build a minimal structure like this (extra brackets for better lifetime visibility):

fn main() {
    let placeholder = Node::Leaf;
    {
        let link1 = Node::Link(&placeholder);
        {
            let link2 = Node::Link(&link1);
            println!("{:?}", link2);
        } // link2 gets dropped
    } // link1 gets dropped
}

This works, but now I don't know how to replace the placeholder with a reference to link2 to "close the cycle". I tried this, but it doesn't work because I can't assign to link1, which is borrowed the line above, and because link2 would go out of scope while it's still referenced by link1:

let placeholder = Node::Leaf;
{
    let mut link1 = Node::Link(&placeholder);
    {
        let link2 = Node::Link(&link1);
        link1 = Node::Link(&link2);
        println!("{:?}", link2);
    } // link2 gets dropped
} // link1 gets dropped
error: `link2` does not live long enough
  --> src/main.rs:15:9
   |
13 |             link1 = Node::Link(&link2);
   |                                 ----- borrow occurs here
14 |             println!("{:?}", link2);
15 |         } // link2 gets dropped
   |         ^ `link2` dropped here while still borrowed
16 |     } // link1 gets dropped
   |     - borrowed value needs to live until here

error[E0506]: cannot assign to `link1` because it is borrowed
  --> src/main.rs:13:13
   |
12 |             let link2 = Node::Link(&link1);
   |                                     ----- borrow of `link1` occurs here
13 |             link1 = Node::Link(&link2);
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^ assignment to borrowed `link1` occurs here

I could try to avoid these lifetime issues by using Rcs, but that sounds like it would defeat the purpose of Rust's 0-runtime-cost lifetime management.

Another attempt of putting all nodes into a Vec and only taking direct references to that didn't work either:

use std::ops::IndexMut;

let mut store = Vec::new();
store.push(Node::Leaf);
store.push(Node::Link(&store[0]));
store.push(Node::Link(&store[1]));
*store.index_mut(1) = Node::Link(&store[2]);

Because I can't change any nodes without mutably borrowing them, but they have all been immutably borrowed already.

error[E0502]: cannot borrow `store` as immutable because it is also borrowed as mutable
  --> src/main.rs:12:28
   |
12 |     store.push(Node::Link(&store[0]));
   |     -----                  ^^^^^    - mutable borrow ends here
   |     |                      |
   |     |                      immutable borrow occurs here
   |     mutable borrow occurs here

error[E0502]: cannot borrow `store` as mutable because it is also borrowed as immutable
  --> src/main.rs:13:5
   |
12 |     store.push(Node::Link(&store[0]));
   |                            ----- immutable borrow occurs here
13 |     store.push(Node::Link(&store[1]));
   |     ^^^^^ mutable borrow occurs here
14 |     *store.index_mut(1) = Node::Link(&store[2]);
15 | }
   | - immutable borrow ends here

error[E0502]: cannot borrow `store` as immutable because it is also borrowed as mutable
  --> src/main.rs:13:28
   |
13 |     store.push(Node::Link(&store[1]));
   |     -----                  ^^^^^    - mutable borrow ends here
   |     |                      |
   |     |                      immutable borrow occurs here
   |     mutable borrow occurs here

error[E0502]: cannot borrow `store` as mutable because it is also borrowed as immutable
  --> src/main.rs:14:6
   |
12 |     store.push(Node::Link(&store[0]));
   |                            ----- immutable borrow occurs here
13 |     store.push(Node::Link(&store[1]));
14 |     *store.index_mut(1) = Node::Link(&store[2]);
   |      ^^^^^ mutable borrow occurs here
15 | }
   | - immutable borrow ends here

Is there a way to have such a structure with cyclic links without runtime overhead, for example with pure references like I tried? I'm fine with additional memory cost, like a backing Vec holding ownership to all nodes.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Felk
  • 7,720
  • 2
  • 35
  • 65
  • 1
    How do you plan to use this data-structure? What kind of datum will the nodes hold? How do you plan to enforce the Aliasing XOR Mutability tenet of Rust for it? If you could sketch what you hope to obtain, how you expect to avoid memory leaks, and how you expect to solve Aliasing XOR Mutability, then we could help you realize this in Rust. Otherwise? You'll get tons of different proposals (there are many different ways, each with their trade-offs), which may or may not be useful. – Matthieu M. May 06 '17 at 19:07
  • 1
    *but that sounds like it would defeat the purpose of Rust's 0-runtime-cost lifetime management* => Rust is 0-cost with it can and has a minimal cost otherwise; `Rc` and `Weak`, `RefCell` are perfectly valid tools... though they muddy the code and push the validation of the use of the data to run-time instead of compile-time (thus increasing the likelihood of bugs). – Matthieu M. May 06 '17 at 19:09
  • My goal is to depict a BNF grammar. These problems only occur during "knotting together" of the nodes. My plan is to somehow knot these nodes together using immutable references only, referencing node enums being owned once each by a (private) vec with the same lifetime: Those 2 probably bundled up in a struct, so when the struct gets dropped, both the vector and the cyclic structure get dropped simultaneously without problems. – Felk May 06 '17 at 19:33
  • Okay, sounds easy-ish; please edit this into your question :) – Matthieu M. May 07 '17 at 10:16

1 Answers1

3

Is there a way to have such a structure with cyclic links without runtime overhead, for example with pure references like I tried? I'm fine with additional memory cost, like a backing Vec holding ownership to all nodes.

Yes, there are many ways.

It is however fundamental to realize that Rust requires enforcing the Aliasing XOR Mutability principle at all times. It is preferable to enforce it at compile-time, so as to have 0 run-time cost, however it is sometimes required to manage it at run-time and there are multiple structures for this:

  • Cell, RefCell, AtomicXXX, Mutex, RWLock (the ill-named) are about mutating aliased items safely,
  • Rc, Weak, Arc, are about having multiple owners.

Balancing the potential run-time overhead compared to the obtained flexibility is a fine art; it takes some experience and experimentation.


In your specific case (building a BNF grammar with nodes referencing each-other), we can indeed achieve 0 run-time overhead using Cell for mutability.

The main difficulty however is obtaining a group of nodes with the same lifetime. You are on the right track with the idea of a Vec<Node>, but as you noticed as soon as you borrow one node you cannot mutate the Vec again: this is because growing the Vec could cause it to re-allocate its underlying storage which would render the already obtained reference dangling (pointing to freed memory).

The generic solution would be to use unsafe code to manage the lifetime of the nodes yourself. You are, however, lucky: as you mentioned, your case is special because the number of nodes is bounded (by the grammar definition), and you drop them all at once. This calls for an arena.

My advice is therefore twofold:

  1. Stash nodes in a typed-arena,
  2. Use Cell for the mutable part (&T is Copy).

You will not be able to store the arena in the same struct as the rest of the grammar without unsafe code; it's up to you whether to use unsafe or structure your program so that the arena outlives the computation.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • I'm coming back to this now trying out your solution, and I am having a hard time figuring out what you mean by "use Cell for the mutable part". After I've put my nodes into the arena, I don't know how Cell would help me swapping out the placeholder – Felk May 21 '17 at 12:31
  • I got it to work if I alter my struct to hold references to Cells of nodes `&'a Cell>` instead of references themselves `&'a Node<'a>`. Is this what you meant? If yes, this is a bit unfortunate as I think the usage of `Cell` should be an implementation detail instead of part of the signature – Felk May 21 '17 at 12:43
  • @Felk: Nearly. I meant that if you want to update the content of `Link`, it should be in a cell: `Link(Cell<&'a Node<'a>>)`. `Cell` requires that the type it contains be `Copy` and `&T` is `Copy`. As for exposing `Cell` as part of the `API`, it's up to you. You can (1) just expose it, (2) encapsulate the `enum` as a private member of a struct or (3) have one enum with `Cell` for building the graph and then one without for exposure (you can transmute `Cell` to `T` safely if nobody modifies `T` afterwards). – Matthieu M. May 21 '17 at 12:57
  • That worked for building up the structure, thanks. However, you say I can transmute all my nodes from `Cell` to `T`? Can you give an example how? – Felk May 21 '17 at 14:25
  • `std::mem::transmute` is the method to use. It's unsafe, but since `Cell` is a bare `T` underneath the covers it should work well enough. – Matthieu M. May 21 '17 at 19:07