
This is a minimal example that I struggled to get right.

I am trying to maintain a global Vec<Box<Item>>, where the id of each Item is its index. Whenever I need a reference to an Item, I can get its id from somewhere and then look up the reference by id (as with ref_a in the code). But I would prefer to get the reference to the Item directly and pass it around (like ref_b), or even store it somewhere instead of storing the id. However, my code doesn't work.

I see that in get_a_certain_item(), the returned &Item would have the same lifetime as the guard returned by VEC.read(), so it is not valid to let the reference escape. However, in my understanding, since all the Items are boxed and allocated on the heap, a reference to one should always be valid. There should be no harm in letting the reference outlive the read guard.

If I am not writing the code right, I guess there should be some idiomatic way to do this in Rust. I would appreciate some help.

// lazy_static = "0.1.15"
#[macro_use]
extern crate lazy_static;

use std::sync::RwLock;

struct Item {
    id: usize
}

lazy_static! {
    static ref VEC : RwLock<Vec<Box<Item>>> = RwLock::new(vec![
        Box::new(Item { id: 0 }), 
        Box::new(Item { id: 1 }), 
        Box::new(Item { id: 2 })]);
}

fn get_a_certain_item() -> &Item {
    & VEC.read().unwrap()[1]
}

fn get_a_certain_item_by_id() -> usize {
    1
}

fn main() {
    // this works, but verbose
    let ref_a = {& VEC.read().unwrap()[get_a_certain_item_by_id()]};

    // this doesn't work
    let ref_b = get_a_certain_item();
}
Shepmaster
qinsoon
  • *in my understanding, since all the Items are allocated with boxes in the heap, a reference to it should always be valid* - `Vec` will `drop` the `Box`es if they are removed or replaced, so no, they aren't always valid. `Box` makes no difference here. `Arc` would. Or if the `Item`s somehow had a `'static` lifetime. – ArtemGr Dec 17 '15 at 08:01
  • @ArtemGr I thought `Vec` owns those boxes and the `Item`s. Unless I remove or set elements in the `Vec`, the boxes are always valid. I can ensure that I won't mutate those elements once they are inserted, but how can I let the compiler know this? – qinsoon Dec 17 '15 at 14:08
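
The `Arc` suggestion in the comment above could look roughly like the sketch below. It assumes the vector stores `Arc<Item>` instead of `Box<Item>` and takes the lock as a parameter rather than using the `lazy_static` global, so that it compiles with the standard library alone; the signature of `get_a_certain_item` is changed for illustration. The caller receives an owned handle, so nothing borrows from the read guard.

```rust
use std::sync::{Arc, RwLock};

struct Item {
    id: usize,
}

// Clones the `Arc` (a reference-count bump, not a copy of the `Item`),
// so the returned handle does not borrow from the read guard.
fn get_a_certain_item(items: &RwLock<Vec<Arc<Item>>>, index: usize) -> Option<Arc<Item>> {
    items.read().unwrap().get(index).cloned()
}

fn main() {
    let items = RwLock::new(vec![
        Arc::new(Item { id: 0 }),
        Arc::new(Item { id: 1 }),
        Arc::new(Item { id: 2 }),
    ]);

    let item = get_a_certain_item(&items, 1).expect("index 1 exists");

    // Even if the vector is emptied, the handle keeps the item alive.
    items.write().unwrap().clear();
    println!("id = {}", item.id);
}
```

The cost is one atomic increment per lookup, in exchange for a handle that can be stored anywhere without `unsafe`.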

2 Answers


Compiling the code gives this error:

error: missing lifetime specifier [E0106]
    fn get_a_certain_item() -> &Item {
                               ^~~~~
help: run `rustc --explain E0106` to see a detailed explanation
help: this function's return type contains a borrowed value,
      but there is no value for it to be borrowed from
help: consider giving it a 'static lifetime

In Rust, lifetimes are simply parameterized placeholders, just like generic types (see this answer for more info). That means that every returned reference must have a lifetime that corresponds to some input reference. Your function doesn't have that.

If it were possible for the lifetimes to not correspond, then you'd be able to have code that returned a lifetime that could be whatever the caller wanted it to be. This is generally nonsense, as the reference will stop being valid at some point and thus you'd be breaking the memory safety rules.
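
As a rough illustration of a returned lifetime corresponding to an input reference, the sketch below changes the function to take the items as a parameter. That signature is an assumption for illustration, not the original one, and the lock is a local variable so the example needs only the standard library. The caller holds the read guard for as long as it uses the reference.

```rust
use std::sync::RwLock;

struct Item {
    id: usize,
}

// By lifetime elision, the returned reference borrows from `items`,
// so it is valid for as long as the slice passed in.
fn get_a_certain_item(items: &[Box<Item>]) -> &Item {
    &items[1]
}

fn main() {
    let vec = RwLock::new(vec![
        Box::new(Item { id: 0 }),
        Box::new(Item { id: 1 }),
        Box::new(Item { id: 2 }),
    ]);

    // The guard must stay alive while `item` is in use.
    let guard = vec.read().unwrap();
    let item = get_a_certain_item(&guard);
    println!("id = {}", item.id);
}
```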

What I just said is true, but leaves off one small but important corner case: the 'static lifetime. This is a built-in lifetime that corresponds to items compiled into the code. Normally this means global variables defined with static or references to literal values. These values exist before main is called and remain valid until the program ends. Short of deliberately leaking memory (for example with Box::leak, available since Rust 1.26), it is not possible to create such values during the runtime of your program.
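
For comparison, here is a small sketch of the two usual sources of 'static references mentioned above. The names `DEFAULT_ITEM`, `get_default_item` and `get_name` are made up for illustration.

```rust
struct Item {
    id: usize,
}

// A `static` is compiled into the binary, so a reference to it is `'static`.
static DEFAULT_ITEM: Item = Item { id: 1 };

fn get_default_item() -> &'static Item {
    &DEFAULT_ITEM
}

// String literals are stored in the binary as well.
fn get_name() -> &'static str {
    "item"
}

fn main() {
    println!("{} {}", get_name(), get_default_item().id);
}
```

Neither function takes an input reference, yet both compile, because the returned data is guaranteed to outlive every caller.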

Note that the error message makes reference to the 'static lifetime. However, if you just add this lifetime, you will get a different error:

error: borrowed value does not live long enough
    &VEC.read().unwrap()[1]
     ^~~~~~~~~~~~~~~~~~~
note: reference must be valid for the static lifetime...
note: ...but borrowed value is only valid for the block at [...]

This is because the compiler cannot ensure that the value will last for the entire length of the program. In fact, it can only ensure it will last for the duration of the function call.

As the programmer, you may know (or think you know) better than the compiler. That's what the unsafe escape hatch is for. This allows you to do things that the compiler cannot verify. It does not allow you to break memory safety; it's just up to the programmer to ensure memory safety instead of the compiler.

In your case, if you can guarantee that items from the vector are never dropped, and that you always use a Box, then it should be safe to pretend that references to the Item are 'static.

A Boxed value is allocated on the heap, and the memory is never moved after the initial creation. Since items in the vector are not dropped, the Box will never be freed.
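
The claim that boxed memory does not move can be checked with a small safe experiment. In this sketch (the function name is hypothetical), the vector is forced to reallocate its buffer of Boxes several times while the address of the first Item stays the same.

```rust
struct Item {
    id: usize,
}

// Returns true if the heap address of the first item is unchanged
// after the vector has grown (and reallocated its own buffer).
fn first_item_address_is_stable() -> bool {
    let mut items: Vec<Box<Item>> = vec![Box::new(Item { id: 0 })];

    // A raw pointer carries no lifetime, so it does not keep `items` borrowed.
    let before: *const Item = &*items[0];

    // Growing the vector moves the Boxes (the pointers), not the Items.
    for id in 1..1000 {
        items.push(Box::new(Item { id }));
    }

    let after: *const Item = &*items[0];
    before == after && items[0].id == 0
}

fn main() {
    println!("stable: {}", first_item_address_is_stable());
}
```

This only shows that growth is harmless; removing or replacing an element still frees its Box, which is why the guarantee that items are never dropped matters.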

Here's a verbose example of implementing the method:

fn get_a_certain_item() -> &'static Item {
    // Best practice: put a paragraph explaining why this isn't
    // actually unsafe.
    unsafe {
        let as_ref: &Box<Item> = &VEC.read().unwrap()[1];
        let as_ref2: &Item = &**as_ref;
        let as_raw = as_ref2 as *const _;
        let unsafe_ref = &* as_raw;
        unsafe_ref
    }
}

Converting the reference to a raw pointer throws away the lifetime. When we reconstitute it we can make up whatever lifetime we want.


For what it is worth, I don't think it is worth it in this case. If I actually have a global variable, I want that to be front-and-center in my code as I view it as an ugly wart. I'd much rather create a type that owned a RwLock<Vec<Box<Item>>>, make a global of that type, then parameterize my code to accept a reference to that type. Then I lock the global when I need it and pass the reference into functions.
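
One possible reading of that design is sketched below. The names `ItemStore`, `with_item` and `describe` are hypothetical, and the closure-based accessor is just one way to keep the lock scope explicit. It uses a plain `static` with a `const fn` constructor, which needs Rust 1.63 or later, instead of lazy_static.

```rust
use std::sync::RwLock;

pub struct Item {
    pub id: usize,
}

// Hypothetical wrapper type that owns the lock and the items.
pub struct ItemStore {
    items: RwLock<Vec<Box<Item>>>,
}

impl ItemStore {
    pub const fn new() -> Self {
        ItemStore { items: RwLock::new(Vec::new()) }
    }

    pub fn add(&self, item: Item) {
        self.items.write().unwrap().push(Box::new(item));
    }

    // Runs `f` while the read lock is held; the reference cannot escape.
    pub fn with_item<R>(&self, index: usize, f: impl FnOnce(&Item) -> R) -> Option<R> {
        let guard = self.items.read().unwrap();
        guard.get(index).map(|boxed| f(&**boxed))
    }
}

// The single global, kept visible at the top level.
static STORE: ItemStore = ItemStore::new();

// Code is parameterized over the store instead of reaching for the global.
fn describe(store: &ItemStore, index: usize) -> Option<String> {
    store.with_item(index, |item| format!("item #{}", item.id))
}

fn main() {
    for id in 0..3 {
        STORE.add(Item { id });
    }
    println!("{:?}", describe(&STORE, 1));
}
```

Because `describe` takes the store as a parameter, it can be exercised with a local `ItemStore` in tests, and the global only appears in `main`.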

Shepmaster
  • Thanks for the explanation. I tried the code and was quite surprised: I expected the type casts and intermediate values to be optimized out by rustc or LLVM eventually, but they all exist in the final machine code :/ – qinsoon Dec 23 '15 at 01:25
  • @qinsoon [here's the LLVM IR and assembly](https://gist.github.com/shepmaster/5293f546e7d2dc1ab039) that I generated. Can you help point me to where all the casts are? I mostly see references to the locking / unlocking / panicking when failing to lock. – Shepmaster Dec 23 '15 at 02:57

I can ensure that I won't mutate those elements once they are inserted

You can, can you?

But even if you really, really can ensure that the vector will never be mutated, it's still good practice to use the type system to make illegal states and operations impossible.

In this case, you can hide the Vec in a module, then any user of that module won't be able to mutate the Vec and ruin your invariants.

#[macro_use]
extern crate lazy_static;

// Hides the gory details from the user of the API.
pub mod items {
    use std::mem::transmute;

    pub struct Item {
        pub id: usize
    }

    lazy_static! {
        static ref VEC : Vec<Item> = vec![
            Item { id: 0 },
            Item { id: 1 },
            Item { id: 2 }];
    }

    pub fn get_an_item (idx: usize) -> Option<&'static Item> {
        // As Shepmaster has pointed out, Rust is smart enough to detect
        // that the vector is immutable and allow the 'static lifetime:
        VEC.get(idx)

        // And when it isn't that smart, we can use `unsafe`
        // to tell the compiler that the 'static lifetime is okay:
        /*
        match VEC.get (idx) {
            Some (item) => {
                // `unsafe` means "safe, scout's honor", cf. http://doc.rust-lang.org/book/unsafe.html
                let item: &'static Item = unsafe {transmute (item)};
                Some (item)
            },
            None => None
        }
        */
    }
}

fn main() {
    let ref_b = items::get_an_item (1) .expect ("!1");
    assert_eq! (ref_b.id, 1);
}

Note that, since the Vec is immutable, there's no need to Box the Items. This might be nice from the data-driven, cache locality perspective.

And if a user of this module attempts undefined behavior with code like items::VEC.push (items::Item {id: 3}); they will get "error: static VEC is private".

ArtemGr
  • I have a feeling that the vector isn't truly immutable, it's just append-only. – Shepmaster Dec 17 '15 at 18:33
  • @Shepmaster Wow, that's cool. If the vector is append-only then I'd be looking for http://en.cppreference.com/w/cpp/container/deque in Rust. – ArtemGr Dec 17 '15 at 18:39