
I am trying to return a reference to a data array from inside a callback. The snippet below does not compile because of lifetimes, but I have included it anyway to give better context.

I want to implement some kind of virtual filesystem. I want to use the return type &[u8] because I am thinking about using mmap and the implementation that looked promising exposed &[u8] to access the data.

This is overkill for now, so I want to focus on the callback to read and return the contents of the file that is passed onto it.

What would be an idiomatic way of doing this?

use std::fs::File;
use std::io::prelude::*;

fn main() {
    test(&|path| {
        if false {
            let mut data: Vec<u8> = Vec::new();
            let mut file = File::open(path).unwrap();
            file.read_to_end(&mut data).unwrap();
            return Some(&data);
        }
        None
    });
}

// loads various files. I do not care about them anymore once this function returns
pub fn test<'a>(loader: &Fn(&str) -> Option<&'a [u8]>) {}
Shepmaster
Jonas Felber
  • In this case, there's no difference between a closure and a function. – Shepmaster Mar 20 '17 at 22:39
  • Additionally, it's more idiomatic to accept a generic `F` where `F: Fn(&str)`, not the trait object `&Fn(&str)`. – Shepmaster Mar 20 '17 at 22:42
  • While the questions are superficially similar, the answers to the other question do not really apply here. In this case the OP is asking how to do something that actually makes sense, they're just approaching it wrong. Specifically the OP needs to restructure the code so that the closure doesn't directly return a reference. Instead, the closure should return an object that manages the lifetime of the resource, *and* provides a method to get to the reference. To make `test` usable with different resources, the closure should likely return a boxed trait object. (See my answer for details.) – user4815162342 Mar 20 '17 at 22:54
  • @user4815162342 OP specifically said to disregard the fact that they were going to use `mmap`, so I did. If you think that this question is actually about returning a boxed trait object, we can change the question to clarify, but that will invalidate the *other* existing answer. Which of the 3 wrongs do you feel is the lesser sin? – Shepmaster Mar 20 '17 at 22:58
  • @Shepmaster It is my understanding that the OP wants to first test other aspects of the design without immediately implementing the full `mmap` logic. On the other hand, the *interface* (the signature of the closure) should be flexible enough to add support for `mmap` later - which is why the OP wanted to return `&[u8]`. But that is my understanding, and I certainly wouldn't change the question myself - it is up to the OP to do so if they choose to after reading these comments. It is perfectly fine for both answers to remain, each one valid in context of its understanding of the question. – user4815162342 Mar 20 '17 at 23:11
  • Thank you very much for your help! I am sorry for not being more precise, but @user4815162342 came very close to what I wanted to do while explaining what I did wrong (conceptually) so I will not edit my question. – Jonas Felber Mar 22 '17 at 00:02

2 Answers


It is incorrect to return a reference to stack-allocated data, because the reference would immediately outlive the object it refers to. The only references that can always be returned, no questions asked, are those whose lifetime is 'static, and Rust checks this carefully. References to freshly allocated data are definitely not 'static.

Fortunately, there is a way around it: it is safe to return a reference when Rust can prove that the data outlives the reference. For example:

// Memory backed by a Vec
struct VecMemory {
    data: Vec<u8>
}

impl VecMemory {
    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}

as_slice() may return a reference because the object it borrows from (self) provably outlives that reference. If we undo the lifetime elision, the signature of as_slice() would be:

fn as_slice<'a>(&'a self) -> &'a [u8]

The next question is what the closure should return. If it returned a Vec, as suggested by @E_net4, or a VecMemory (which again just holds a Vec), then the use of a vector as the underlying storage would be baked into the interface. To support different storage types, the closure should return what other languages would call an interface. The closest Rust equivalent is a trait object, which is specified in return context as Box<dyn SomeTrait> (written Box<SomeTrait> before the dyn keyword was introduced).

With this design, the closure effectively heap-allocates a resource management object and returns a two-pointer-sized box that provides ownership and a uniform interface to the heap-allocated value. The user of the box communicates with the implementation only through the box, which uses an internal vtable to talk to the implementation. (The pointer to the vtable is the reason why the Box itself takes up two pointers, not one.) In other words, the return value of the closure erases the concrete type returned.
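The two-pointer claim can be checked directly. The following is a minimal sketch, assuming the Memory and VecMemory definitions used in this answer; box_sizes and erase are hypothetical helper names introduced only for illustration.

```rust
use std::mem::size_of;

// The interface the closure's return value is seen through.
trait Memory {
    fn as_slice(&self) -> &[u8];
}

// Memory backed by a Vec
struct VecMemory {
    data: Vec<u8>,
}

impl Memory for VecMemory {
    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}

/// Hypothetical helper: returns (size of a box of a concrete type,
/// size of a boxed trait object), both in bytes.
fn box_sizes() -> (usize, usize) {
    (size_of::<Box<VecMemory>>(), size_of::<Box<dyn Memory>>())
}

/// Hypothetical helper: erases the concrete type, so the caller
/// only ever sees `dyn Memory`.
fn erase(data: Vec<u8>) -> Box<dyn Memory> {
    Box::new(VecMemory { data })
}
```

The box of the concrete type is one pointer wide, while the boxed trait object is two pointers wide: one for the data and one for the vtable.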

use std::fs::File;
use std::io::prelude::*;

trait Memory {
    fn as_slice(&self) -> &[u8];
    // a real-life trait would likely also define
    // as_mut_slice(&mut self) -> &mut [u8]
}

// Memory backed by a Vec
struct VecMemory {
    data: Vec<u8>
}

impl Memory for VecMemory {
    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}

fn main() {
    test(&|path| {
        if false {
            let mut data: Vec<u8> = Vec::new();
            let mut file = File::open(path).unwrap();
            file.read_to_end(&mut data).unwrap();
            return Some(Box::new(VecMemory { data }));
        }
        None
    });
}

// loader returns a boxed trait object whose underlying memory
// can be accessed as long as the box is alive.
fn test(_loader: &dyn Fn(&str) -> Option<Box<dyn Memory>>) {}

To use mmap for the storage, one would write a different Memory implementation, say Mmap. It would store the raw pointer and the size of the memory returned by mmap(), calling mmap() in new() and munmap() in Drop::drop. Most importantly, Mmap would implement Memory using an unsafe block to construct a slice from the stored pointer and length. Again, this is safe because the lifetime of the returned reference is tied to the lifetime of the Mmap.
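The shape of such a type can be sketched without external crates. This is only an illustration, not a real mmap wrapper: it obtains its memory from the global allocator (std::alloc) where a real implementation would call mmap() and munmap(), and RawMemory is a hypothetical name.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::slice;

trait Memory {
    fn as_slice(&self) -> &[u8];
}

// Stand-in for an mmap-backed region: owns a raw pointer and a length.
// ASSUMPTION: the global allocator replaces mmap()/munmap() here so that
// the sketch compiles with the standard library alone.
struct RawMemory {
    ptr: *mut u8,
    len: usize,
}

impl RawMemory {
    // Acquire the region (a real Mmap would call mmap() here).
    fn new(len: usize) -> RawMemory {
        assert!(len > 0, "this sketch does not support empty regions");
        let layout = Layout::array::<u8>(len).expect("length too large");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        RawMemory { ptr, len }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr is valid for len bytes and uniquely borrowed via &mut self.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Memory for RawMemory {
    fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr is valid for len bytes until drop, and the returned
        // slice cannot outlive &self because of the elided lifetime.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for RawMemory {
    // Release the region (a real Mmap would call munmap() here).
    fn drop(&mut self) {
        let layout = Layout::array::<u8>(self.len).expect("same layout as in new");
        // SAFETY: ptr was allocated in new() with this exact layout.
        unsafe { dealloc(self.ptr, layout) };
    }
}
```

Because as_slice() borrows self, the compiler rejects any attempt to keep the slice after the RawMemory (or the Box<dyn Memory> holding it) has been dropped.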

user4815162342

You do not want to return a reference here because your data only exists inside the closure. Unless you wish to change the callback API to something that modifies a buffer by mutable reference, a simpler (and still idiomatic) approach would be returning the vector.

test(&|path| {
    if false {
        let mut data: Vec<u8> = Vec::new();
        let mut file = File::open(path).unwrap();
        file.read_to_end(&mut data).unwrap();
        return Some(data);
    }
    None
});

Then, modify the consumer function test to accept a loader that returns the vector, so that test takes ownership of the data. If you need to, you can obtain a reference to the data inside the vector by calling as_slice.

pub fn test(loader: &dyn Fn(&str) -> Option<Vec<u8>>) {}
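To make the ownership transfer concrete, here is a minimal sketch of a consumer that actually calls the loader. The names total_len and fake_loader are hypothetical, the loader is taken as a generic F as suggested in the comments on the question, and an in-memory stand-in replaces the file reads so the example runs anywhere.

```rust
// Hypothetical consumer: calls the loader for each path and sums the
// number of bytes loaded. Each Vec is owned here and freed at the end
// of its loop iteration.
fn total_len<F>(loader: F, paths: &[&str]) -> usize
where
    F: Fn(&str) -> Option<Vec<u8>>,
{
    let mut total = 0;
    for &path in paths {
        if let Some(data) = loader(path) {
            // Borrow the owned data as a slice when a &[u8] is needed.
            let bytes: &[u8] = data.as_slice();
            total += bytes.len();
        }
    }
    total
}

// In-memory stand-in for reading files from disk (an assumption made
// for the sake of a self-contained example).
fn fake_loader(path: &str) -> Option<Vec<u8>> {
    match path {
        "a.txt" => Some(b"hello".to_vec()),
        "b.txt" => Some(vec![1, 2, 3]),
        _ => None,
    }
}
```

Calling total_len(fake_loader, &["a.txt", "b.txt"]) hands each vector to the consumer, which matches the requirement that the files are no longer needed once the function returns.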

I want to use the return type &[u8] because I am thinking about using mmap and the implementation that looked promising exposed &[u8] to access the data

You probably mean that you would like your functions to return &[u8]. Even in this case, the data has to be owned somewhere else, and this is something that you have to handle yourself. This could involve having some kind of ResourceHandler struct that would provide slices that live as long as the resource handler.
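A rough sketch of what such a resource handler could look like follows. The struct layout and the method names new, insert and get are assumptions made for illustration, not an established API.

```rust
use std::collections::HashMap;

// Hypothetical owner of all loaded data, keyed by path.
struct ResourceHandler {
    resources: HashMap<String, Vec<u8>>,
}

impl ResourceHandler {
    fn new() -> Self {
        ResourceHandler {
            resources: HashMap::new(),
        }
    }

    // Takes ownership of the loaded bytes.
    fn insert(&mut self, path: &str, data: Vec<u8>) {
        self.resources.insert(path.to_string(), data);
    }

    // The returned slice borrows from `self`, so it lives at most as long
    // as the handler and cannot dangle.
    fn get(&self, path: &str) -> Option<&[u8]> {
        self.resources.get(path).map(|data| data.as_slice())
    }
}
```

Here the handler owns the bytes and get() hands out &[u8] views, which is the return type the question asked for.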

But this is overkill for now, so I want to focus on the callback to read and return the contents of the file that is passed onto it.

In that case, you might be fine with returning a Vec for the time being. :)
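For completeness, the alternative mentioned at the start of this answer, a callback that fills a buffer passed by mutable reference, could look roughly like this. The names max_loaded_len and fake_fill are hypothetical, and the loader is again an in-memory stand-in for reading files.

```rust
// Hypothetical consumer that owns a single buffer and reuses it for every
// path. Returns the length of the largest file loaded.
fn max_loaded_len<F>(loader: F, paths: &[&str]) -> usize
where
    F: Fn(&str, &mut Vec<u8>) -> bool,
{
    let mut buf: Vec<u8> = Vec::new();
    let mut max_len: usize = 0;
    for &path in paths {
        buf.clear(); // keeps the allocation, drops the old contents
        if loader(path, &mut buf) {
            max_len = max_len.max(buf.len());
        }
    }
    max_len
}

// In-memory stand-in for a loader: appends the file contents to `buf`
// and reports whether the path was found.
fn fake_fill(path: &str, buf: &mut Vec<u8>) -> bool {
    match path {
        "a.txt" => {
            buf.extend_from_slice(b"hello");
            true
        }
        "b.txt" => {
            buf.extend_from_slice(b"hi");
            true
        }
        _ => false,
    }
}
```

The trade-off is that the caller controls allocation, at the cost of a slightly less convenient callback signature.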

E_net4