2

Here's some example code for a simple struct World which contains a vector of Objects, for which each Object is assigned a category.

#[derive(PartialEq, Debug)]
enum Category {
    A, B, C, D
}

#[derive(Debug)]
struct Object {
    pub someData: f64,
    pub category: Category
    //...
}

struct World {
    pub objects: Vec<Object>,
    //...
}

impl World {
    pub fn new(objects: Vec<Object>) -> Self {
        World { objects }
    }

    pub fn getObjectsOfCategoryA(&self) -> Vec<&Object> {
        self.objects.iter().filter(|x| x.category == Category::A).collect()
    }
}

The World also offers the user the ability to query the objects of category A in particular. But, what if I want to call getObjectsOfCategoryA() frequently enough that, for performance reasons, I want to cache the result of the function? Ideally this caching should be opaque to any caller of getObjectsOfCategoryA().

Let's add the restriction that objects is guaranteed not to be mutated after the World is created. (I don't know how to express this restriction to Rust, but we'll get back to that later).

Object doesn't derive Copy or Clone so we can't just create a new vector of cloned objects as our cached vector.

One way to do it would be to use Arc:

struct World {
    objects: Vec<Arc<Object>>,
    objectsOfCategoryA: Vec<Arc<Object>>
}

impl World {
    pub fn new(objects: Vec<Object>) -> Self {
        let arcObjects: Vec<Arc<Object>> = objects.into_iter()
            .map(|x| Arc::new(x)).collect();
        let objectsOfCategoryA = arcObjects.iter().filter(|x| x.category == Category::A)
            .map(|x| x.clone()).collect();
        World { objects: arcObjects, objectsOfCategoryA }
    }

    pub fn getObjectsOfCategoryA(&self) -> &Vec<Arc<Object>> {
        &self.objectsOfCategoryA
    }
}

This strikes me as less than ideal because:

  1. We need to change the storage pattern of the main objects vector
  2. This doesn't intuitively indicate to the reader of the code that objectsOfCategoryA is simply a view into objects
  3. If objects is accidentally mutated, this will silently fail. Ideally, I'd like a compile error if anything tries to mutate objects after World has been constructed.

If there was some way for objectsOfCategoryA to be a Vec<&Object> that would feel 'right' to me, but from research I've done it seems like that's not possible.

I'm new to Rust, so it's quite possible I'm looking at this from too much of an OOP perspective. Can anyone indicate an idiomatic way to achieve this kind of caching?

Daniel
  • 1,125
  • 2
  • 9
  • 21
  • 1
    One sensible way to guarantee that `object`s is not mutated is to make it a non-pub field. If you want to allow users read access to `object`s, you could have an `objects(&self) -> &Vec` method. If you want users to have write access, you could have an `objects_mut(&mut self) -> &mut Vec` which invalidates your cache. – asky Aug 29 '20 at 02:02
  • @asky that is a good point - I will do that, but that still doesn't answer how to best do the caching part. – Daniel Aug 29 '20 at 02:16
  • yep, that's why it was a comment, not an answer! – asky Aug 29 '20 at 02:19

2 Answers2

1

You wish your cache of Objects of Category::A could be of type Vec<&Object>. This is not idiomatic and requires tinkering to work. The next best thing is a lazily-evaluated cache of type Option<Vec<&Object>>. If World is declared as

struct World<'a> {
    objects: Vec<Object>,
    category_a: Option<Vec<&'a Object>>,
    //...
}

You can initialize it as World { objects, None }, then when you need to get objects of Category::A, you could iterate through the Vec and populate the cache field (note: this requires a mut reference, which could be avoided with interior mutability).

pub fn getObjectsOfCategoryA(&'a mut self) -> &'a Vec<&Object> {
    if self.category_a.is_none() {
        self.category_a = Some(self.objects.iter().filter(|x| x.category == Category::A).collect());
    }
    self.category_a.as_ref().unwrap()
}

You could even allow mutating objects by wrapping World's objects.push() to properly update the cache, like so

// impl World {
// ...
pub fn push_inner(&'a mut self, obj:Object) {
    self.objects.push(obj);
    if self.objects.last().unwrap().category == Category::A {
        if let Some(category_a) = &mut self.category_a {
            category_a.push(self.objects.last().unwrap())
        }
    }
}

Here is a link to the full code used to test this.

asky
  • 1,520
  • 12
  • 20
  • 1
    Hmm - requiring a mutable reference seems pretty unacceptable to me, in my implementation the whole point of the world is that it is immutable and can be shared between threads. So for interior mutability to work, we're looking at requiring a mutex for this solution to be viable. At that point, the `Arc` solution feels better even though I still don't like it... – Daniel Aug 31 '20 at 00:36
0

We can't easily store a value and a reference to that value in the same struct in Rust, but here, we don't have to store references at all. All we need is a list of indices to the objects. get_objects_of_category_a() then only has to map the indices to references.

Since the list of objects is immutable, I chose to build the list of indices right in the constructor, for simplicity. It could also be initialized on demand.

struct World {
    objects: Vec<Object>,
    objects_of_category_a: Vec<usize>,
    //...
}

impl World {
    pub fn new(objects: Vec<Object>) -> Self {
        let objects_of_category_a = objects
            .iter()
            .enumerate()
            .filter(|&(_, x)| x.category == Category::A)
            .map(|(i, _)| i)
            .collect();
        World {
            objects,
            objects_of_category_a,
        }
    }

    pub fn get_objects_of_category_a(&self) -> Vec<&Object> {
        self.objects_of_category_a.iter().map(|&i| &self.objects[i]).collect()
    }
}

What if the objects are in a more complex data structure, e.g. a tree?

We could apply the same strategy as above, but instead of a usize, we need to represent a path to the relevant node. For a simple binary tree, we would need an enum like the following:

enum Path {
    /// The target is the current node.
    Stop,

    /// Set the target to the current node's left subnode.
    Left(Box<Path>),

    /// Set the target to the current node's right subnode.
    Right(Box<Path>),
}

However, since this is a recursive data structure, we need some form of indirection, which I've implemented using Box here. For a balanced tree, it also means that looking up an element will run in O(log n), whereas indexing a Vec runs in O(1).

Another option would be to store the objects in a Vec as above, and store indices in the tree instead.

Francis Gagné
  • 60,274
  • 7
  • 180
  • 155
  • You say world's `objects` is immutable, but since it is a pub field, how can you be sure? i.e. if the consumer of this API declared a `World` as `mut`, objects would be mutable. – asky Aug 29 '20 at 02:05
  • Good answer, but what about the case where objects isn't a `Vec`, but is a more complex data structure which doesn't support direct indexing? In my case it's a custom tree. (If the answer is "don't use that kind of data structure", or "add indexing to your tree", I guess I'd consider that perfectly valid ;) ) – Daniel Aug 29 '20 at 02:18