6

I'm trying to implement a lazy-construction / memoized evaluation / caching idiom in Rust.

There's an outer type which has a bunch of data and an accessor method. The accessor method needs to either return a cached calculation (if it has one) or calculate it and store the return value in a map for later use. The cached value doesn't need to reference the outer value so there's no circular reference problem; but it does need access to the outer value's data in order to construct itself.

Here's a full example which doesn't pass Rust's borrow checker:

use std::collections::HashMap;

pub struct ContainedThing {
    count: usize,
}

impl ContainedThing {
    fn create(thing: &Thing) -> ContainedThing {
        // create uses an arbitrary number of attributes from Thing
        // it doesn't keep any references after returning though
        let count = thing.map.len();
        ContainedThing { count: count }
    }
}

struct Thing {
    map: HashMap<i32, ContainedThing>,
}

impl Thing {
    pub fn get(&mut self, key: i32) -> &ContainedThing {
        self.map
            .entry(key)
            .or_insert_with(|| ContainedThing::create(&self))
    }
}

fn main() {}

The specific error is:

error[E0502]: cannot borrow `self` as immutable because `self.map` is also borrowed as mutable
  --> src/main.rs:24:29
   |
22 |         self.map
   |         -------- mutable borrow occurs here
23 |             .entry(key)
24 |             .or_insert_with(|| ContainedThing::create(&self))
   |                             ^^                         ---- borrow occurs due to use of `self` in closure
   |                             |
   |                             immutable borrow occurs here
25 |     }
   |     - mutable borrow ends here

I'm having a really hard time figuring out a good way to implement this idiom. I tried pattern matching the return value of get() instead of the entry() API, but still fell foul of the borrow checker: the match expression also ends up borrowing self.

I can rewrite get like this:

pub fn get(&mut self, key: i32) -> &ContainedThing {
    if !self.map.contains_key(&key) {
        let thing = ContainedThing::create(&self);
        self.map.insert(key, thing);
    }
    self.map.get(&key).unwrap()
}

but this is pretty ugly (look at that unwrap) and seems to require more lookups and copies than ought to be necessary. Ideally, I would like

  1. to only pay the cost of finding the hash entry once. entry(), done right, ought to track the insertion location when not found.
  2. reduce the number of copies of the freshly constructed value. This may be infeasible, ideally I'd have an in-place construction.
  3. to avoid using unwrap; without a pointless pattern match, that is.

Is my clumsy code the best that can be achieved?

E_net4
  • 27,810
  • 13
  • 101
  • 139
Barry Kelly
  • 41,404
  • 5
  • 117
  • 189
  • Gonna be a duplicate of http://stackoverflow.com/q/30681468/155423 or similar. You can't pass `self` because it's in the process of being changed. If `entry` handed you a reference to the `HashMap`, then maybe you could perform further lookups. – Shepmaster Aug 17 '16 at 14:34
  • @Shepmaster: I would say it is slightly different, because of the use of the reference inside the lambda. The order of execution here is fine: during the execution of the lambda the hash-map is frozen. – Matthieu M. Aug 17 '16 at 15:27
  • 1
    Is borrowing `&Thing` necessary, or is "extracting" whatever you need for later computation (here `.len()`) out of `&Thing` possible prior? You would avoid the borrowing then, and therefore could you use `or_insert_with`. – Matthieu M. Aug 17 '16 at 15:28
  • @MatthieuM. yeah, they are slightly different which is why I didn't use the dupehammer. However, while the `HashMap` is frozen, it is also in an *unknown state* when the closure is executed — it may not even be possible to perform a lookup in the map at that point! And your second comment is why I linked to that pseudo-duplicate, specifically [the second answer](http://stackoverflow.com/a/38944523/155423). Split out the important bits and pass them down. May not be feasible though... – Shepmaster Aug 17 '16 at 15:30
  • It's not really like http://stackoverflow.com/q/30681468/155423 because I don't need a mutable cycle. I don't need a mutable reference to `self` while deciding what to do in the case of not found. It only needs to be grabbed mutably after I've constructed my value after not finding it. It's quite upsetting not to be able to use either the closure or pattern matching here. @MatthieuM. extracting is a no-go without introducing another layer of indirection - the real case is building an index for potentially multi-gigabyte in-memory data on demand. An extra indirection feels really clunky. – Barry Kelly Aug 17 '16 at 18:39
  • 1
    I understand how you feel, certainly, however even if I was designing the API of hash map from scratch your use case is quite "nasty" for borrowing. There is no way to upgrade a `&T` into a `&mut T` (obviously), so your requirement of using `&Thing` after the failed look-up but before the insertion requires to "pause" the borrowing... if designing from scratch I could think about a "get" returning a "hint" that could later be re-submitted on `insert`... but it's quite weird really :x So I think that the double look-up is actually the best solution here; hopefully it'll be rare. – Matthieu M. Aug 17 '16 at 19:38
  • @BarryKelly I mean no offense. If you review the revision messages, you see what I changed and why: "standard Rust indentation and style; update error messages". Every time a question gets brought back up to the top of the active list or marked as a duplicate, I try to review it for correctness and modern Rust style. I update error messages so that people who find the question can quickly identify if it matches their problem. If I don't do this, I find the quality of a question or answer to be lacking. – Shepmaster Jul 27 '18 at 12:37
  • 1
    I do this for almost *every* question and answer tagged [tag:rust] in the hopes of making them the most useful they can be. Q&A on Stack Overflow are *meant* to be edited towards this purpose and I don't expect people who asked the question to spend time performing this maintenance. I frankly am unable to see how anyone could see this janitorial work as a negative thing, much less as a direct attack on them personally. – Shepmaster Jul 27 '18 at 12:40

2 Answers2

4

The answer is that it depends on specifically which state you need access to in the or_insert_with closure. The problem is that or_insert_with definitely cannot have access to the map itself because the entry api takes a mutable borrow of the map.

If all you need for ContainedThing::create is just the size of the map, then you'll just need to calculate the map size ahead of time.

impl Thing {
    pub fn get(&mut self, key: i32) -> &ContainedThing {
        let map_size = self.map.len();
        self.map.entry(&key).or_insert_with(|| {
            // The call to entry takes a mutable reference to the map,
            // so you cannot borrow map again in here
            ContainedThing::create(map_size)
        })
    }
}

I think the spirit of the question was more about general strategies, though, so let's assume there's some other state within Thing that is also required to create ContainedThing.

struct Thing {
    map: HashMap<i32, ContainedThing>,
    some_other_stuff: AnotherType, //assume that this other state is also required in order to create ContainedThing
}

impl Thing {
    pub fn get(&mut self, key: i32) -> &ContainedThing {
        //this is the borrow of self
        let Thing {
            ref mut map,
            ref mut some_other_stuff,
        } = *self;

        let map_size = map.len();
        map.entry(key).or_insert_with(|| {
            // map.entry now only borrows map instead of self
            // Here you're free to borrow any other members of Thing apart from map
            ContainedThing::create(map_size, some_other_stuff)
        })
    }
}

Whether that's really cleaner than your other solution of manually checking if self.map.contains_key(&key) is up for debate. Destructuring tends to be the strategy that I go for, though, to allow borrowing specific members of self instead of the entire struct.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
allTwentyQuestions
  • 1,160
  • 7
  • 11
  • Here's a wrinkle: `ContainedThing::create` may call `Thing::get` itself; think recursive construction from memoized simpler base cases, like dynamic programming from the top down. The necessity of using the API of `Thing` during the construction of `ContainedThing` - recursively in particular - is what leads me to conclude that the manual check will have to stay. (And yes, this wrinkle makes it into a problem with a potential cycle. I've simply thought a bit further down the road after solving this idiom problem. :) Have an upvote for the destructure though. – Barry Kelly Aug 19 '16 at 03:02
1

So, the primary motivation for me wanting to pass Thing as an argument to ContainedThing::create was to use Thing's API to help in the construction. However, it turns out that I'll want it to be borrowed mutably, because I need recursive memoized construction here, and that makes it a cycle problem.

So, it looks like my separated check + insert logic is here to stay.

Barry Kelly
  • 41,404
  • 5
  • 117
  • 189