2

I’m trying to create a simple cache that has one operation: “get from cache, loading if necessary”. Here’s a working example (just using file loading for simplicity):

use std::collections::HashMap;
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, Default)]
pub struct FileCache {
    /// Map storing contents of loaded files.
    files: HashMap<PathBuf, Vec<u8>>,
}

impl FileCache {
    /// Get a file's contents, loading it if it hasn't yet been loaded.
    pub fn get(&mut self, path: &Path) -> io::Result<&[u8]> {
        if let Some(_) = self.files.get(path) {
            println!("Cached");
            return Ok(self.files.get(path).expect("just checked"));
        }
        let buf = self.load(path)?;
        Ok(self.files.entry(path.to_owned()).or_insert(buf))
    }
    /// Load a file, returning its contents.
    fn load(&self, path: &Path) -> io::Result<Vec<u8>> {
        println!("Loading");
        let mut buf = Vec::new();
        use std::io::Read;
        File::open(path)?.read_to_end(&mut buf)?;
        Ok(buf)
    }
}

pub fn main() -> io::Result<()> {
    let mut store = FileCache::default();
    let path = Path::new("src/main.rs");
    println!("Length: {}", store.get(path)?.len());
    println!("Length: {}", store.get(path)?.len());
    Ok(())
}

The success branch of the if let has an extra call to self.files.get and an extra expect. We just called that and pattern-matched on its result, so we should like to just return the match:

    pub fn get(&mut self, path: &Path) -> io::Result<&[u8]> {
        if let Some(x) = self.files.get(path) {
            println!("Cached");
            return Ok(x);
        }
        let buf = self.load(path)?;
        Ok(self.files.entry(path.to_owned()).or_insert(buf))
    }

But this fails the borrow checker:

error[E0502]: cannot borrow `self.files` as mutable because it is also borrowed as immutable
  --> src/main.rs:20:12
   |
14 |     pub fn get(&mut self, path: &Path) -> io::Result<&[u8]> {
   |                - let's call the lifetime of this reference `'1`
15 |         if let Some(x) = self.files.get(path) {
   |                          ---------- immutable borrow occurs here
16 |             println!("Cached");
17 |             return Ok(x);
   |                    ----- returning this value requires that `self.files` is borrowed for `'1`
...
20 |         Ok(self.files.entry(path.to_owned()).or_insert(buf))
   |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

I don’t understand why these two versions behave differently. Isn’t self.files borrowed for &self lifetime in both cases? In the first form, we drop the borrow and obtain a new one, but I don’t see why that should make a difference. How does the second form enable me to violate memory safety, and how can I write this code without the extra lookup and expect check?

I’ve read How do I write a rust function that can both read and write to a cache?, which is related, but the answers there either duplicate the lookup (as in my working example) or clone the value (prohibitively expensive), so aren’t sufficient.

wchargin
  • 15,589
  • 12
  • 71
  • 110

1 Answers1

2

Both implementations should be legal Rust code. The fact that rustc rejected the second one is actually a bug as tracked by issue 58910.

Let's elaborate:

pub fn get(&mut self, path: &Path) -> io::Result<&[u8]> {
    if let Some(x) = self.files.get(path) { // ---+ x
        println!("Cached");                    // |
        return Ok(x);                          // |
    }                                          // |
    let buf = self.load(path)?;                // |
    Ok(self.files.entry(path.to_owned())       // |
        .or_insert(buf))                       // |
}                                              // v

By binding to variable x in let some expression, self.files is borrowed as immutable. The same x is later returned out to the caller, which implies self.files remains borrowed until some point in the caller. So in current Rust borrow checker implementation, self.files is unnecessarily kept being borrowed the whole get function. It can't be borrowed again later on as mutable. The fact that x is never used after the early return is not taken into consideration.

Your first implementation works as a workaround to this. Since let some(_) = ... doesn't create any binding, it doesn't have the same problem. It does incur additional runtime cost though due to multiple lookups.

This is the problem case #3 described in Niko's NLL RFC. Good news is:

TL;DR: Polonius is supposed to fix this.

(Polonius is the more sophisticated 3rd incarnation of Rust borrow checker which is still in developement).

edwardw
  • 12,652
  • 3
  • 40
  • 51
  • Fascinating. Thanks for the reference! I’ll keep around the extra runtime cost for now, then (my application isn’t concurrent, so it’s just runtime cost and not a correctness issue), and look into Polonius in more detail. Thank you! – wchargin Oct 20 '19 at 05:54
  • @wchargin yeah, this is not a soundness issue. rustc rejects something it shouldn't have. A bit annoying, that's all. – edwardw Oct 21 '19 at 11:47