5

I'm trying to write a function that finds returns a mutable reference to an existing element in a Vec, or inserts it if it doesn't exist and returns a mutable reference to the new element.

I've tried a couple of times, but the borrow checker isn't convinced. I've simplified the code I was trying to write to the example below, which gives the same errors.

fn mut_find_or_insert<T: PartialEq>(vec: &mut Vec<T>, val: T) -> &mut T {
    if let Some(u) = vec.iter_mut().find(|u| **u == val) {
        u
    } else {
        vec.push(val);
        vec.last_mut().unwrap()
    }
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=cb12c38bcf3682b15a247d14aab48b6b

Rust gives me the following compiler error (full message through the playground link):

error[E0499]: cannot borrow `*vec` as mutable more than once at a time

This seems to be something that should be possible to implement in rust, however it's not clear to me how I reimplement this to avoid borrow checker errors.

vallentin
  • 23,478
  • 6
  • 59
  • 81
Chris Pearce
  • 666
  • 5
  • 16

2 Answers2

8

The reason this doesn't work as written is because of a limitation in the current borrow checker. This is very similar to NLL case #3, in which the compiler borrows somewhat overzealously for an entire match statement when the borrow is only used in one of the branches. With the experimental "Polonius" borrow checker (available on the nightly compiler with the -Z polonius flag), your code is accepted as-is.

Working in the stable compiler, it's probably a good idea to redesign your data structures as Sébastien Renauld's answer also suggests, but if you need to make this work with a Vec, you can work around it by briefly using an index to end the borrow:

fn mut_find_or_insert<T: PartialEq>(vec: &mut Vec<T>, val: T) -> &mut T {
    if let Some(i) = vec.iter().position(|each| *each == val) {
        &mut vec[i]
    } else {
        vec.push(val);
        vec.last_mut().unwrap()
    }
}

This works because the result of calling position is not a reference, so the borrow of vec is not held during the if let.

This is similar to the following questions, which manage to find the same limitation using early return from a loop:

trent
  • 25,033
  • 7
  • 51
  • 90
2

Vec is an unordered, not very structured type. It has no way to look up the exact position of an item within it; the closest the default functions get to is contains(), which only tells you if the item is contained.

Furthermore, due to the fact that a Vec is not a Set, the behavior of "find the item or append and return " is undefined - "find the item", if there is a duplicate, needs to be defined further.

To solve this problem without changing to the correct type (HashSet is the type you really want for this. Note the existence of get_or_insert(), which is literally what you are after. It pays to use the proper structure for the job, rather than to try to make everything fit a Vec), we're going to have to build it ourselves. Keeping to your signature, it looks like this (Playground):

trait VecSeekOrAppend<T:PartialEq>:Sized {
    fn get_or_insert(&mut self, item: T) -> &mut T;
}

impl<T> VecSeekOrAppend<T> for Vec<T> 
    where T: PartialEq + Clone {

    fn get_or_insert(&mut self, item: T) -> &mut T {
        if !self.contains(&item) {
            self.push(item.clone());
        }
        for i in self.iter_mut() {
            if i == &mut item {
                return i;
            }
        }
        unreachable!();
    }
}

The reason your initial version does not work is due to the returned lifetime requirement; all methods returning a reference from a Vec require a lifetime validity for the duration of use. By returning such a &mut reference, if you attempt to do it in one go, the mutation of the Vec<_> will happen while there is already a mutable borrow of it.

Splitting the cycle in two, and performing the insertion (without keeping a reference) to then find the eference, allows us to sidestep this problem. Another way to perform this is to store items by a serializable or hashable identifier (the exact way HashMap and HashSet work) in order to innately provide this layer of indirection.

There is a rust feature in the works to ease some of this pain (non-lexical lifetimes), but, as you can see from the github issue, it's not in the near future.

Sébastien Renauld
  • 19,203
  • 2
  • 46
  • 66
  • Thanks for your suggestions. I completely agree that HashSet is a better fit, and I should probably not be stubborn about trying to use a Vec. However to feel satisfied about an answer I'd really appreciate more explaination of why specifically rust doesn't allow me to do what I'm trying to do. As you've highlighted, checking `contains` can add N to the complexity, and I want to understand how to avoid this sort of issue with the borrow checker in the future. – Chris Pearce Oct 05 '19 at 15:39
  • @ChrisPearce `HashSet` contains a `HashMap` internally, and as a result, `get_or_insert()` operates on the keys for the retrieval, not on the entire collection. This is the key difference and what my (working) snippet illustrates - due to the way lifetimes work, you're forced to break your step in two, whether it is in a different struct or like I did, with a different path. First, you edit, then you return. – Sébastien Renauld Oct 05 '19 at 15:43
  • It actually has very little to do with "rust" or "the borrow checker". It's down to code structure and sanity – Sébastien Renauld Oct 05 '19 at 15:44
  • I just looked into HashMap and I have a concern that using `get_or_insert` would not return a mutable reference that I could return from the function since you're not allowed to modify an entry inside a HashMap in a way that would invalidate the hash. Does that only leave me with the option of using a `Vec` separating the insertion from the return? – Chris Pearce Oct 05 '19 at 16:06
  • @ChrisPearce At this point I really need to ask what you are attempting to do. You have an **owned** T, which you use as default; the entire thing sounds like you really should actually split this into two. I'll rewrite the snippet in the answer to take into account the mutability requirement (all that changes is references) and provide a better, way more idiomatic approach. – Sébastien Renauld Oct 05 '19 at 16:12
  • Nevermind, there isn't a more idiomatic approach; you're going to split the insertion and the retrieval in every case. This is the best you'll get without refining your requirements (and choosing a better way to store stuff). I'm still puzzled as to your use case for this, however; after all, you are instantiating an object *by default* every time you use that method. Surely that completely defeats the purpose of what you're trying to do there – Sébastien Renauld Oct 05 '19 at 16:20
  • That's fair, I think that a lot of context has gone missing in my simplification of the code in an attempt to make it suitable as a public question. I'd accept your answer if you add something about why doing it one step doesn't work and your example code with it separated. Thanks for the help – Chris Pearce Oct 05 '19 at 16:26