flat_map on a Vec>>

Question

Here is a very simplified code where I am using i8 and Vec::iter just for the sake of an example, but the actual types and iterators are more involved (so don't hang on &i8 below). Lets say

struct Foo {
    bars: Vec<RwLock<Vec<i8>>>,
}

How can I implement a flat_map on the inner vectors:

fn iter(&self) -> impl Iterator<Item=&i8>

Below code does not compile because of temporary references and lifetimes:

impl Foo {
    fn iter(&self) -> impl Iterator<Item=&i8> {
        self.bars.iter().flat_map(|bar| {
            bar.read().unwrap().iter()
        })
    }
}

Playground link to the error.

Any way I can achieve the same result as a flat_map on a Vec<RwLock<Vec<T>>>?

When you say "don't hang on the `&i8` below," do you mean that the final interior type is non-trivial? Does it implement `Copy`? Without implementing `Copy`, how do you expect this to behave if someone modifies the inner Vec on another thread and deletes the Item while you hold the Iterator? If it does implement Copy, then you can return the copy rather than a borrow. — Rob Napier, Apr 10 '22 at 22:48
There's not really a way to implement this, a `RwLock` yields an intermediate "guard" that must stay alive while the reference is in use, but returning a bare reference would attempt to circumvent that. Even if you make an intermediate vec with all the guards, you wouldn't be able to return it as a `impl Iterator` since the references will be bound to the iterator `self`, which is not allowed. — kmdreko, Apr 10 '22 at 23:07
@RobNapier The outer Vec cannot be modified because `fn iter(&self)` has a reference on self, which is carried over the return type as well `Item=&i8`. The inner vecs cannot be modified either, because each is wrapped in a RwLock. — behzad.nouri, Apr 10 '22 at 23:18
But you drop the RwLock guard when your return from the closure passed to the `flat_map` (as kmdreko explains more fully). You're not returning the guard along with the Item. So another thread (or even just later code in the same thread) can take the lock and modify the inner Vec and drop the Item. If feels like you may have the wrong mental picture of how Rust manages memory. Having a reference to something doesn't keep it alive (or maintain a lock). You have to prove the thing will stay alive in order to *get* a reference. (I agree with kmdreko that this design feels impossible.) — Rob Napier, Apr 10 '22 at 23:46
@RobNapier that is the whole question: "how to hold on to each of those RwLocks while iterating on the contents that specific RwLock?" — behzad.nouri, Apr 11 '22 at 00:11
I think this probably can be implemented but will require several layers of values to work the lifetimes out, and the user of this function won't be able to be completely blind to those layers. An implementation is going to be very awkward and honestly probably not worth it. You might consider implementing `for_each` instead and taking a closure that accepts `&i8`. This isn't nearly as flexible as iterators, but could be a workable compromise. — cdhowie, Apr 11 '22 at 01:05
@behzad.nouri To explain why this can't work directly, remember that the lifetime of the `&i8` is tied to the inner vector, whose lifetime is tied to the RwLock guard. In the signature `fn iter<'a, 'b>(&'a self) -> impl Iterator`, where is the lifetime `'b` supposed to come from? `'b` is contained by `'a` but is smaller than it, and there's no way for the caller to supply a compatible lifetime because the caller doesn't actually know what it is. — cdhowie, Apr 11 '22 at 01:33

cdhowie · Answer 1 · 2022-04-11T04:43:30.083

This cannot work with the signature you have written:

fn iter(&self) -> impl Iterator<Item=&i8>

Due to lifetime elision, this actually means:

fn iter<'a>(&'a self) -> impl Iterator<Item=&'a i8>

However, the reference to each item is only valid for as long as you have a reference to an inner Vec<i8> and those references are only valid while you have an existing RwLockReadGuard.

You actually need a signature more like this:

fn iter<'a, 'b>(&'a self) -> impl Iterator<Item=&'b i8>

But this can't work either, because it's the caller who provides lifetime parameters, and the caller has no way to provide a suitable 'b, which would have to be the lifetime of the guard.

Further, there are multiple guards in such a flattening iteration (one for each inner Vec<i8>) and a flat-map operation is usually going to discard them when the end of an inner vec is reached, but you can't do that here because the Iterator protocol requires that all items have the same type -- all the produced references must therefore be valid for the same lifetime, and that lifetime would have to cover a guard for every inner vector at once, meaning you'd have to lock everything. (And even if you do that, the caller still wouldn't have a way to provide that lifetime.)

That's not all -- we can't really store the guard in a custom iterator, because that would require the iterator return a reference whose lifetime is limited to the lifetime of the iterator itself, and that's not possible.

There are basically two options here.

The first involves exposing knowledge of your data structure in some way, e.g. returning an impl Iterator<Item=RwLockReadGuard<'a, Vec<i8>>> and forcing your consumers to do the flattened iteration themselves. They will likely have to collect the iterator into a new vector (locking everything at once, as mentioned above) and then flatten, if that's how they want to traverse the structure.

The second involves inverting control of the iteration by having the caller provide a closure that accepts an &i8 and iterating yourself, invoking the closure for each item. This rules out using any of the useful iterator utilities, unfortunately, but is the cleanest interface I can think of to provide this iteration while avoiding the complexities of iterating over nested data under a series of guards. We can sprinkle Result into the interface to allow the closure to terminate the loop early by signaling an error, at least.

impl Foo {
    pub fn iterate_nested<TFn, TErr>(&self, mut f: TFn) -> Result<(), TErr>
        where TFn: FnMut(&i8) -> Result<(), TErr>
    {
        for lock in self.bars.iter() {
            let guard = lock.read().expect("rwlock is poisoned");
            for i in guard.iter() {
                f(i)?;
            }
        }
        
        Ok(())
    }
}

If you wanted to make this a bit more flexible, you could have it provide reduction in addition to iteration. Simple iteration can then be implemented as a reduction to ():

impl Foo {
    pub fn reduce_nested<TAccum, TErr, TFn>(&self, mut accum: TAccum, mut f: TFn)
        -> Result<TAccum, TErr>
        where TFn: FnMut(TAccum, &i8) -> Result<TAccum, TErr>
    {
        for lock in self.bars.iter() {
            let guard = lock.read().expect("rwlock is poisoned");
            for i in guard.iter() {
                accum = f(accum, i)?;
            }
        }
        
        Ok(accum)
    }
    
    pub fn iterate_nested<TErr, TFn>(&self, mut f: TFn) -> Result<(), TErr>
        where TFn: FnMut(&i8) -> Result<(), TErr>
    {
        self.reduce_nested((), |_, v| f(v))
    }
}

score 1 · Answer 2 · answered Apr 11 '22 at 03:12

You cannot.

Someone needs to notify the RwLock when it is not locked anymore so future locks will succeed. This someone is the guard (RwLockReadGuard), and it does that when it is dropped. So you cannot discard it and return reference to its contents, as it will be a reference to unlocked data, enabling a data race.

In general, using interior mutability primitives like RefCell, Mutex or RwLock cannot be hidden in API boundaries.

Sometimes, you can use methods like parking_lot's RwLockReadGuard::map() to explicitly mention the lock (like fn iter(&self) -> MappedRwLockReadGuard<'_, impl Iterator<...>>), but not in your case, because flat_map() requires combining multiple guards.

That being said, there are multiple ways you can come close. For example, you can collect all guards into a Vec before beginning the iteration (example). This will lock them all, each one until its iteration is over, and also not work with Iterator and require a streaming iterator (lending iterator) and GAT, but it works. I think you can do better, but I suspect it will require unsafe code and it may also be self-referential, a big pain by itself.

For more:

flat_map on a Vec>>

2 Answers2