3

I hope this isn't too much of a tangent, but I think it is necessary for understanding the problem:

I am currently writing a JIT in Rust that traces operations, compiles them to SPIR-V, and executes the computations on the GPU (heavily inspired by Dr.Jit from EPFL). I have written a backend for recording the operations, which supports manual reference counting. Every variable is accessed through an index into a vector. I now want to write a frontend for Rust where I have a global instance of the internal representation behind a Mutex. In order to use reference counting, I have to decrement the counter for a variable when dropping it, which requires locking the Mutex in the drop function of a Variable.

impl Drop for Var {
    fn drop(&mut self) {
        IR.lock().unwrap().dec_ref_count(self.0);
    }
}
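For reference, the global state this Drop impl talks to might look roughly like the following. This is only my sketch of the setup described above; the struct name, field names, and `new_var` helper are assumptions, not the actual crate.

```rust
use std::sync::Mutex;

// Hypothetical sketch of the shared internal representation.
struct Ir {
    ref_counts: Vec<usize>, // indexed by the id stored in Var
}

impl Ir {
    // Record a new variable with an initial count of 1 and return its index.
    fn new_var(&mut self) -> usize {
        self.ref_counts.push(1);
        self.ref_counts.len() - 1
    }

    fn dec_ref_count(&mut self, idx: usize) {
        // A real backend would invalidate the slot once this hits zero.
        self.ref_counts[idx] -= 1;
    }
}

// `Mutex::new` is a const fn since Rust 1.63, so a plain static works here
// without lazy_static or once_cell.
static IR: Mutex<Ir> = Mutex::new(Ir { ref_counts: Vec::new() });
```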

However, because recording operations also requires locking the Mutex, I run into a deadlock when passing the variable's index to a backend function.

impl<T: Into<Var>> ops::Add<T> for Var {
    type Output = Var;

    fn add(self, rhs: T) -> Self::Output {
        let rhs = rhs.into();
        let ret = Self(IR.lock().unwrap().[<$bop:lower>](self.0, rhs.0));
        drop(rhs); // This is a workaround and does not seem very idiomatic
        ret
    }
}

I fixed the problem by manually dropping the variable after getting the value. However, this does not seem very idiomatic.

Is there a way to structure this in Rust so that I don't get a deadlock and don't have to repeat the manual drop calls? I would imagine that FFI wrappers have to deal with something similar relatively often (as C code often uses global state). How do they resolve this issue, and what are some resources on deadlock prevention in Rust?

    Why don't you use atomic reference counting, then you won't need to lock the mutex? – Chayim Friedman Feb 26 '23 at 11:28
  • Using many Arcs should be slower than having all the variables in a single vector, especially for many entries (cache misses etc.). – TheOneTribble Feb 27 '23 at 12:09
  • I don't know what your data looks like, but using `Arc` will _always_ be faster than manually implementing refcounting with mutexes. – Chayim Friedman Feb 27 '23 at 12:19
  • Essentially I implemented an ordered graph without a remove operation (the ref-counting implementation does not remove elements, just invalidates them), with nodes as elements in the vector. I'm mostly concerned with the speed of DFS iterations over the graph. The alternative would be to use nested Arcs. Intuitively, using a Vec seems faster, though I'm not sure. – TheOneTribble Feb 27 '23 at 13:08
  • You don't need a separate allocation for refcounting; you can do it yourself within the same allocation (although this is more complicated). – Chayim Friedman Feb 27 '23 at 13:17
  • Do you know of some resources about this? I'd be happy to learn about it. – TheOneTribble Feb 28 '23 at 11:46
  • Implementing atomic refcounting is pretty easy, you just need an atomic field. Implementing it correctly is more difficult because you need to get the orderings right. If you're lazy, you can just use `SeqCst` everywhere. If you want to be more precise, for performance (but if you could use mutexes it likely doesn't matter) or just because you're pedantic (like me), I recommend https://marabos.nl/atomics/. – Chayim Friedman Feb 28 '23 at 13:14
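The in-place refcounting suggested in the comments above could be sketched like this. The node layout and method names are my illustrative assumptions; the orderings loosely follow what `Arc` does, as described in the linked book.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// The refcount lives inside the node itself, so no separate Arc allocation
// is needed and the nodes stay contiguous in one Vec.
struct Node {
    refs: AtomicUsize,
}

struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    // Takes &self: no Mutex needed, so Drop impls can't deadlock on one.
    fn inc_ref(&self, idx: usize) {
        // Relaxed suffices for increments: no other memory is published here.
        self.nodes[idx].refs.fetch_add(1, Ordering::Relaxed);
    }

    // Returns true when the last reference was released.
    fn dec_ref(&self, idx: usize) -> bool {
        // Release on the decrement plus an Acquire fence on the last
        // decrement mirrors Arc's drop protocol.
        if self.nodes[idx].refs.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            return true; // caller may now invalidate the node
        }
        false
    }
}
```

Because `inc_ref` and `dec_ref` only need `&self`, a `Var`'s Drop impl could decrement without ever locking the Mutex that guards recording, which sidesteps the deadlock entirely.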

2 Answers

1

I did some more testing, as I was a little confused about why I ran into a deadlock while only acquiring a lock on a single resource. The code I posted also works without the drop. What does not work is returning the result directly from the expression.

impl<T: Into<Var>> ops::Add<T> for Var {
    type Output = Var;
    fn add(self, rhs: T) -> Self::Output {
        let rhs = rhs.into();
        Self(IR.lock().unwrap().add(self.0, rhs.0))
    }
}

However, either saving the result in a temporary variable or acquiring the lock separately resolves the problem.

impl<T: Into<Var>> ops::Add<T> for Var {
    type Output = Var;
    fn add(self, rhs: T) -> Self::Output {
        let rhs = rhs.into();
        let ret = Self(IR.lock().unwrap().add(self.0, rhs.0));
        ret
    }
}
impl<T: Into<Var>> ops::Add<T> for Var {
    type Output = Var;
    fn add(self, rhs: T) -> Self::Output {
        let rhs = rhs.into();
        let mut ir = IR.lock().unwrap();
        Self(ir.add(self.0, rhs.0))
    }
}

It seems that in the first example the MutexGuard is dropped after rhs is dropped. As the Rust Reference states (thanks to your answer, @dudebro):

Notes:

Temporaries that are created in the final expression of a function body are dropped after any named variables bound in the function body, as there is no smaller enclosing temporary scope.
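This rule can be seen in isolation with a small example (my own illustration, not from the original post): a temporary created in a function's tail expression is dropped *after* the named locals in that body.

```rust
use std::sync::Mutex;

// Records the order in which values are dropped.
static ORDER: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Loud(&'static str);

impl Drop for Loud {
    fn drop(&mut self) {
        ORDER.lock().unwrap().push(self.0);
    }
}

fn tail_expression() -> usize {
    let named = Loud("named local");
    let _ = &named;
    // This temporary outlives `named` per the rule quoted above, just as
    // the MutexGuard temporary in `add` outlives `rhs` -- whose Drop impl
    // then tries to lock the already-held Mutex and deadlocks.
    Loud("temporary").0.len()
}

fn main() {
    tail_expression();
    // ORDER is now ["named local", "temporary"]: the named local dropped first.
    assert_eq!(*ORDER.lock().unwrap(), ["named local", "temporary"]);
}
```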

0

If just Self(IR.lock().unwrap().[<$bop:lower>](self.0, rhs.into().0)) doesn't work, you could consider doing something like this:

let ret = {
    let rhs = rhs.into();
    Self(IR.lock().unwrap().[<$bop:lower>](self.0, rhs.0))
}; // rhs's scope ends here
ret

Still not great, but since you need rhs to fall out of scope first, and it isn't falling out of scope at the end of the function call with ...(self.0, rhs.into().0) (I assume this is a function call?), your options are basically using drop (also consider writing it as std::mem::drop for clarity), or scoping it with a block expression as in the example above.

This section of the Rust Reference has some more in-depth details about the rules of dropping if you want to learn more.

dudebro