
I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.

I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:

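use std::{ffi::c_void, mem, sync::Arc};

pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    // Rebuild the Arc with a dummy payload type (c_void) just to call clone.
    let original = Arc::from_raw(handle);
    let copy = original.clone();
    // Forget the rebuilt Arc so that dropping it does not decrement the count.
    mem::forget(original);
    Arc::into_raw(copy)
}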

Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!

randomPoison
  • `clone` increments the refcount, and `from_raw`, `into_raw` and `mem::forget` do not touch it, so the effect of calling `clone_raw` is just to increase the refcount by 1. I assume this is just a dummy example (i.e. in your real code you would do something other than `mem::forget` with `original`)? – trent Jan 09 '20 at 22:54
  • (I think this is safe in the sense of "very unlikely to do anything bad", I'm just trying to make sure I'm not overlooking something weird) – trent Jan 09 '20 at 23:08
  • Yeah, your example is poor: this is safe but unlikely to do what you want. In fact, what do you want? – Stargateur Jan 09 '20 at 23:54
  • In the example I gave, I need to do `mem::forget(original)` so that it isn't dropped at the end of the function, which would decrement the ref count. The other option would be to do `Arc::into_raw(original)` instead, as shown [in this example](https://users.rust-lang.org/t/is-it-safe-to-clone-a-type-erased-arc-via-raw-pointer/36723/2?u=excaliburhissheath). – randomPoison Jan 10 '20 at 17:16

1 Answer


This is almost always unsafe.

An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like

struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,  // You get a raw pointer to this element
}

into_raw() gives you a pointer to the data element. Arc::from_raw() takes such a pointer, assumes that it points to the data field of an ArcInner<T>, and steps back in memory by the offset of that field to find the start of the ArcInner<T>. That offset depends on the memory layout of T, specifically its alignment, which determines exactly where data is placed inside ArcInner.

If you call into_raw() on an Arc<U> and then call from_raw() as if it were an Arc<V>, where U and V differ in alignment, the calculated offset of the data field within ArcInner will be wrong. The call to .clone() will then increment whatever bytes happen to sit where it expects the strong counter, corrupting memory. Dereferencing T is therefore not required to trigger memory unsafety.

In practice, this might not be a problem: since data is the third field, after two usize-sized counters, most T will probably end up at the same offset. However, if the standard library implementation changes, or you compile for a platform where this assumption is wrong, reconstructing an Arc<V> with from_raw from a pointer produced by an Arc<U> whose memory layout differs from V's is undefined behavior and may crash.
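To make the offset argument concrete, here is a minimal sketch that computes where data would land in a struct shaped like the ArcInner<T> shown above. It assumes the two counters come first and the fields are laid out in declaration order; `data_offset` is a hypothetical helper written for this illustration, not part of the standard library.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicUsize;

/// Hypothetical helper: byte offset of `data` in a struct laid out like the
/// `ArcInner<T>` sketch (two counters first, then `T`, in declaration order).
pub fn data_offset<T>() -> usize {
    // Layout of the two reference counters that precede `data`.
    let counters = Layout::new::<[AtomicUsize; 2]>();
    // `extend` appends `T` after the counters, inserting padding if `T`
    // needs a stricter alignment, and reports the offset it chose.
    let (_combined, offset) = counters
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    offset
}
```

For payloads whose alignment does not exceed that of the counters, such as u8, u64 or String, the helper returns the same offset (two usize widths), which is why the mismatch often goes unnoticed.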


Update:

Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do

#[repr(align(32))]
struct Foo;

let foo = Arc::new(Foo);

In this example Foo is aligned to 32 bytes, so on a 64-bit target ArcInner<Foo> is 32 bytes in size (8+8+16+0: two counters, 16 bytes of padding, and a zero-sized data field at offset 32), while an ArcInner<()> is just 16 bytes (8+8+0+0, with data at offset 16). Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.
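The size difference can be checked with a stand-in struct. This is only a sketch: `InnerSketch` is a hypothetical mirror of the private ArcInner, assumed to be `repr(C)`, and the helper names are made up for this example.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;
use std::sync::Arc;

/// Hypothetical stand-in for the standard library's private `ArcInner<T>`.
#[allow(dead_code)]
#[repr(C)]
pub struct InnerSketch<T> {
    strong: AtomicUsize,
    weak: AtomicUsize,
    data: T,
}

/// Zero-sized type with an alignment of 32 bytes, as in the answer.
#[repr(align(32))]
pub struct Foo;

/// Returns (size of `InnerSketch<T>`, alignment of `T`).
pub fn sketch_layout<T>() -> (usize, usize) {
    (size_of::<InnerSketch<T>>(), align_of::<T>())
}

/// Address returned by `Arc::into_raw` for an `Arc<Foo>`, modulo 32.
pub fn foo_pointer_remainder() -> usize {
    let raw = Arc::into_raw(Arc::new(Foo));
    let remainder = raw as usize % 32;
    // Rebuild with the correct type so the allocation is released.
    drop(unsafe { Arc::from_raw(raw) });
    remainder
}
```

`sketch_layout::<Foo>()` reports a 32-byte struct, while `sketch_layout::<()>()` reports one that holds only the two counters. The pointer handed out by into_raw for an Arc<Foo> is itself 32-byte aligned, so stepping back by the offset that is correct for `()` would not reach the counters.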

There is an escape hatch that might be safe in practice: if T is wrapped in another Box, the Arc allocation is an ArcInner<Box<T>>, whose layout is always the same because, for a sized T, Box<T> is a single pointer regardless of T. In order to force this upon any user, you can do something like

struct ArcBox<T>(Arc<Box<T>>);

and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.
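A minimal sketch of how such a wrapper could look, assuming a sized T. The methods `new`, `into_raw`, `from_raw` and `strong_count` are hypothetical names chosen for this illustration, and `clone_raw` uses `Box<()>` as the dummy payload type, relying on Box<()> and Box<T> having the same size and alignment.

```rust
use std::ffi::c_void;
use std::mem::ManuallyDrop;
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical wrapper: the payload always sits behind a `Box`, so the
/// reference-counted allocation only ever holds one thin pointer.
pub struct ArcBox<T>(Arc<Box<T>>);

impl<T> ArcBox<T> {
    pub fn new(value: T) -> Self {
        ArcBox(Arc::new(Box::new(value)))
    }

    /// Erases the type; the returned handle owns one strong reference.
    pub fn into_raw(self) -> *const c_void {
        Arc::into_raw(self.0) as *const c_void
    }

    /// Safety: `handle` must come from `ArcBox::<T>::into_raw` with the
    /// same `T`, and must still own a strong reference.
    pub unsafe fn from_raw(handle: *const c_void) -> Self {
        ArcBox(unsafe { Arc::from_raw(handle as *const Box<T>) })
    }

    pub fn strong_count(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

impl<T> Deref for ArcBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Two hops: through the Arc to the Box, then through the Box to T.
        &**self.0
    }
}

/// Clones a type-erased handle without knowing `T`.
/// Safety: `handle` must come from `ArcBox::into_raw` and still own a
/// strong reference.
pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    // ManuallyDrop keeps the temporary Arc from releasing the caller's
    // reference; the dummy `Box<()>` is never dereferenced or dropped.
    let original =
        ManuallyDrop::new(unsafe { Arc::from_raw(handle as *const Box<()>) });
    Arc::into_raw(Arc::clone(&*original)) as *const c_void
}
```

Each handle returned by `into_raw` or `clone_raw` should eventually be passed to `from_raw` with the real T exactly once, so that the count is decremented and the Box is freed by the last owner. On Rust 1.51 or later, `Arc::increment_strong_count` performs the same increment without building a temporary Arc, under the same pointer requirements.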

user2722968
  • Great answer. I added the `T: ?Sized` bound to your reproduction of `ArcInner` because that is what ensures that `data` will always be at the end of the layout of `ArcInner` -- without it, or `repr(C)`, Rust would be able to place `data` wherever it wanted (which would mean all bets were off, even for a `T` less strictly aligned than `usize`). – trent Jan 10 '20 at 15:01
  • Thank you for the thorough answer! I suspected this wouldn't work the way I wanted, but your answer makes it clear why. – randomPoison Jan 10 '20 at 17:12
  • How does rust find the arc that it previously `mem::forget` about? – user1685095 Aug 16 '21 at 14:22
  • @user1685095 Every `Arc<T>` is essentially just a `std::ptr::NonNull<ArcInner<T>>`. So as long as there is at least one such pointer left and as long as the reference counter in `ArcInner` matches the number of `Arc` that will be dropped, other `Arc` can be re-constructed and everything will be fine. Only the very last `Arc` to be dropped also deallocates the `ArcInner`. – user2722968 Aug 16 '21 at 16:26
  • @user2722968 I don't see how that answers my question. I asked how previously allocated `Arc` is found in `Arc::from_raw`. – user1685095 Aug 17 '21 at 13:59
  • @user1685095, the `Arc` is not allocated at all, it's just a pointer stored in a register or on the stack. When an `Arc` is dropped normally, it dereferences its inner pointer to `ArcInner` (which is allocated on the heap), decrements the reference count stored there, and deallocates the `ArcInner` if the reference count reaches zero (that is, if it was the last `Arc`). When you `forget()` an `Arc`, the reference count never decrements and the `Arc` simply vanishes. – user2722968 Aug 17 '21 at 20:30
  • @user1685095 It might be helpful to realize that ALL `Arc` referencing the same object are actually simply the same pointer. It's the ownership mechanics, where one owner (`self`) equates to one increment/decrement of the counter, that make everything work. There is no need to "find" the "previous" `Arc`. If you `from_raw()`, you need to guarantee that the reference count is correct (actually: one too large). Given that, the "new" `Arc` is just another copy of the same pointer, whose destructor will decrement the reference count in the future. – user2722968 Aug 17 '21 at 20:32
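The counting behaviour described in these comments can be observed with a correctly typed Arc. The following is a small sketch; `count_trace` is a hypothetical helper that records `Arc::strong_count` after each step.

```rust
use std::mem;
use std::sync::Arc;

/// Hypothetical helper: strong counts observed after each step.
pub fn count_trace() -> [usize; 4] {
    let first = Arc::new(String::from("payload"));
    let start = Arc::strong_count(&first);

    // The clone bumps the count; into_raw hands that reference to `raw`
    // without touching the counter.
    let raw = Arc::into_raw(Arc::clone(&first));
    let after_into_raw = Arc::strong_count(&first);

    // from_raw takes over the reference owned by `raw`; no counter change.
    let rebuilt = unsafe { Arc::from_raw(raw) };
    let after_from_raw = Arc::strong_count(&first);

    // forget skips the destructor, so the count is not decremented.
    mem::forget(rebuilt);
    let after_forget = Arc::strong_count(&first);

    // Release the forgotten reference so the allocation is freed once
    // `first` goes out of scope.
    unsafe { Arc::decrement_strong_count(raw) };

    [start, after_into_raw, after_from_raw, after_forget]
}
```

Only `clone` changes the counter here; `into_raw`, `from_raw` and `forget` merely move or discard a pointer that already accounts for one reference.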