
How does `PhantomData` work in Rust? The Nomicon says the following:

> In order to tell dropck that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that.

To me, that seems to imply that when we add a `PhantomData` field to a structure, as in the case of `Vec`:

```rust
use std::marker::PhantomData;

pub struct Vec<T> {
    data: *mut T,
    length: usize,
    capacity: usize,
    // Marks the struct as owning values of type `T`.
    phantom: PhantomData<T>,
}
```

the drop checker should forbid the following code:

```rust
fn main() {
    let mut vector = Vec::new();

    let x = Box::new(1_i32);
    let y = Box::new(2_i32);
    let z = Box::new(3_i32);

    vector.push(x);
    vector.push(y);
    vector.push(z);
}
```

Since `x`, `y`, and `z` would be freed before the `Vec`, I would expect some complaint from the compiler. However, the code above compiles and runs without any warning or error.

Shepmaster
Novus
  • Also, this was just an experiment; I wouldn't normally write this type of code. – Novus Jan 08 '17 at 14:04
  • Vec also implements Drop, which then drops its members. – the8472 Jan 08 '17 at 14:14
  • `Vec::push` takes its argument by value, so `x` is moved into `vector`, which is why there's no error. This has nothing to do with `PhantomData`... – Matthieu M. Jan 08 '17 at 14:34
  • Thanks, Matthieu, that makes sense, but I am still confused about why the Vec implementation needs PhantomData at all. – Novus Jan 08 '17 at 14:53
  • You may want to read a bit earlier on that page: "The drop checker will generously determine that Vec does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor." – E_net4 Jan 08 '17 at 17:03
  • I did read that statement, but the word "worry" is ambiguous. What does it mean to "worry" about dropping the T's? In particular, does using PhantomData have implications for setting the drop flag for the Vec or something along those lines? I appreciate the explanations; maybe I should have been clearer with my initial question. – Novus Jan 08 '17 at 17:15
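Matthieu M.'s point about `Vec::push` can be checked with a small sketch. It swaps the `Box<i32>` values for a hypothetical `Loud` type (not from the question) that logs when it is dropped, so the drop order becomes observable: nothing is dropped when the bindings `x`, `y` and `z` go out of scope, because their values were moved into `vector`; all three are dropped, front to back, when `vector` itself is dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for `Box<i32>` that records when it is dropped.
struct Loud {
    id: i32,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Loud {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.id));
    }
}

fn drop_log() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let mut vector = Vec::new();

        let x = Loud { id: 1, log: Rc::clone(&log) };
        let y = Loud { id: 2, log: Rc::clone(&log) };
        let z = Loud { id: 3, log: Rc::clone(&log) };

        // `push` takes its argument by value: the values are moved into
        // `vector`, so the bindings `x`, `y`, `z` own nothing afterwards.
        vector.push(x);
        vector.push(y);
        vector.push(z);

        log.borrow_mut().push("end of scope".to_string());
    } // `vector` is dropped here and drops its elements in index order.
    let result = log.borrow().clone();
    result
}

fn main() {
    println!("{:?}", drop_log());
}
```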

1 Answer


The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
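To make "owning `T` through a raw pointer" concrete, here is a hypothetical `MyBox<T>`, a simplified stand-in and not the standard library's `Unique<T>` or `RawVec<T>`. The `*mut T` field alone tells dropck nothing about ownership; the `PhantomData<T>` field records that dropping a `MyBox<T>` may drop a `T`. Note that this `Drop` impl is an ordinary one without the unstable `#[may_dangle]`, so dropck is already conservative here; the `PhantomData<T>` only becomes load-bearing for dropck once a destructor opts out with `#[may_dangle]`, as the deep dive describes.

```rust
use std::marker::PhantomData;

/// Hypothetical minimal owning pointer (illustration only).
struct MyBox<T> {
    // A raw pointer carries no ownership semantics for dropck.
    ptr: *mut T,
    // Tells the compiler that `MyBox<T>` owns a `T` and may drop it.
    _owns_t: PhantomData<T>,
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox {
            ptr: Box::into_raw(Box::new(value)),
            _owns_t: PhantomData,
        }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from `Box::into_raw` and is only freed in `drop`.
        unsafe { &*self.ptr }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // Reclaiming the Box runs `T`'s destructor, then frees the allocation.
        // SAFETY: `ptr` was produced by `Box::into_raw` and is reclaimed exactly once.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

fn main() {
    let b = MyBox::new(String::from("owned"));
    println!("{}", b.get());
} // `b` is dropped here, which drops the `String` it owns.
```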


Deep dive: We have a combination of factors here:

  • We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).

  • Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.

  • However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows a vector to hold references that have the same lifetime as the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all it's doing is dropping values and deallocating the backing array), as long as an important caveat holds.

  • The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.

  • So, diving in even more deeply: to confirm dropck validity for a given structure S, we first check whether S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself and check, for each of its fields, that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)

  • In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).

  • Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).

  • BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
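The flexibility that `#[may_dangle]` buys can be observed on stable Rust with the standard `Vec`. In this sketch (the function name is illustrative), the vector is declared before the integers it borrows, so it is dropped after them; the compiler accepts this because `&i32` has no destructor and `Vec`'s own destructor does not read through the references. The rejected variant, using a hypothetical wrapper with an ordinary `Drop` impl, appears only in comments because it does not compile.

```rust
fn sum_refs() -> i32 {
    // `v` is declared first, so it is dropped *after* `x` and `y`.
    let mut v: Vec<&i32> = Vec::new();
    let x = 1;
    let y = 2;
    v.push(&x);
    v.push(&y);
    let total: i32 = v.iter().copied().sum();
    total
    // At the end of this scope `y` and `x` go away first, then `v` is dropped
    // while still holding references to them. This is accepted: `Vec`'s
    // destructor only frees its buffer, and `&i32` has no destructor.
    //
    // A hypothetical wrapper with an ordinary destructor, e.g.
    //     struct Strict<T>(Vec<T>);
    //     impl<T> Drop for Strict<T> { fn drop(&mut self) {} }
    // used with the same declaration order is rejected with E0597
    // ("`x` does not live long enough"): without `#[may_dangle]`, dropck
    // must assume the destructor might read through the references.
}

fn main() {
    println!("{}", sum_refs());
}
```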

The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
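Conversely, when `T` has a destructor that reads borrowed data, the ownership recorded by `PhantomData<T>` makes dropck insist that the data strictly outlive the vector. A sketch using a hypothetical `Reader` element type: with the borrowed `String` declared before the vector, the code compiles and `Reader`'s destructor still sees valid data. Declaring the vector first instead is rejected, so that variant is shown only in a comment.

```rust
use std::cell::RefCell;

/// Hypothetical element type whose destructor *reads* the data it borrows.
struct Reader<'a> {
    data: &'a String,
    seen: &'a RefCell<Vec<String>>,
}

impl Drop for Reader<'_> {
    fn drop(&mut self) {
        // Accessing borrowed data during drop is exactly what dropck guards.
        self.seen.borrow_mut().push(self.data.clone());
    }
}

fn read_on_drop() -> Vec<String> {
    let seen = RefCell::new(Vec::new());
    {
        // `data` is declared before `v`, so it strictly outlives the vector.
        let data = String::from("still alive");
        let mut v = Vec::new();
        v.push(Reader { data: &data, seen: &seen });
        // Declaring `v` before `data` is rejected with E0597: `data` would be
        // dropped while `v` still owns a `Reader` whose destructor reads it.
    } // `v` is dropped first (running `Reader::drop`), then `data`.
    seen.into_inner()
}

fn main() {
    println!("{:?}", read_on_drop());
}
```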

pnkfelix