I have a type:

struct Foo {
    memberA: Bar,
    memberB: Baz,
}

and a pointer which I know is a pointer to memberB in Foo:

p: *const Baz

What is the correct way to get a new pointer p2: *const Foo which points to the original struct Foo?

My current implementation is the following, which I'm pretty sure invokes undefined behavior due to the dereference of (p as *const Foo) where p is not a pointer to a Foo:

let p2 = p as usize -
    ((&(*(p as *const Foo)).memberB as *const _ as usize) - (p as usize));

This is part of FFI - I can't easily restructure the code to avoid needing to perform this operation.

This is very similar to "Get pointer to object from pointer to some member", but for Rust, which as far as I know has no offsetof macro.

Mystor
  • Why can't you just pass in a `*const Foo` to the C code instead? – Shepmaster Aug 07 '16 at 23:57
  • In this particular example, the FFI is giving me a `*const Baz`, and I am expected to retrieve the original object from it. If I could simply pass around the original object, I would, but that is not an option. – Mystor Aug 08 '16 at 03:10

2 Answers

The dereference expression produces an lvalue, but that lvalue is never actually read from; we're only doing pointer math on it, so in theory it should be well defined. That's just my interpretation, though.

My solution involves using a null pointer to retrieve the offset to the field, so it's a bit simpler than yours as it avoids one subtraction (we'd be subtracting 0). I believe I saw some C compilers/standard libraries implementing offsetof by essentially returning the address of a field from a null pointer, which is what inspired the following solution.

fn main() {
    let p: *const Baz = 0x1248 as *const _;
    let p2: *const Foo = unsafe {
        ((p as usize) - (&(*(0 as *const Foo)).memberB as *const _ as usize)) as *const _
    };
    println!("{:p}", p2);
}

We can also define our own offset_of! macro:

macro_rules! offset_of {
    ($ty:ty, $field:ident) => {
        unsafe { &(*(0 as *const $ty)).$field as *const _ as usize }
    }
}

fn main() {
    let p: *const Baz = 0x1248 as *const _;
    let p2: *const Foo = ((p as usize) - offset_of!(Foo, memberB)) as *const _;
    println!("{:p}", p2);
}
Francis Gagné
  • Note: This currently does not work for statics and constants, as [dereferencing raw pointers](https://github.com/rust-lang/rust/issues/51911) and [casting pointers to integers](https://github.com/rust-lang/rust/issues/51910) are unstable in these scopes. – dcoles Oct 27 '19 at 06:30
  • Note: Unfortunately dereferencing a null pointer is (now) UB according to [RFC-2582](https://github.com/rust-lang/rfcs/blob/master/text/2582-raw-reference-mir-operator.md), see notably the last section on **offsetof woes**. I believe that should this RFC be accepted, [my alternative implementation](https://stackoverflow.com/a/40310851/147192) would be sound. Emphasis on *believe*. – Matthieu M. Jan 30 '20 at 09:02

With the implementation of RFC 2582 (the raw reference MIR operator), it is now possible to get the address of a field in a struct without an instance of the struct and without invoking undefined behavior.

use std::{mem::MaybeUninit, ptr};

struct Example {
    a: i32,
    b: u8,
    c: bool,
}

fn main() {
    let offset = unsafe {
        let base = MaybeUninit::<Example>::uninit();
        let base_ptr = base.as_ptr();
        let c = ptr::addr_of!((*base_ptr).c);
        (c as usize) - (base_ptr as usize)
    };
    println!("{}", offset);
}
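
To connect this back to the original question, the offset can then be subtracted from the field pointer to recover a pointer to the containing struct. The following is only a minimal sketch under stated assumptions: `Bar` and `Baz` are placeholder definitions for the question's types, the fields are renamed to snake_case, `#[repr(C)]` is assumed because the struct crosses an FFI boundary, and `offset_of_member_b` and `foo_from_member_b` are hypothetical helper names.

```rust
use std::{mem::MaybeUninit, ptr};

// Placeholder definitions standing in for the question's types (assumption).
struct Bar(u64);
struct Baz(u16);

#[repr(C)] // assumed for FFI; the technique itself does not depend on it
struct Foo {
    member_a: Bar,
    member_b: Baz,
}

/// Byte offset of `member_b` within `Foo`, computed without reading any memory.
fn offset_of_member_b() -> usize {
    let base = MaybeUninit::<Foo>::uninit();
    let base_ptr = base.as_ptr();
    // SAFETY: `addr_of!` only computes an address inside the storage owned by
    // `base`; it never creates a reference to, or reads, the uninitialized field.
    let field_ptr = unsafe { ptr::addr_of!((*base_ptr).member_b) };
    (field_ptr as usize) - (base_ptr as usize)
}

/// Hypothetical helper: recovers the containing `Foo` from a pointer to its
/// `member_b` field.
///
/// # Safety
/// `p` must point at the `member_b` field of a live `Foo`, and must be allowed
/// to address that whole `Foo` (not only the field).
unsafe fn foo_from_member_b(p: *const Baz) -> *const Foo {
    // Step back by the field offset, measured in bytes.
    unsafe { (p as *const u8).sub(offset_of_member_b()) as *const Foo }
}

fn main() {
    let foo = Foo { member_a: Bar(1), member_b: Baz(2) };
    let foo_ptr: *const Foo = &foo;
    // Simulate the pointer handed over by the FFI layer. It is derived from
    // `foo_ptr` so that it may legitimately address the whole `Foo`.
    let p: *const Baz = unsafe { ptr::addr_of!((*foo_ptr).member_b) };

    let p2 = unsafe { foo_from_member_b(p) };
    assert_eq!(p2, foo_ptr);

    let recovered = unsafe { &*p2 };
    println!("{} {}", recovered.member_a.0, recovered.member_b.0);
}
```

Doing the subtraction on a `*const u8` rather than on `usize` keeps the result a pointer derived from `p`, which avoids an integer-to-pointer cast.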

Implementing the offset computation correctly is tricky and nuanced. It is best to use a well-maintained crate, such as memoffset.
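
On newer toolchains the standard library also covers this directly: `std::mem::offset_of!` has been stable since Rust 1.77. A minimal sketch, assuming such a toolchain; the `#[repr(C)]` attribute is an addition here, made only so that the offset is predictable.

```rust
use std::mem::offset_of;

#[allow(dead_code)] // the fields are only used for their offsets
#[repr(C)] // assumption: fixes the layout so the offset below is predictable
struct Example {
    a: i32,
    b: u8,
    c: bool,
}

// `offset_of!` is a constant expression, so it also works in `const` items.
const OFFSET_C: usize = offset_of!(Example, c);

fn main() {
    println!("{}", OFFSET_C); // prints 5: `a` occupies bytes 0..4, `b` byte 4
}
```

Because the macro can be evaluated in constant contexts, it also works for the statics and constants case that the pointer-cast approach cannot handle.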

Before this functionality was stabilized, you had to have a valid instance of the struct. You could use tools like once_cell to minimize the overhead of the dummy value that you needed to create:

use once_cell::sync::Lazy; // 1.4.1

struct Example {
    a: i32,
    b: u8,
    c: bool,
}

static DUMMY: Lazy<Example> = Lazy::new(|| Example {
    a: 0,
    b: 0,
    c: false,
});

static OFFSET_C: Lazy<usize> = Lazy::new(|| {
    let base: *const Example = &*DUMMY;
    let c: *const bool = &DUMMY.c;
    (c as usize) - (base as usize)
});

fn main() {
    println!("{}", *OFFSET_C);
}

If you must have this at compile time, you can place similar code into a build script and write out a Rust source file with the offsets. However, that will span multiple compiler invocations, so you are relying on the struct layout not changing between those invocations. Using a struct with a known representation, such as #[repr(C)], would reduce that risk.
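
A rough sketch of what such a build script might look like. The details are assumptions for illustration: the struct is duplicated inside `build.rs`, the generated file is named `offsets.rs`, and `offset_of_c` and `render_offsets` are hypothetical helper names.

```rust
// build.rs (sketch)
use std::{env, fs, mem::MaybeUninit, path::PathBuf, ptr};

// Assumption: the struct definition is duplicated here (it could also be
// shared with the main crate via `include!`).
#[allow(dead_code)]
#[repr(C)] // known representation, so the layout is stable across invocations
struct Example {
    a: i32,
    b: u8,
    c: bool,
}

/// Byte offset of `c` within `Example`, computed without reading any memory.
fn offset_of_c() -> usize {
    let base = MaybeUninit::<Example>::uninit();
    let base_ptr = base.as_ptr();
    // SAFETY: only an address inside `base` is computed; nothing is read.
    let c = unsafe { ptr::addr_of!((*base_ptr).c) };
    (c as usize) - (base_ptr as usize)
}

/// Renders the Rust source text that the main crate will include.
fn render_offsets() -> String {
    format!("pub const OFFSET_C: usize = {};\n", offset_of_c())
}

fn main() -> std::io::Result<()> {
    // Cargo sets OUT_DIR for build scripts; fall back to the temp directory
    // so that the sketch also runs standalone.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);
    fs::write(out_dir.join("offsets.rs"), render_offsets())
}
```

The main crate could then pull in the constant with `include!(concat!(env!("OUT_DIR"), "/offsets.rs"));`.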

Shepmaster