
Note that this is purely an academic question, from a language lawyer perspective. It's about the theoretically safest way to accomplish the conversion.

Suppose I have a void* and I need to convert it to a 64-bit integer. The reason is that this pointer holds the address of a faulting instruction; I wish to report this to my backend to be logged, and I use a fixed-size protocol - so I have precisely 64 bits to use for the address.

The cast will of course be implementation defined. I know my platform (64-bit Windows) allows this conversion, so in practice it's fine to just reinterpret_cast<uint64_t>(address).

But I'm wondering: from a theoretical standpoint, is it any safer to first convert to uintptr_t? That is: static_cast<uint64_t>(reinterpret_cast<uintptr_t>(address)). https://en.cppreference.com/w/cpp/language/reinterpret_cast says (emphasis mine):

Unlike static_cast, but like const_cast, the reinterpret_cast expression does not compile to any CPU instructions (except when converting between integers and pointers or on obscure architectures where pointer representation depends on its type).

So, in theory, pointer representation is not defined to be anything in particular; going from pointer to uintptr_t might theoretically perform a conversion of some kind to make the pointer representable as an integer. After that, I forcibly extract the lower 64 bits. Casting directly to uint64_t, by contrast, would not trigger that conversion, so I'd get a different result.
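To make the two candidate conversions concrete, here is a minimal sketch of both. It assumes `std::uintptr_t` exists and `void*` is at most 64 bits wide (as on 64-bit Windows); the function names are hypothetical. On mainstream 64-bit platforms both usually produce the same value, but the pointer-to-integer mapping is implementation-defined, so the standard itself does not promise that.

```cpp
#include <cstdint>

// Hypothetical helper: one-step conversion. Ill-formed (compile error) if
// std::uint64_t is too narrow to hold every void* value.
std::uint64_t direct_to_u64(void* address) {
    return reinterpret_cast<std::uint64_t>(address);
}

// Hypothetical helper: two-step conversion. The implementation-defined
// pointer-to-integer mapping happens in the reinterpret_cast to uintptr_t;
// the static_cast then keeps the value modulo 2^64, which is lossless
// whenever uintptr_t is at most 64 bits wide.
std::uint64_t via_uintptr_to_u64(void* address) {
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(address));
}
```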

Is my interpretation correct, or is there no difference whatsoever between the two casts in theory as well?

FWIW, on a 32-bit system, apparently the widening conversion to unsigned 64-bit could sign-extend, as in this case. But on 64-bit I shouldn't have that issue.
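The sign-extension effect comes from integer conversion rules: widening a negative signed 32-bit value to `std::uint64_t` fills the upper half with ones, while widening an unsigned 32-bit value zero-fills it. The sketch below uses a hypothetical 32-bit address `0x80001000` (top bit set) held in a signed or an unsigned 32-bit integer; it does not reproduce any particular compiler's pointer conversion.

```cpp
#include <cstdint>

// Hypothetical 32-bit address with the top bit set.
constexpr std::uint32_t kAddrBits = 0x80001000u;

// The same bit pattern as a signed 32-bit value (INT32_MIN is 0x80000000).
constexpr std::int32_t kAddrSigned = INT32_MIN + 0x1000;

// Signed -> unsigned 64-bit: the value is reduced modulo 2^64, so a negative
// input ends up with its upper 32 bits set (sign extension).
constexpr std::uint64_t from_signed = static_cast<std::uint64_t>(kAddrSigned);

// Unsigned -> unsigned 64-bit: the value is preserved (zero extension).
constexpr std::uint64_t from_unsigned = static_cast<std::uint64_t>(kAddrBits);

static_assert(from_signed == 0xFFFFFFFF80001000u, "upper half filled with ones");
static_assert(from_unsigned == 0x0000000080001000u, "upper half zero");
```

For reference, GCC documents that its pointer-to-integer cast sign-extends when the pointer representation is narrower than the integer type, which matches the 32-bit behavior mentioned above; converting to a 32-bit `uintptr_t` first and then widening that unsigned value zero-extends instead.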

user4520
  • A `void*` on a DS9K has an 8-bit heap tag, 56-bit segment tag, 64-bit low access, 64-bit high access, and 64-bit offset. Those 256 bits will not fit very well in a 64-bit integer. They do fit into a 256-bit `uintptr_t`. YMMV based on your platform(s) under consideration. – Eljay Jan 03 '22 at 17:47
  • "The reason is that this pointer holds the address of a faulting instruction; " --> At least in C (and I think this applies to C++ too), `void*` is good enough for an _object_. A pointer to a _function_ (or an address in a function) may need something wider than a `void *`. So the starting premise is amiss, even before trying to convert to an integer. I doubt there is any spec support for an _address of an instruction_. Code is in implementation-defined territory, outside the spec. If the goal is "to report this to my backend to be logged", better to report it as a pointer, not an integer. – chux - Reinstate Monica Jan 03 '22 at 18:15
  • @chux-ReinstateMonica Conditionally-supported for function pointers in C++ and only guarantee is that (if supported) conversion to `void*` and back yields the same function pointer value. But I think as soon as you have a `void*` the rules for conversion of `void*` to integers should apply. https://eel.is/c++draft/expr.reinterpret.cast#8 Of course address of instructions in general is nowhere defined. – user17732522 Jan 03 '22 at 18:22
  • @user17732522 Thanks. Seems the issue should then be converting a function pointer (or instruction address) directly to some wide integer and not go through `void *`, if possible. "A pointer can be explicitly converted to any integral type large enough to hold all values of its type. The mapping function is implementation-defined." – chux - Reinstate Monica Jan 03 '22 at 18:25
  • @chux-ReinstateMonica In this case the OS gives the instruction address to you as a `void*`: https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-exception_record. Of course this is where we get into implementation-specific territory. – user4520 Jan 03 '22 at 20:53
  • @user4520 Unclear, then, why you are looking for a language lawyer answer to an implementation-specific question. BTW, that link uses `PVOID`, which may [differ](https://stackoverflow.com/a/494169/2410359) from `void *`. Perhaps that difference existed only in the past, but the API is written as if it could change in the future, implying a `void *` solution here may not apply later. – chux - Reinstate Monica Jan 03 '22 at 21:30
  • @user4520 Why not simply use `reinterpret_cast<uintptr_t>(address)` instead of `static_cast<uint64_t>(reinterpret_cast<uintptr_t>(address))`? Why the need for a 64-bit integer? Note: "I forcibly extract the lower 64 bits." does extract the lower 64 bits of `uintptr_t`, but not necessarily the lower 64 bits of the pointer. – chux - Reinstate Monica Jan 03 '22 at 21:37
  • @chux-ReinstateMonica Well, I just wanted to figure out what the rules of converting a pointer to an integer were from a theoretical standpoint, this is the part I'm looking for a language lawyer answer to. The pointer type itself is of course implementation defined, as you point out, but that's outside the scope of the question. The need for a 64-bit integer is because the protocol to report faults is fixed and I cannot change it - the instruction address must be sent as 64 bits. – user4520 Jan 04 '22 at 07:29
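Following the suggestion in the comments to convert a function pointer directly to a wide integer rather than going through `void*`, a minimal sketch might look like this. `sample_handler` and the helper names are hypothetical; the sketch assumes `std::uintptr_t` exists, and the static_assert documents the width requirement (a cast to a too-narrow integer type would be ill-formed anyway).

```cpp
#include <cstdint>

// Hypothetical function whose address we want to log.
int sample_handler(int x) { return x + 1; }

using Handler = int (*)(int);

// Function pointer -> integer: permitted by reinterpret_cast when the integer
// type is large enough; the resulting value is implementation-defined.
std::uintptr_t handler_to_int(Handler f) {
    static_assert(sizeof(Handler) <= sizeof(std::uintptr_t),
                  "uintptr_t too narrow for a function pointer here");
    return reinterpret_cast<std::uintptr_t>(f);
}

// Integer -> the same function pointer type: yields the original pointer
// when the integer came from the conversion above.
Handler int_to_handler(std::uintptr_t bits) {
    return reinterpret_cast<Handler>(bits);
}
```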

1 Answer


You’re parsing that (shockingly informal, for cppreference) paragraph too closely. The thing it’s trying to get at is simply that other casts potentially involve conversion operations (float/int stuff, sign extension, pointer adjustment), whereas reinterpret_cast has the flavor of direct reuse of the bits.

If you reinterpret a pointer as an integer and the integer type is not large enough, you get a compile-time error. If it is large enough, you’re fine. There’s nothing magical about uintptr_t other than the guarantee that (if it exists) it’s large enough, and if you then re-cast to a smaller type you lose that anyway. Either 64 bits is enough, in which case you get the same guarantees with either type, or it’s not, and you’re screwed no matter what you do. And if your implementation is willing to do something weird inside reinterpret_cast, which might give different results than (say) bit_cast, neither method will guarantee nor prevent that.
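The "either 64 bits is enough or it isn't" argument can be pinned down at compile time. In this minimal sketch (function names hypothetical), the first static_assert rejects any platform where `std::uintptr_t` is wider than 64 bits, so the narrowing `static_cast` can never drop bits. The second function copies the pointer's object representation with `std::memcpy`, the pre-C++20 equivalent of `std::bit_cast`; nothing in the standard requires it to agree with the `reinterpret_cast` value mapping.

```cpp
#include <cstdint>
#include <cstring>

// Value mapping: pointer -> uintptr_t (implementation-defined), then an
// unsigned conversion that is lossless because of the check below.
std::uint64_t address_to_u64(const void* address) {
    static_assert(sizeof(std::uintptr_t) <= sizeof(std::uint64_t),
                  "uintptr_t is wider than 64 bits; the protocol cannot carry it");
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(address));
}

// Representation copy: the raw bytes of the pointer object, as std::bit_cast
// would give in C++20. Requires the pointer to be exactly 64 bits wide.
std::uint64_t address_bits_to_u64(const void* address) {
    static_assert(sizeof(const void*) == sizeof(std::uint64_t),
                  "representation copy requires a 64-bit pointer");
    std::uint64_t bits;
    std::memcpy(&bits, &address, sizeof bits);
    return bits;
}
```

On flat-address-space 64-bit targets the two usually agree. If an implementation did something unusual inside `reinterpret_cast`, they could differ, and neither cast route would change that.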

That’s not to say the two are guaranteed identical, of course. Consider a DS9k-ish architecture with 32-bit pointers, where reinterpret_cast of a pointer to a uint64_t resulted in the pointer bits being duplicated in the low and high words. There you’d get both copies if you went directly to a uint64_t, and zeros in the top half if you went through a 32-bit uintptr_t. In that case, which one was “right” would be a matter of personal opinion.
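The hypothetical DS9k-style mapping can be simulated with plain integers to show why the two routes diverge. Everything here is invented for illustration: `ds9k_ptr_to_u64` stands in for an implementation whose direct 32-bit-pointer-to-`uint64_t` cast duplicates the bits, and `std::uint32_t` plays the role of a 32-bit `uintptr_t`.

```cpp
#include <cstdint>

// Hypothetical DS9k rule: a direct pointer -> uint64_t cast copies the
// 32 pointer bits into both the low and the high word.
constexpr std::uint64_t ds9k_ptr_to_u64(std::uint32_t ptr_bits) {
    return (static_cast<std::uint64_t>(ptr_bits) << 32) | ptr_bits;
}

// Route through a 32-bit uintptr_t: the pointer bits land in a 32-bit
// unsigned integer, and widening that to uint64_t zero-fills the high word.
constexpr std::uint64_t ds9k_via_uintptr(std::uint32_t ptr_bits) {
    return static_cast<std::uint64_t>(ptr_bits);
}

static_assert(ds9k_ptr_to_u64(0x12345678u) == 0x1234567812345678u, "both words");
static_assert(ds9k_via_uintptr(0x12345678u) == 0x0000000012345678u, "high word zero");
```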

Sneftel
  • "If you reinterpret a pointer as an integer and the integer type is not large enough, you lose information.": Such a cast would be ill-formed. – user17732522 Jan 03 '22 at 17:47
  • Good point - the width is checked when you go pointer to integer (though not the other way around). – Sneftel Jan 03 '22 at 17:52
  • I think there is also https://eel.is/c++draft/expr.reinterpret.cast#5 to consider in case one tries to convert the integer back to a pointer. That is only guaranteed to work if the value of the `reinterpret_cast` to the same integer type is used directly. – user17732522 Jan 03 '22 at 18:08
  • I see, thanks for clearing that up. So the case you describe at the end is pretty much what happened in https://stackoverflow.com/questions/42178107/converting-a-pointer-to-64-bit-integer-why-is-the-result-different-on-32-bit, isn't it? There the implementation decided to sign-extend the pointer when converting for some reason. – user4520 Jan 03 '22 at 21:29
  • Yes, but more obviously problematic. As @user17732522 mentioned, the double cast is not guaranteed to work properly as a round trip back to a pointer. Consider the case where that conversion was implemented using the high word of the uint64_t. – Sneftel Jan 04 '22 at 01:48
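On the round-trip concern raised in the comments: the standard only promises that a pointer converted to a sufficiently large integer and back to the same pointer type keeps its original value. A minimal sketch for recovering the address from the logged 64-bit value, assuming `std::uintptr_t` is exactly 64 bits (as on 64-bit Windows), so the value handed back to `reinterpret_cast` is the very one the pointer produced. `to_wire` and `from_wire` are hypothetical names.

```cpp
#include <cstdint>

// Pointer -> 64-bit wire value (hypothetical logging helper).
std::uint64_t to_wire(void* address) {
    static_assert(sizeof(std::uintptr_t) == sizeof(std::uint64_t),
                  "sketch assumes a 64-bit uintptr_t");
    return static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(address));
}

// 64-bit wire value -> pointer. Goes back through uintptr_t, the same integer
// type used on the way out; with a 64-bit uintptr_t the value is unchanged.
void* from_wire(std::uint64_t wire) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(wire));
}
```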