1

Let's say I have an API:

// api.h - Others can #include this header
#include <cstdint>

class A {
 public:
  // Write data into an opaque type.
  // The user shouldn't directly access the underlying bits of this value.
  // They should use this method instead.
  void WriteVal(uint32_t data);

 private:
  uint64_t opaque_value_;
};
// api.cpp
#include <cstdlib>
#include "api.h"

namespace {

// This is the underlying struct that the opaque value represents.
struct alignas(8) Impl {
  uint32_t x;
  uint32_t y;
};

}  // namespace

void A::WriteVal(uint32_t data) {
  uint64_t *opaque_ptr = &opaque_value_;
  Impl *ptr = reinterpret_cast<Impl *>(opaque_ptr);
  memcpy(&ptr->y, &data, sizeof(data));
}

Is there any undefined behavior in the A::WriteVal method?

My guess would be NO for these reasons:

  1. Reinterpret casting between a uint64_t * and Impl * on it's own is ok since the alignment of the pointee types is the same.
  2. There is only UB if ptr were to be dereferenced explicitely since that would break strict aliasing rules.
  3. memcpy can be safely used in substitution of explicit dereferencing regardless of the original type of the pointer.

Is my reasoning correct? If this is also considered UB, is there a nice C++ way of writing to an opaque type without illegal type punning methods.

My goal is to cleanly perform operations on an opaque value that, under the hood, represents a struct that users shouldn't know the details about.

BrockLee
  • 931
  • 2
  • 9
  • 24
  • 1
    Why do you need `Impl`? Why not simply `memcpy` to `reinterpret_cast(&opaque_value_) + sizeof(uint32_t)`? BTW, you dereference `ptr` explicitly in `&ptr`, don't you? – Daniel Langr Apr 27 '20 at 07:41
  • You still have an issue with possible struct padding. There's no need for the struct though, as DanielLangr has suggested. – Sander De Dycker Apr 27 '20 at 07:52
  • You also might need to be wary of endianness. – Sander De Dycker Apr 27 '20 at 07:53
  • @DanielLangr This is meant to be a simplified example. `Impl` can be a struct with any number of arbitrary fields to represent some complex logic. The idea behind this is it's cleaner and takes more advantage of the type system to have the compiler get the offset of a member through member access operators rather than casting to a `char *` and taking `offsetof`. Also, `&ptr->y` doesn't actually dereference the the pointer. This is equivalent to the cast then offsetof you mention, but cleaner. If you look at the assembly for this also, you'll see that `ptr` is never actualy dereferenced. – BrockLee Apr 27 '20 at 08:04
  • @LeoCHan Sorry, I overlooked myself, you're right, you don't dereference `ptr`. However, as Sander pointed out, you have a problem with padding (though unlikely, there may be some between `x` and `y`). Also, I am not sure whether alignment of `uint64_t` is guaranteed to be 8. In theory, it may be more. But you can static assert that both size and alignment of `Impl` and `uint64_t` are the same. – Daniel Langr Apr 27 '20 at 08:16
  • Np. I didn't put this in the example, but lest assume that there isn't padding between `x` and `y` and that `uint64_t` is guaranteed to be 8 bytes on the target I'm using, so `static_assert(sizeof(Impl) == sizeof(uint64_t) && alignof(Impl) == alignof(uint64_t))` passes. This is the case at least on x86_64-linux-gnu. – BrockLee Apr 27 '20 at 08:28
  • @LeoCHan Dereferencing is a semantic action and does not necessarily result in any memory accesses. That is, you can't deduce its absence by looking at generated code. – molbdnilo Apr 27 '20 at 09:54
  • @molbdnilo So what would be the explicit definition of a C++ semantic dereference is? My understanding is that it's essentially just "loading a value at an address", which is done through either *, [], a.b, a->b, although I can't find any documentation to support this. – BrockLee Apr 27 '20 at 23:42

1 Answers1

2

Your reasoning covers undefined behavior caused by a strict aliasing violation. Using memcpy is indeed a way to do type punning in a defined way.

There are other potential sources of undefined behavior though. In your example code, the alignment and padding of the struct should be controlled as well :

struct alignas(uint64_t) Impl {
  uint32_t x;
  uint32_t y;
};
static_assert(sizeof(Impl) == sizeof(uint64_t), "sizeof(Impl) not valid");

I don't see any other possible sources of undefined behavior in your example code.

Sander De Dycker
  • 16,053
  • 1
  • 35
  • 40