Suppose we have a standard-layout class, simplified to this:

struct X {
    int num;
    Object obj; // also standard layout
    char buf[512];
};

As I understand it, if we have an instance of X, we can take its address and cast it to char* and look at the content of X as if it was an array of bytes.

However, it's a little less clear if we do the following, taking the address of a member and walking past it as if we were walking through X itself:

X x;
char* p = reinterpret_cast<char*>(&x.obj) + sizeof(Object);
*((int*)p) = 1234; // write an int into buf

Is this valid and well defined?

I was commenting to a colleague that if we take the address of obj we should limit ourselves to the memory of that object, and not assume that reading past its end safely lands in the next field of the containing struct (buf in this case). I have been asked to defend that statement. In looking for an answer, I only managed to confuse myself a bit with all the similar questions that don't seem to address this precisely. :)

Standard citations appreciated. Also, I'd prefer to stick to the question "is this valid", and ignore the question of its sensibility.

Chris Uzdavinis
  • "Is this valid and well defined?" can't be, as you don't know about the relationship of the length of obj and alignment on your specific platform. Generally, once you lose the type information, very few things that you can do to your data are still "well-defined". – Marcus Müller Jun 23 '22 at 12:01
  • @MarcusMüller What is "the context of X"? I provided the struct definition of X at the top. Sorry if it wasn't clear that code was a continuation. – Chris Uzdavinis Jun 23 '22 at 12:04
  • @MarcusMüller Thanks, that was a typo. I meant `content`. Fixed. – Chris Uzdavinis Jun 23 '22 at 12:09
  • `reinterpret_cast<char*>(&x.obj) + sizeof(Object)` violates https://timsong-cpp.github.io/cppwp/n4861/expr.add#6.sentence-1, see e.g. https://stackoverflow.com/a/62341088/ for details – Language Lawyer Jun 23 '22 at 15:13

2 Answers

1

I was commenting to a colleague that if we take the address of obj we should limit ourselves to the memory of that object,

Well said!

char* p = reinterpret_cast<char*>(&x.obj) + sizeof(Object);

Nope, sizeof(Object) isn't necessarily a multiple of the alignment the next member requires, so the compiler may insert padding and the next member (.buf) need not start immediately at .obj's end.

Generally, hm. If you really need to do this, write a union that's either a char[512] or an int or whatever you need it to be, and put it in place of char buf[512]; that's what they're for.
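A minimal sketch of that union idea, under assumptions: XWithUnion, Payload and store_value are hypothetical names, and Object is a 17-byte stand-in for the real type. It only shows writing the int through a named union member instead of through a computed pointer.

```cpp
#include <cstddef>

struct Object {
    char values[17]; // hypothetical stand-in for the real Object
};

// Hypothetical variant of X: the raw buffer shares its storage with an int.
struct XWithUnion {
    int num;
    Object obj;
    union Payload {
        char buf[512];
        int value;
    } payload;
};

// Stores v through the union's int member and reads it back.
inline int store_value(XWithUnion& x, int v) {
    x.payload.value = v;    // the assignment makes `value` the active member
    return x.payload.value; // reading the active member is fine
}
```

Note that C++ generally only lets you read the member that was last written, so this helps when the int is also read back as an int; it is not a general-purpose type-punning device.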


Taking your code and demonstrating this:

#include "fmt/format.h"
#include <algorithm>
#include <array>
#include <string>
#include <utility>
#include <vector>

struct Object {
  std::array<char, 17> values;
};
struct X {
  int num;
  Object obj; // also standard layout
  char buf[512];
};

int main() {
  X x;
  auto ptr_to_x = reinterpret_cast<char *>(&x);
  auto ptr_distance = reinterpret_cast<char *>(&x.buf) - ptr_to_x;

  std::vector<std::pair<std::string, unsigned int>> statements{
      {"instance x", sizeof(x)},
      {"class X", sizeof(X)},
      {"distance between beginning of X and buf", ptr_distance},
      {"Object", sizeof(Object)}};
  std::size_t maxlenkey = 0;
  std::size_t maxlenval = 0;
  for (const auto &[key, val] : statements) {
    maxlenkey = std::max(key.size(), maxlenkey);
    maxlenval = std::max(fmt::formatted_size("{:d}", val), maxlenval);
  }

  for (const auto &[key, value] : statements) {
    fmt::print("length of {: <{}s} {:{}d}\n", key, maxlenkey, value, maxlenval);
  }
}

prints:

length of instance x                              536
length of class X                                 536
length of distance between beginning of X and buf  21
length of Object                                   17

So, writing 4 bytes at ((char*)&x)+sizeof(Object) will definitely not do the same as actually writing to buf.
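To make the comparison explicit, here is a small sketch that computes the three offsets involved with offsetof. The function names are made up for illustration; the structs are the ones from the program above. It says nothing about whether the standard permits the pointer arithmetic, only where things sit in the layout.

```cpp
#include <array>
#include <cstddef>

struct Object {
    std::array<char, 17> values;
};
struct X {
    int num;
    Object obj;
    char buf[512];
};

// Offset of buf as the compiler actually laid it out.
constexpr std::size_t actual_buf_offset() { return offsetof(X, buf); }

// Offset you get by assuming buf starts exactly where obj ends.
constexpr std::size_t assumed_from_obj() {
    return offsetof(X, obj) + sizeof(Object);
}

// Offset you get by adding sizeof(Object) to the start of X itself.
constexpr std::size_t assumed_from_x() { return sizeof(Object); }
```

actual_buf_offset() equals assumed_from_obj() only when the implementation puts no padding between obj and buf, which depends on the ABI. assumed_from_x() ignores num entirely, so it always falls short of buf.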

Marcus Müller
  • that's wrong. See my amended answer! – Marcus Müller Jun 23 '22 at 12:34
  • But I really don't understand the difference between what I do here, showing that no, you can't just add something to the beginning of one member to get to the next, and what you want to do. We're not "reading" anything in that member object. – Marcus Müller Jun 23 '22 at 12:59
  • I disagree with your conclusion that it "will definitely not do the same as actually writing to buf". Your output shows that it computes the value correctly: "length of Object" is 17, `obj` sits behind a 4-byte int, and `buf` is offset 21 bytes from the beginning, which is 17+4. Both ways compute the same value. https://godbolt.org/z/qaPMEdrGf I'm more interested in C++ standard citations than code snippets, since the observed behavior in our code is NOT incorrect, but I'd like to know precisely where the standard says it's not valid to compute it that way. (And thanks!) – Chris Uzdavinis Jun 23 '22 at 20:26
1

As I understand it, if we have an instance of X, we can take its address and cast it to char* and look at the content of X as if it was an array of bytes.

The standard doesn't actually allow this currently. Casting to char* will not change the pointer value (per [expr.static.cast]/13) and as a result you will not be allowed to apply pointer arithmetic on it as it violates [expr.add]/4 and/or [expr.add]/6.

This is however often assumed to be allowed in practice and probably considered a defect in the standard. The paper P1839 by Timur Doumler and Krystian Stasiowski is trying to address that.

But even applying the proposed wording in this paper (revision P1839R5)

X x;
char* p = reinterpret_cast<char*>(&x.obj) + sizeof(Object);
*((int*)p) = 1234; // write an int into buf

will have undefined behavior, at least assuming I am interpreting its proposed wording and examples correctly. (I might not be though.)

First of all, there is no guarantee that buf will be correctly aligned for an int. If it isn't, then the cast (int*)p will produce an unspecified pointer value. But also, there is no guarantee in general that there is no padding between obj and buf.
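The alignment concern can be checked for a concrete layout. The sketch below assumes a hypothetical 17-byte Object and uses made-up helper names. The runtime helper relies on the conventional (implementation-defined) mapping of pointers to std::uintptr_t.

```cpp
#include <cstddef>
#include <cstdint>

struct Object {
    char values[17]; // hypothetical 17-byte member
};
struct X {
    int num;
    Object obj;
    char buf[512];
};

// A complete X is at least int-aligned (it contains an int), so buf is
// suitably aligned for an int exactly when its offset is a multiple of
// alignof(int).
constexpr bool buf_offset_aligned_for_int() {
    return offsetof(X, buf) % alignof(int) == 0;
}

// Runtime variant: checks the numeric address of an arbitrary pointer.
inline bool is_aligned_for_int(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

With an Object whose size is not a multiple of alignof(int), buf will typically land on a misaligned offset, which is the case where (int*)p yields an unspecified pointer value.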

Even if you assume correct alignment and no padding, because e.g. you have guarantees from your ABI or compiler, there are still problems.

First, the proposal would only allow unsigned char*, not char* or std::byte*, to access the object representation. See the paper's "Known issues" section.

Second, after fixing that, p would be a pointer one-past the object representation of obj, so it doesn't point to an object. As a consequence the cast (int*)p cannot point to any int object that might have been implicitly created in buf when X x;'s lifetime started. Instead [expr.static.cast]/13 will apply and the value of the pointer remains unchanged.

Trying to dereference the int* pointer pointing one-past-the-end of the object representation of obj will then cause undefined behavior (as it is not pointing to an object).

You also can't save this using std::launder on the pointer, because a pointer to an int nested inside buf would give you access to bytes which are not reachable through a pointer to the object representation of buf, violating std::launder's precondition, see [ptr.launder]/4.


In a broader picture, if you look at how e.g. std::launder is specified, it seems to me that the intention is definitely not to allow this. The way it is specified, it is impossible to use a pointer (in)to a member of a class (except the first if standard layout) to access memory of other (non-overlapping) members. This specifically seems to be intended to allow a compiler to do optimization by pointer analysis based on assuming that these other members are unreachable. (I don't know whether there is any compiler actually doing this though.)
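For contrast, a sketch of one commonly used way to get an int's bytes into buf without leaving the member you actually named. This is not taken from the question; store_int and load_int are hypothetical helpers, Object is a 17-byte stand-in, and it assumes that copying the bytes (rather than creating an int object inside buf) is all that is needed.

```cpp
#include <cstring>

struct Object {
    char values[17]; // hypothetical stand-in for the real Object
};
struct X {
    int num;
    Object obj;
    char buf[512];
};

// Copies the bytes of v to the start of buf, reaching buf through its own
// name instead of through a pointer derived from obj.
inline void store_int(X& x, int v) {
    static_assert(sizeof(int) <= sizeof(X::buf), "buf too small for an int");
    std::memcpy(x.buf, &v, sizeof v);
}

// Copies the bytes back out into a real int object.
inline int load_int(const X& x) {
    int v;
    std::memcpy(&v, x.buf, sizeof v);
    return v;
}
```

Because the pointer handed to std::memcpy comes from x.buf itself, neither the alignment of buf nor the padding after obj matters here.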

user17732522