We have:

struct A {
  int x;
  int y;
} a;

Assuming that:

offsetof(struct A, x) + sizeof(int) == offsetof(struct A, y)

Does the C standard (i.e. C11) guarantee that &a.x + 1 == &a.y is true?

If not, does any mainstream compiler guarantee that?

Moreover, assuming that the equality is satisfied, can the value of a.y be accessed via (&a.x)[1] without UB?

What about memcpy(&a.x+1, ...)?


EDIT

Accessing a.y with (&a.x)[1] is indeed UB, and Clang at least takes advantage of it when optimizing. See the example by user @NateEldredge in the comments.

tstanisl
  • See also: https://stackoverflow.com/questions/45966762/can-an-equality-comparison-of-unrelated-pointers-evaluate-to-true – dbush Sep 05 '21 at 22:30
  • @dbush, it's related, though it focuses on two separate objects with automatic storage. This case focuses on subobjects of a larger object – tstanisl Sep 06 '21 at 06:12
  • If there's no padding, you can access a.y with `memcpy((char*)&a.x+sizeof(int), ...` and that's well-defined. You can inspect a struct byte by byte with character type pointers, but you can't do the same using integer types etc. – Lundin Sep 06 '21 at 09:50
  • @Lundin, so there would be no UB if `x` and `y` were `char`? – tstanisl Sep 06 '21 at 10:01
  • It would still be implementation-defined, as there are no guarantees about padding. But yeah, if `x` was `char` you could do `(&a.x)[1]`, which is equivalent to `*(&a.x + 1)`, and access that part of the struct. This is allowed by 6.3.2.3/7 "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object." It's still a bit fishy to use array notation for that access though. – Lundin Sep 06 '21 at 10:08
  • @Lundin, does it mean that `char` pointers can bypass "UB wall" at "defined until one past the last element" restriction? – tstanisl Sep 06 '21 at 10:15
  • It's a bit unclear what "successive increments" mean - if we can use any addition such as `*(&a.x + n)` or if we have to write a loop increasing a pointer or index one byte at a time. In practice I would be surprised if it didn't boil down to the same thing. But yeah I suppose it isn't UB to access an array out of bounds with lvalue access through character pointers, given that you somehow know that there's memory allocated beyond that array (eg because it is part of a struct). – Lundin Sep 06 '21 at 10:50

2 Answers

Yes. C 2017 6.5.9, which discusses == and !=, says:

6 Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

7 For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

By paragraph 7, a.x and a.y each acts as an array of one int, for the purposes of ==.

Since offsetof(struct A, x) + sizeof(int) == offsetof(struct A, y) guarantees that a.y immediately follows a.x in the address space, &a.x+1 == &a.y satisfies the last condition in paragraph 6, that one is a pointer to one past the end of one array object and the other is a pointer to a different array object that immediately follows the first.
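The comparison can be sketched in code as follows. This is only an illustration under the question's layout assumption, which is enforced here with `static_assert`; the function name `one_past_x_equals_y` is hypothetical and not from the question. The function forms the one-past-the-end pointer of `a->x` and compares it, but never dereferences it.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof */

struct A {
    int x;
    int y;
};

/* The question's assumption, checked at compile time:
   there is no padding between x and y. */
static_assert(offsetof(struct A, x) + sizeof(int) == offsetof(struct A, y),
              "y must immediately follow x");

/* Hypothetical helper: returns nonzero when the one-past-the-end pointer
   of a->x compares equal to &a->y. Forming &a->x + 1 and comparing it
   with == is defined; dereferencing it is a separate matter. */
int one_past_x_equals_y(struct A *a)
{
    int *past_x = &a->x + 1;   /* one past the "array of one int" a->x */
    return past_x == &a->y;
}
```

By the reading of 6.5.9 given above, a conforming implementation should return a nonzero value here whenever the `static_assert` holds.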

Moreover, assuming that the equality is satisfied, can the value of a.y be accessed via (&a.x)[1] without UB?

No. The fact that &a.x+1 equals &a.y does not mean it is &a.y.

This is not fully or explicitly stated in the standard. There are situations where pointer arithmetic must be able to traverse objects that are adjacent in memory, particularly structure members. For example, if we convert a pointer to a structure to a char *, we can use it to access the individual bytes in the structure. We can use it to traverse the entire structure, including the members. Then, if we increment it appropriately to point to some member and convert it back to a pointer to that member’s type, we ought to have a pointer to that member.

However, the C standard is written in natural language, not completely with notation of formal logic or mathematics (although there is some), so it is incomplete, and we are not always sure of what it specifies. Since it does not tell us that &a.x + 1 can be used to access a.y, the behavior is undefined by omission.
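For contrast, here is a minimal sketch of the character-pointer route described above, which avoids `&a.x + 1` altogether. It starts from a pointer to the whole structure, steps to the member with `offsetof`, and copies the bytes with `memcpy`. The helper names `read_y_bytewise` and `write_y_bytewise` are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcpy */

struct A {
    int x;
    int y;
};

/* Hypothetical helper: reads a->y through a byte pointer derived from
   the whole struct, so the arithmetic stays inside the object *a. */
int read_y_bytewise(const struct A *a)
{
    const unsigned char *bytes = (const unsigned char *)a;  /* lowest-addressed byte of *a */
    int value;
    memcpy(&value, bytes + offsetof(struct A, y), sizeof value);
    return value;
}

/* Hypothetical helper: writes a->y the same way. */
void write_y_bytewise(struct A *a, int value)
{
    unsigned char *bytes = (unsigned char *)a;
    memcpy(bytes + offsetof(struct A, y), &value, sizeof value);
}
```

Because the byte pointer is derived from the enclosing `struct A` rather than from `a.x`, this does not depend on what a pointer one past `a.x` may be used for.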

Eric Postpischil
  • Interestingly, does this mean that if we had `struct A { int x; int y; int z; }` and we knew that `offsetof(struct A, x) + sizeof(int) * 2 == offsetof(struct A, z)` - then `&a.x + 2 == &a.z` would be undefined? – Daniel Kleinstein Sep 05 '21 at 20:51
  • @DanielKleinstein: Yes. Array arithmetic does not have to work beyond that single sentinel position at the end of the array. E.g., calculation might go awry because the compiler used some form of modulo-wrapping arithmetic instead of full-pointer-size arithmetic. – Eric Postpischil Sep 05 '21 at 20:54
  • This answer is logical but scares me a bit. Does it imply that in well behaved program even though two pointers `a` and `b` are both non-null, valid, and equal they *may not* be used to modify the same object? – tstanisl Sep 05 '21 at 21:32
  • What about `struct a {double a, char b, int c, double d}` is `&c == &a + 2`? –  Sep 05 '21 at 22:01
  • @DanielKleinstein, it's rather a question if `(&a.x + 1) + 1 == &a.z` is UB – tstanisl Sep 05 '21 at 22:23
  • It's undefined by out of bound array access -- in the pointer arithmetic definition, a non-array object behaves like an array of size 1, so you cannot access the second element – M.M Sep 05 '21 at 22:24
  • @tstanisl in your latest comment the second `+1` is UB for pointer arithmetic out of bounds of object – M.M Sep 05 '21 at 22:25
  • @EricPostpischil, 6.5.9p6 mentions "address space". I guess it means that the standard assumes that both `a.x` and `a.y` are "backed" by some contiguous plurality of bytes. Does it mean that the value of `a.y` can be safely accessed via `memcpy(&a.x+1, ...)`? – tstanisl Sep 06 '21 at 06:20
  • @M.M, yes, you're right of course. I'm just amazed how pointers' equality has little in common with pointers' equivalence. It begs to be able to substitute `&a.x+1` with `&a.y` making `&a.y + 1 == &a.y` which would be false if there was no UB – tstanisl Sep 06 '21 at 06:47
  • I'll accept the answer though I'm not fully convinced by the second conclusion. Arguments from @Lundin support the claim that `memcpy(&a.x+1,...)` can reliably modify `a.y`. The idea that two correctly constructed, correctly typed, and equal pointers cannot be used to access the same object while `memcpy` can, feels very wrong to me. However, I have no good arguments to support it. It would make comparing pointers against anything other than `NULL` quite pointless and unreliable. Maybe there is something hidden between the words in the standard? – tstanisl Sep 07 '21 at 07:14
  • @tstanisl: For a real-life example, see https://godbolt.org/z/3dssjxKzf. If `x` has the value 1 then `(&loc.a)[x] = 42;` "should" modify `loc.b`. However clang assumes it cannot do so; it optimizes out the store and returns `g.b` unconditionally. AFAIK this is perfectly legal for it to do because of the UB as mentioned. (Oddly, changing the code to `*(&loc.a + x) = 42;` makes it actually do the store and reload again, even though the two statements are supposed to be equivalent.) – Nate Eldredge Sep 21 '21 at 13:45
  • @NateEldredge, thanks for example, I was not able to trigger that with GCC. So it is indeed UB. Surprisingly for `(&loc.a)[1] = 42;` it simply returns 42. – tstanisl Sep 21 '21 at 13:54

This may be related to your example: ISO/IEC 9899:2018 (C17), 6.7.2.1#21, discusses accessing a flexible array member (an array of unspecified size at the end of a struct):

struct s {
  int n;
  double d[];
};

// given this assumption:
assert( sizeof (struct s) >= offsetof(struct s, d) + sizeof (double) );

// the standard says that the following assignment is probably UB, but may be
// legitimate if the assumption above holds; either way it is NOT strictly conforming code:
struct s t1;
t1.d[0] = 4.2;

The previous example is very similar to yours. So my guess is: what you want to do may not be UB (given you used offsetof to check etc.), but your code is not strictly conforming.
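For comparison, the usual way to use a flexible array member is to allocate the structure dynamically with extra room for the array, rather than to rely on trailing padding in an automatic object such as `t1`. The following is a minimal sketch of that pattern; the function name `s_create` is hypothetical, and the size computation does not guard against overflow for very large counts.

```c
#include <assert.h>
#include <stdlib.h>   /* malloc, free, size_t */

struct s {
    int n;
    double d[];   /* flexible array member */
};

/* Hypothetical helper: allocates a struct s with room for `count` doubles.
   Returns NULL on allocation failure or a negative count.
   Assumption: count is small enough that the size sum does not overflow. */
struct s *s_create(int count)
{
    if (count < 0)
        return NULL;
    struct s *p = malloc(sizeof *p + (size_t)count * sizeof p->d[0]);
    if (p != NULL)
        p->n = count;
    return p;
}
```

With storage obtained this way, `p->d[0]` through `p->d[p->n - 1]` are ordinary array elements, and the caller releases the object with `free(p)`.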

Luca Polito
  • This is not really correct - all the example 2 (informative) shows is that a struct with a flexible array member may have trailing padding beyond what sizeof gave as result. The normative text for flexible array members is 6.7.2.1/18. "In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply." The example writes to the supposed trailing padding. And it's a special case which isn't at all relevant to the OP's code. – Lundin Sep 06 '21 at 10:00
  • @Lundin I think it's a similar situation, kind of. In both the code snippets, `offsetof` is used to be sure that there is no out-of-bounds/not-aligned access (OP: `(&a.x)[1]`, C17 example: `t1.d[0]`). And they're both edge-case situations where the code *should* work without problems in practice, but it doesn't seem correct in theory. And I think the C17 standard synthesizes this situation quite well: **the code may be legitimate and might not be UB, but it is not strictly conforming**. Maybe you're right, and the C17 standard response is not applicable at all to OP's situation. – Luca Polito Sep 06 '21 at 10:22
  • There are no flexible array members in the OP's example. Array access notation isn't the same as declaring something a flexible array member. – Lundin Sep 06 '21 at 10:45
  • @Lundin My answer isn't about flexible array members. As I already said, my answer is about a *similar situation* (involving different aspects of the language, in this case flexible array members) that normally invokes UB, but may be legitimate. And I think the C17 response to that situation is applicable to OP's code too (even if OP's code is not about flexible array members, but still: OP's code is similar because it usually invokes UB, but the `offsetof` assert makes it legitimate in practice). – Luca Polito Sep 06 '21 at 10:52
  • The reason the example you quote may work is because of the special rule in 6.7.2.1/18 (which I quoted) for flexible array members. It isn't applicable elsewhere - flexible array members were added to the standard to explicitly fix the old "struct hack", which was a UB way of accessing a struct beyond the end of it. – Lundin Sep 06 '21 at 11:00
  • @Lundin I think that the rule in 6.7.2.1#18 is not related to the example given in #21. Instead, 6.7.2.1#21 says (talking about `t1.d[0]`): `The assignment [...] is probably undefined behavior, but it is possible that [...offsetof check...], in which case the assignment would be legitimate. Nevertheless, it cannot appear in strictly conforming code`. However, the examples in #23 (and also #24) are related to the special rule 6.7.2.1#18, and in fact they are not described as UB (except for the last `*dp = 42;`, but that's a violation of #18). The example in #21 is however treated differently. – Luca Polito Sep 06 '21 at 11:42
  • Examples are not normative, they are informative, clarifying previously written normative text. Example 2 is about flexible array members, for which normative text can only be found in 6.7.2.1/18. – Lundin Sep 06 '21 at 11:47