Is adding to a "char *" pointer UB, when it doesn't actually point to a char array?

Question

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n; otherwise, the behavior is undefined.

struct Foo {
    float x, y, z;
};

Foo f;
char *p = reinterpret_cast<char*>(&f) + offsetof(Foo, z); // (*)
*reinterpret_cast<float*>(p) = 42.0f;

Does the line marked with (*) have UB? reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, so it should be UB according to the cited paragraph. But if it is UB, then offsetof's usefulness would be limited.

Is it UB? If not, why not?

[\[basic.lval\]/8](https://timsong-cpp.github.io/cppwp/n4659/basic.lval#8) — StoryTeller - Unslander Monica, Nov 26 '17 at 16:48
@StoryTeller: line (*) doesn't access it, it is just a pointer manipulation. — geza, Nov 26 '17 at 16:50
It isn't UB. You are only taking the address of a variable and casting it to a `char*` then back to its original type. It points to a valid object (address of `z`). — Brandon, Nov 26 '17 at 16:52
You alias `f` with `p`, that's allowed already. The storage of the object can be viewed as specified in [\[intro.object\]](https://timsong-cpp.github.io/cppwp/n4659/intro.object#3) (an array of characters or `std::byte`). So what's the problem? — StoryTeller - Unslander Monica, Nov 26 '17 at 16:56
@StoryTeller [intro.object] describes objects that are actually being created in arrays of bytes. It doesn't discuss objects not created in arrays of bytes. Note the first word there: "***If*** a complete object is created [...]" — , Nov 26 '17 at 16:59
@StoryTeller: hmm, that paragraph is about placement new, isn't it? Why is it relevant here? — geza, Nov 26 '17 at 16:59
@geza - It's not, the link was supposed to be to the entire section. It's an editorial mistake. — StoryTeller - Unslander Monica, Nov 26 '17 at 17:01
@StoryTeller In my comment I wrongly wrote [intro.object] as well when I only meant to comment on the bit you linked to, but the rest of [intro.object] doesn't say the complete thing can be viewed as an array either. It does say "An object of trivially copyable or standard-layout type shall occupy contiguous bytes of storage" but contiguous bytes of storage does not imply array. (`x`, `y` and `z` also occupy contiguous bytes of storage but I don't think anyone here is saying they may be accessed as a `float[3]`.) — , Nov 26 '17 at 17:07
@hvd - It says quite plainly *"Unless it is a bit-field, a most derived object shall have a nonzero size and shall occupy one or more bytes of storage"*. Now, I take that to mean an array of `std::byte` of some size. You can argue semantics with me, or you can help me find the proposal for `std::byte` that lists **all** the relevant parts of accessing raw storage. I'm having trouble at the moment. — StoryTeller - Unslander Monica, Nov 26 '17 at 17:08
@StoryTeller I covered that already. Taking that to mean an array is bogus. — , Nov 26 '17 at 17:10
@hvd - That stance is ludicrous even in a language lawyer debate — StoryTeller - Unslander Monica, Nov 26 '17 at 17:11
@StoryTeller I gave a *very* specific counter-example of contiguous objects that clearly cannot be taken as an array. Or am I wrong and are you saying they can be? — , Nov 26 '17 at 17:12
@geza - The proposal for [`std::byte`](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0298r1.pdf) conveniently highlights all the places where `std::byte` had to be added to "character types" in order to make any such access well-defined. I hope this will assure you. — StoryTeller - Unslander Monica, Nov 26 '17 at 17:13
@StoryTeller This question isn't about the access, it's about the addition. And the proposal you link to doesn't appear to make any change wrt the addition. — , Nov 26 '17 at 17:15
The structure of the rule is *"If the expression P points to element x[i] of an array object x with n elements ... otherwise, the behavior is undefined."*. It is unclear to me whether *otherwise* refers to inner restrictions of the array case (out of bounds), or to all cases where the pointed object isn't an element of an array. In the latter interpretation, `offsetof` becomes seemingly useless. — eerorika, Nov 26 '17 at 17:22
@user2079303: you've linked the same paragraph that I did :) Yes, that's basically the question. The current interpretation is the latter, at least this is what I deduced from answers here on SO. For example, doing `&x+2` is not allowed, even though it arithmetically could be equal to `&z`, if there's no padding. — geza, Nov 26 '17 at 17:27

score 6 · Answer 1 · answered Nov 26 '17 at 17:37

The addition is intended to be valid, but I do not believe the standard manages to say so clearly enough. Quoting N4140 (roughly C++14):

3.9 Types [basic.types]

2 For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.⁴² [...]

^{42) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.}

It says "for example" because std::memcpy and std::memmove are not the only ways in which the underlying bytes are intended to be allowed to be copied. A simple for loop which copies byte by byte manually is supposed to be valid as well.

In order for that to work, addition has to be defined for pointers to the raw bytes that make up an object, and the way definedness of expressions works, the addition's definedness cannot depend on whether the addition's result will subsequently be used to copy the bytes into an array.

Whether that means those bytes form an array already or whether this is a special exception to the general rules for the + operator that is somehow omitted in the operator description, is not clear to me (I suspect the former), but either way would make the addition you're performing in your code valid.
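
The byte-by-byte loop mentioned above might look like the following minimal sketch. The names `Foo`, `copy_bytes`, and `round_trip` are illustrative only and do not come from the standard. Whether the `src[i]` arithmetic is formally defined is exactly the point under discussion, so read this as the pattern the wording is presumably meant to permit, not as proof that it does.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Foo {
    float x, y, z;
};

static_assert(std::is_trivially_copyable<Foo>::value,
              "byte-wise copying is only meaningful for trivially copyable types");

// Hypothetical hand-written stand-in for std::memcpy.
// It walks the source and destination one byte at a time.
void copy_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];  // src[i] is *(src + i): the pointer addition in question
    }
}

// Copies the bytes of f into a buffer, then from the buffer into a second object.
Foo round_trip(const Foo& f) {
    unsigned char buffer[sizeof(Foo)];
    copy_bytes(buffer, reinterpret_cast<const unsigned char*>(&f), sizeof(Foo));

    Foo result;
    copy_bytes(reinterpret_cast<unsigned char*>(&result), buffer, sizeof(Foo));
    return result;
}
```

If the addition inside `copy_bytes` were undefined whenever `src` points into a `Foo` and not into a real array, the first call in `round_trip` could not be written at all, which is the tension this answer describes.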

Your wording seems to imply it is valid even under the current wording, although your comments seem to say otherwise? Can you clarify? – Passer By Dec 15 '17 at 13:05
@PasserBy Do you mean my comments on the question, in response to StoryTeller? In those, I'm saying I believe StoryTeller's logic is flawed, but that says nothing about the conclusion. Flawed logic can lead to a correct conclusion just as well as an incorrect one. – Dec 15 '17 at 13:35
I think the other bit from \[basic.types], the one that talks about *object storage*, is more relevant as that talks about all objects and defines the object storage as sequence of unsigned chars. That's closest to saying any object can be treated as an (unsigned) char array the specification gets. – Jan Hudec May 29 '18 at 15:11
@JanHudec That implies more, but I think it guarantees less: an array is a sequence of objects, but there isn't anything that says that all sequences of objects are arrays, is there? – May 29 '18 at 16:13
IMHO, the only really consistent way to resolve issues like this is to recognize that neither C89 nor its successors attempt to describe everything an implementation must do to be suitable for any particular purpose, and thus--possibly to "save ink"--they don't always bother to define the behavior of actions that should obviously (at least by the standards of the day) be handled usefully by quality implementations intended for various targets and purposes, or in some cases, by essentially all non-garbage implementations. – supercat Aug 29 '18 at 20:26
I could see some merit to making `offsetof` optional, with a proviso that implementations that don't define it would not be required to allow code to form meaningful character pointers to the "interiors" of objects nor use functions like `memcpy` and `memmove` to copy partial objects. Such implementations would of course be unsuitable for many kinds of low-level programming, but could be suitable for some purposes that a low-level implementation might not. For example, such an implementation could be designed to trap any and all attempts to use pointers that were illegitimately derived. – supercat Aug 29 '18 at 20:33
An implementation that's going to define `offsetof`, however, should define the behavior of converting an object's address to `char*` and indexing it, in such a fashion as to make structure member offsets useful. If `offsetof` isn't going to be useful for such purposes, it may as well not exist. – supercat Aug 29 '18 at 20:35

zwol · Answer 2 · 2018-08-30T12:11:03.937

6

Any interpretation that disallows the intended usage of offsetof must be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a, b, c; };

const size_t idx_S[] = {
    offsetof(struct S, a),
    offsetof(struct S, b),
    offsetof(struct S, c),
};

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 3);
    return *(float *)(((char *)sp) + idx_S[idx]); // intended to be valid
}

However, any interpretation that allows one to step past the end of an explicitly-declared array must also be wrong:

#include <assert.h>
#include <stddef.h>
struct S { float a[2]; float b[2]; };

static_assert(offsetof(struct S, b) == sizeof(float)*2,
    "padding between S.a and S.b -- should be impossible");

float read_S(struct S *sp, unsigned int idx)
{
    assert(idx < 4);
    return sp->a[idx]; // undefined behavior if idx >= 2,
                       // reading past end of array
}

And we are now on the horns of a dilemma, because the wording in both the C and C++ standards, that was intended to disallow the second case, probably also disallows the first case.

This is commonly known as the "what is an object?" problem. People, including members of the C and C++ committees, have been arguing about this and related issues since the 1990s, and there have been multiple attempts to fix the wording, and to the best of my knowledge none has succeeded (in the sense that all existing "reasonable" code is rendered definitely conforming and all existing "reasonable" optimizations are still allowed).

(Note: All of the above code is written as it would be written in C to emphasize that the same problem exists in both languages, and can be encountered without the use of any C++ constructs.)

edited Aug 30 '18 at 12:11

answered Nov 26 '17 at 17:44

I believe the wording intended to disallow the second case does, in fact, *not* disallow the first case, though it definitely could be clearer. In the first case, you take the memory occupied by S and cast it to effectively `unsigned char[sizeof(S)]`. That the “object storage” is such array is defined in `[basic.types]` paragraph 4. Therefore, you are within a char array and the arithmetic is well defined. – Jan Hudec May 29 '18 at 13:00
@JanHudec A lot of people agree with you. About the same number of people disagree with you. – zwol May 29 '18 at 14:07
Why is `*(float *)(((char *)sp) + idx_S[idx])` intended to be valid? You can convert a pointer to `uintptr_t`, do arithmetic, then convert it back to a pointer. This is well-defined behavior (though implementation-defined). I think `offsetof` is intended to be used in this way. – xskxzr Aug 28 '18 at 09:48
@xskxzr `uintptr_t` was added to the C and C++ standards long after `offsetof`; `offsetof` must have been intended to be useful in the context of C89. We're all doing archaeology together at this point -- by now, even the original authors of C89 probably don't remember exactly what their intent was -- but code like the `idx_S` example I showed, using `char *`, but doing the guts of `offsetof` by hand, appears in dozens of programs written in the 1980s, so we can be pretty confident that the C committee had that in mind when they invented `offsetof`. – zwol Aug 28 '18 at 14:46
@xskxzr: There is no guarantee that, given `char *p; size_t x;`, the value of `(char*)(((uintptr_t)p)+x)` bears any relationship whatsoever to `p+x`, even in cases where both expressions would yield defined values. Further, there are proposals to allow conforming compilers to track provenance of pointers even when they are converted through `uintptr_t`, which some compiler writers are almost certain to interpret as saying... – supercat Aug 29 '18 at 20:00
...that such behavior should be considered appropriate in all kinds of compilers, and that any code which is incompatible with such treatment should be considered "broken". – supercat Aug 29 '18 at 20:01
A quick note about C and `offsetof`: Per C if "a null pointer is guaranteed to compare unequal to a pointer to any object" and "any two null pointers shall compare equal", then a null pointer is _not_ a pointer to object. Hence, "a postfix expression followed by the -> operator and an identifier" does _not_ designate a member of a structure object. Hence, [`((st *)0)->m`](https://en.wikipedia.org/wiki/Offsetof) violates the semantics of the `->` operator. – pmor Feb 22 '22 at 18:15
@pmor Yes, however that is only an argument against defining offsetof yourself. If the programmer includes `stddef.h` they are entitled to treat offsetof as a black box that behaves as described in 7.19p3. (Implementations have moved away from the traditional definition that uses the construct you mention, but if an implementation does use that definition, then either it provides the correct semantics _on that implementation_, or the _implementation_ is buggy. This is no different from, say, implementation-defined constructs buried in the expansion of `getc`.) – zwol Feb 23 '22 at 02:17
@zwol Re: "as a black box": as a _tested_ black box. If an implementation provides the `offsetof`, then it was (very likely) tested. The same approach can be applied to, for example, `dlsym`: if an implementation provides the `dlsym` then it was (very likely) tested. Hence, the `dlsym` _can_ be used. – pmor Feb 25 '22 at 20:46

score 1 · Answer 3 · answered Nov 26 '17 at 17:39

As far as I know, your code is valid. Aliasing an object as a char array is explicitly allowed as per § 3.10 ¶ 10.8:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

[…]

a char or unsigned char type.

The other question is whether casting the char* pointer back to float* and assigning through it is valid. Since your Foo is a POD type, this is okay. You are allowed to compute the address of a POD's member (given that the computation itself is not UB) and then access the member through that address. You must not abuse this to, for example, gain access to a private member of a non-POD object. Furthermore, it would be UB if you'd, say, cast to int* or write at an address where no object of type float exists. The reasoning behind this can be found in the section quoted above.
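
The conditions this answer relies on can be written down in code. The sketch below is an illustration, not a proof of validity: `points_at_z` is a hypothetical helper, and it still performs the pointer addition from the question, whose status the rest of this thread disputes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Foo {
    float x, y, z;
};

// The answer's argument assumes Foo is a POD type, i.e. both
// standard-layout and trivial.
static_assert(std::is_standard_layout<Foo>::value && std::is_trivial<Foo>::value,
              "Foo is expected to be a POD type");

// Hypothetical helper: reports whether the address computed with offsetof
// is the address of the member z. No float is read or written here.
bool points_at_z(Foo& f) {
    char* p = reinterpret_cast<char*>(&f) + offsetof(Foo, z);  // the debated line
    return static_cast<void*>(p) == static_cast<void*>(&f.z);
}
```

The check only compares addresses. It says nothing about whether a later access through `reinterpret_cast<float*>(p)` is permitted, which is the second question this answer raises.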

They just say `char`, not `char` array. That makes the difference. — geza, Nov 26 '17 at 18:06
`char[N]` and its subobjects are different. Just as you wouldn't expect `struct { char a0, a1; }` to be able to alias `int16_t` even if it doesn't have any padding. — Passer By, Dec 15 '17 at 12:58

xskxzr · Answer 4 · 2018-08-29T05:25:35.017

1

Yes, this is undefined. As you have stated in your question,

reinterpret_cast<char*>(&f) doesn't point to a char array, but to a float, ...

... reinterpret_cast<char*>(&f) does not even point to a char, so even if the object representation is a char array, the behavior is still undefined.

For offsetof, you can still use it like

struct Foo {
    float x, y, z;
};

Foo f;
auto p = reinterpret_cast<std::uintptr_t>(&f) + offsetof(Foo, z); 
                       // ^^^^^^^^^^^^^^
*reinterpret_cast<float*>(p) = 42.0f;

edited Aug 29 '18 at 05:25

answered Aug 29 '18 at 03:58

Yes, but strictly speaking, this may not be a solution, right? I mean, pointer<->integer conversion is implementation defined, so it may not do what I want (but of course, all implementations I know do the right thing). – geza Aug 29 '18 at 07:12
@geza [expr.reinterpret.cast]/4 says "it is intended to be unsurprising to those who know the addressing structure of the underlying machine", so I think such a pattern is reliable. In addition, it is used in the recent document [P0908](http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0908r0.html). – xskxzr Aug 29 '18 at 11:09
_For offsetof, you can still use it_ Or can not... [When casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined](https://gcc.gnu.org/onlinedocs/gcc/Arrays-and-pointers-implementation.html) – Language Lawyer Apr 13 '19 at 03:07

score 1 · Accepted Answer · answered Apr 08 '19 at 16:53

See CWG 1314

According to 6.9 [basic.types] paragraph 4,

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).

and 4.5 [intro.object] paragraph 5,

An object of trivially copyable or standard-layout type (6.9 [basic.types]) shall occupy contiguous bytes of storage.

Do these passages make pointer arithmetic (8.7 [expr.add] paragraph 5) within a standard-layout object well-defined (e.g., for writing one's own version of memcpy)?

Rationale (August, 2011):

The current wording is sufficiently clear that this usage is permitted.

I strongly disagree with CWG's statement that "the current wording is sufficiently clear", but nevertheless, that's the ruling we have.

I interpret CWG's response as suggesting that a pointer to unsigned char into an object of trivially copyable or standard-layout type, for the purposes of pointer arithmetic, ought to be interpreted as a pointer to an array of unsigned char whose size equals the size of the object in question. I don't know whether they intended that it would also work using a char pointer or (as of C++17) a std::byte pointer. (Maybe if they had decided to actually clarify it instead of claiming the existing wording was clear enough, then I would know the answer.)
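
Under that reading, the question's code could be written along the following lines. This is a sketch only, resting on the interpretation above: `byte_at` and `set_z` are hypothetical helpers, `unsigned char` is used because that is the type [basic.types] paragraph 4 names, and the member is written with `std::memcpy` so that the separate `std::launder` question does not come up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Foo {
    float x, y, z;
};

// Hypothetical helper: treats obj as if it were unsigned char[sizeof(T)]
// and returns a pointer `offset` bytes into it.
template <class T>
unsigned char* byte_at(T& obj, std::size_t offset) {
    static_assert(std::is_standard_layout<T>::value ||
                      std::is_trivially_copyable<T>::value,
                  "the CWG 1314 reading covers these kinds of types");
    assert(offset <= sizeof(T));  // stay within [0, sizeof(T)], as for a real array
    return reinterpret_cast<unsigned char*>(&obj) + offset;
}

// Stores value into f.z by copying its bytes to the computed address.
void set_z(Foo& f, float value) {
    std::memcpy(byte_at(f, offsetof(Foo, z)), &value, sizeof value);
}
```

Whether the same helper would be equally well-founded with `char*` or `std::byte*` is the open point noted above.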

(A separate issue is whether std::launder is required to make the OP's code well-defined. I won't go into this here; I think it deserves a separate question.)

For future readers: here's the question about `std::launder`: https://stackoverflow.com/questions/55578429/do-we-need-to-use-stdlaunder-when-doing-pointer-arithmetic-within-a-standard-l — geza, Apr 08 '19 at 19:46
