21

Specifically, is the following code, the line below the marker, OK?

struct S{
    int a;
};

#include <stdlib.h>

int main(){
    struct S *p;
    p = malloc(sizeof(struct S) + 1000);
    // This line:
    *(&(p->a) + 1) = 0;
}

People have argued about this here, but no one has given a convincing explanation or reference.

Their arguments rest on a slightly different example, but the issue is essentially the same:

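#include <stddef.h>  // offsetof
#include <stdint.h>  // int64_t
#include <stdlib.h>  // malloc
#include <string.h>  // strlen, strcpy

typedef struct _pack{
    int64_t c;
} pack;

int main(){
    pack *p;
    char str[9] = "aaaaaaaa"; // Input
    size_t len = offsetof(pack, c) + (strlen(str) + 1);
    p = malloc(len);
    // This line, with similar intention:
    strcpy((char*)&(p->c), str);
//                ^^^^^^^
}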
iBug
  • 1
    This is ok in fact. – unalignedmemoryaccess Nov 10 '17 at 13:44
  • @tilz0R Go to the chat room in the post and read the messages there. You'll soon be confused :) – iBug Nov 10 '17 at 13:48
  • I'm not confused, your particular example is perfectly correct and allowed. Also, check answer below. – unalignedmemoryaccess Nov 10 '17 at 13:48
  • @BoPersson You assume that `p` should be used as an array of structs which is not the case here. It's only about one single object allocated with a size larger than the size of the bare struct. – Gerhardh Nov 10 '17 at 13:57
  • @BoPersson I think even with padding, `malloc(1000)` should be large enough. – iBug Nov 10 '17 at 13:59
  • 2
    So you are not asking about accessing the second element of an array, but out-of bounds accessing an object in general. And with the wrong type. That's totally undefined. – Bo Persson Nov 10 '17 at 14:04
  • 1
    I think you don't really grasp what undefined behaviour means (looking at your comments & another related question of yours). No matter how many "what ifs" or additional guarantees you provide, "but this is undefined behaviour" still applies and we're back to square one. – P.P Nov 10 '17 at 14:20
  • @P.P. Good. Your words *UB is not platform specific* really impressed me. – iBug Nov 10 '17 at 14:27
  • Doesn't the answer by @M.M in the linked question address this in excruciating detail? – Barmar Nov 14 '17 at 18:36
  • @iBug I think you must have worked for my company, and code like this has kept me fixing your bugs for 32 years. It's not about whether you can, but why do you want to? If you want to access data, make a data structure. If an existing data structure doesn't cut it, make a new one. – Sinc Nov 21 '17 at 16:17

3 Answers

24

The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.

The member p->a is an object of type int. C11 6.5.6p7 says that

7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Thus

&(p->a)

is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.

Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:

Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.

behavior [...], for which [The C11] Standard imposes no requirements

With the note saying that:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kinds of behaviour wouldn't be acceptable from the standard's point of view: the standard allows every imaginable and unimaginable outcome.
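A minimal sketch of the distinction drawn above, assuming the `struct S` from the question. The function names `one_past_distance` and `demo_one_past` are made up for illustration. It forms and compares the one-past-the-end pointer, which 6.5.6p8 permits, but never dereferences it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S {
    int a;
};

/* Forms the one-past-the-end pointer of the "array of length one"
   that p->a behaves as, and returns its distance from the start.
   Only pointer arithmetic and comparison happen here; nothing past
   p->a is read or written. */
ptrdiff_t one_past_distance(struct S *p)
{
    int *first = &p->a;
    int *past = first + 1;   /* allowed: points one past the end */

    assert(past > first);    /* comparing against one-past is defined */
    /* *past = 0; would be the undefined access discussed above */
    return past - first;     /* defined: same (as-if) array object */
}

/* Hypothetical driver: allocates as in the question, then only
   forms the pointer instead of writing through it. */
int demo_one_past(void)
{
    struct S *p = malloc(sizeof(struct S) + 1000);
    if (p == NULL)
        return -1;
    p->a = 0;
    ptrdiff_t d = one_past_distance(p);
    free(p);
    return (int)d;
}
```

Calling `demo_one_past()` stays within what the standard defines regardless of how much extra memory was allocated, because the extra bytes are never accessed through the `int *`.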


There have been claims that the C11 standard text is written vaguely, that the committee's intention was for this to be allowed, and that it would have been all right in earlier versions. That is not true. Read this part of the committee response to Defect Report #017, dated 10 Dec 1992, on C89:

Question 16

[...]

Response

For an array of arrays, the permitted pointer arithmetic in subclause 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use of the word object as denoting the specific object determined directly by the pointer's type and value, not other objects related to that one by contiguity. Therefore, if an expression exceeds these permissions, the behavior is undefined. For example, the following code has undefined behavior:

 int a[4][5];

 a[1][7] = 0; /* undefined */ 

Some conforming implementations may choose to diagnose an array bounds violation, while others may choose to interpret such attempted accesses successfully with the obvious extended semantics.

(bolded emphasis mine)

There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.
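As a hedged illustration of the committee's `a[1][7]` example: if the intent is "the element 7 places after the start of row 1", the same element can be reached while keeping every subscript inside its own array. The names `ROWS`, `COLS`, `store_flat`, and `demo_store_flat` below are assumptions for this sketch, not anything from the defect report:

```c
#include <assert.h>

enum { ROWS = 4, COLS = 5 };

/* Stores value at the element that a[row][col] would reach under a
   flat row-major reading, but recomputes the subscripts so that each
   one stays within the bounds of its own array. */
void store_flat(int a[ROWS][COLS], int row, int col, int value)
{
    int flat = row * COLS + col;           /* for (1, 7): 1*5 + 7 == 12 */
    assert(flat >= 0 && flat < ROWS * COLS);
    a[flat / COLS][flat % COLS] = value;   /* 12 maps to a[2][2] */
}

/* Hypothetical driver for the (1, 7) case from the defect report. */
int demo_store_flat(void)
{
    int a[ROWS][COLS] = {{0}};
    store_flat(a, 1, 7, 42);
    return a[2][2];
}
```

The arithmetic is done on integers rather than on pointers, so no pointer ever leaves the inner array it was derived from.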

If you want to address consecutive structs, you can always take the pointer to the first member, cast it to a pointer to the struct, and advance that instead:

*(int *)((struct S *)&(p->a) + 1) = 0;
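A small sketch of that idea, under the assumption that the allocation is meant to hold two `struct S` objects. The function name `demo_second_struct` is hypothetical. It checks that the cast-and-advance pointer names the same object as ordinary array indexing:

```c
#include <assert.h>
#include <stdlib.h>

struct S {
    int a;
};

/* Allocates room for two structs and reaches the second one by
   converting the first member's address back to a struct pointer
   and advancing that pointer by one struct. */
int demo_second_struct(void)
{
    struct S *p = malloc(2 * sizeof(struct S));
    if (p == NULL)
        return -1;

    p[0].a = 1;
    p[1].a = 2;

    /* &p->a converts back to a pointer to the enclosing struct
       (6.7.2.1p15); + 1 then steps over one whole struct S. */
    struct S *second = (struct S *)&p->a + 1;
    assert(second == &p[1]);

    int result = second->a;
    free(p);
    return result;
}
```

In practice `p[1].a` says the same thing more directly; the cast form mainly shows that the arithmetic has to happen on a `struct S *`, not on an `int *`.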
  • What if you have `struct S { int a[1]; int b; }` on a platform where `&s.b == s.a+1`. Would you still consider a write into `*(s.a+1)` to be UB? I think the example with the malloc'd memory is basically the same: `s->a+1` is very likely to be a valid address for a neighboring `int`. – Petr Skocik Nov 10 '17 at 14:19
  • 8
    @PSkocik UB is not platform specific ;-) That it might work on some platforms doesn't change it being UB; it's just outside of what the standard can guarantee. – P.P Nov 10 '17 at 14:22
  • 3
    @PSkocik UB does not depend on the platform. That some implementations behave as expected doesn't mean it's well defined by the standard. A compiler is free to decide whether or not to do what you'd expect for UB. – iBug Nov 10 '17 at 14:25
  • "Very likely" is not "certain". An array might have padding (or, possibly, a sentinel / trap value) at the end. If you rely on platform specifics, fine, but you lose any guarantee of it working on any other platform. – cHao Nov 10 '17 at 15:19
  • `struct s { size_t len; char text[0]; };` is a common pattern for variable-length strings. – Simon Richter Nov 10 '17 at 15:28
  • 2
    @SimonRichter yes, and it doesn't comply with **any** C standard, the array of size 0 being a **constraint error**, so it is definitely undefined behaviour and a compliant compiler **must not** compile it without complaining. – Antti Haapala -- Слава Україні Nov 10 '17 at 15:29
  • 1
    @SimonRichter -- and that is why flexible array members were added to the language. – ad absurdum Nov 10 '17 at 15:35
  • @PSkocik: You are thinking of pointers as if they were addresses in memory. [They are not.](https://stackoverflow.com/questions/11713929/c-c-pointer-arithmetic/11714314#11714314) The compiler does not simply assign addresses to objects and manipulate the addresses using ordinary arithmetic. Some processors have exotic addressing schemes, so ordinary arithmetic does not work on pointers. And all good compilers have optimizers, and optimizers make a variety of program transformations based on the rules of C. Those transformations can break pointer arithmetic that does not conform to the C rules. – Eric Postpischil Nov 10 '17 at 15:49
  • There is this too: "Value computation for an lvalue expression includes determining the identity of the designated object." – Antti Haapala -- Слава Україні Nov 10 '17 at 15:52
  • @AnttiHaapala - With all due respect, this memory has no effective type yet. This is the *first* access with an lvalue of type `int`. So what is the issue again? I'm referring to the first example. – StoryTeller - Unslander Monica Nov 10 '17 at 16:07
  • @StoryTeller but that lvalue of type `int` has the effective type of `int`. – Antti Haapala -- Слава Україні Nov 10 '17 at 16:23
  • I am not talking about effective types here. – Antti Haapala -- Слава Україні Nov 10 '17 at 16:23
  • @AnttiHaapala - But you should talk about. These waters are muddy, and this point **needs** reconciliation. – StoryTeller - Unslander Monica Nov 10 '17 at 16:27
  • @EricPostpischil OK, I get that it's "technically" undefined and that pointer resolution might hide stuff that's more complex than accessing memory as if it were a byte array, but still, I think that given other constraints at play here (mainly contiguity of the space returned by malloc and the ability to address all objects by `char` pointers), no reasonable, conforming compiler on any C platform could mess this particular example up (as long as you can `_Static_assert(sizeof(int)==sizeof(struct S),"")`). – Petr Skocik Nov 10 '17 at 16:45
  • @PSkocik there is no such thing as "technically" undefined. The standard specifically says that it is undefined, then it is undefined. It is up to you to ensure that your compiler has said somewhere that it has implemented this. – Antti Haapala -- Слава Україні Nov 10 '17 at 18:10
  • @PSkocik: One of the things optimizers do is that, when, at a particular point in the code, there are undefined cases and defined cases, they assume only the defined cases will ever be executed, and they simplify based on that. This is sound and useful reasoning, as it enables the optimizer to discard cases that a good designer never intended to be executed. With a constant `1`, current optimizers might recognize it is a forced use outside of the C requirements and leave it alone. But suppose a routine were passed `int n` as a parameter, and the code were `*(&(p->a) + n)`. Then… – Eric Postpischil Nov 10 '17 at 19:06
  • 2
    … the optimizer might see that this expression is valid (given the definition of the `struct`) only when `n` is zero, so it can optimize the code to serve that case only. The result would be that when the routine is called with `n` set to one, the generated code behaves as if `n` were zero. – Eric Postpischil Nov 10 '17 at 19:09
  • @iBug I found this too: [the answer to a defect report to C89](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_017.html) – Antti Haapala -- Слава Україні Nov 11 '17 at 05:00
8

This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.

The correct way to achieve what you want is to use an array without a size (a flexible array member) as the last struct member:

#include <stdlib.h>

typedef struct S {
    int foo;    //avoid flexible array being the only member
    int a[];
} S;

int main(){
    S *p = malloc(sizeof(*p) + 2*sizeof(int));
    p->a[0] = 0;
    p->a[1] = 42;    //Perfectly legal.
}
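The same technique can be sketched for the string-copying variant in the question. The type `packed_str` and the functions `packed_str_new` and `demo_packed_str` are hypothetical names introduced only for this illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a length header followed by the characters. */
struct packed_str {
    size_t len;    /* length of text, not counting the terminator */
    char text[];   /* flexible array member (C99 and later) */
};

/* Allocates one block holding the header plus a copy of src.
   Returns NULL if the allocation fails. */
struct packed_str *packed_str_new(const char *src)
{
    size_t len = strlen(src);
    struct packed_str *p = malloc(sizeof *p + len + 1);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->text, src, len + 1);   /* copies the terminator too */
    return p;
}

/* Hypothetical driver: returns 1 if the copy round-trips. */
int demo_packed_str(void)
{
    struct packed_str *p = packed_str_new("aaaaaaaa");
    if (p == NULL)
        return -1;
    int ok = (p->len == 8) && (strcmp(p->text, "aaaaaaaa") == 0);
    free(p);
    return ok;
}
```

Because `text` is declared as an array of unspecified size, writing `len + 1` bytes into it stays inside the array object as long as the allocation is large enough.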
cmaster - reinstate monica
  • But isn't `*(&(p->a)+1)` within the bounds of allocated memory? – iBug Nov 10 '17 at 13:56
  • 5
    @iBug That something is within an allocated memory region does not mean, accessing it is defined behavior. There's tons of undefined behavior that only ever access allocated memory (strict aliasing rules etc). – cmaster - reinstate monica Nov 10 '17 at 13:58
  • So at last, an operation is well-defined *only if* all of its aspects are well-defined? – iBug Nov 10 '17 at 14:20
  • 1
    @iBug The C/C++ standards know different levels of undefinedness, most importantly, an uninitialized value does not immediately trigger undefined behavior (at least as far as I remember, this might also have changed throughout the standard revisions), and there's also implementation defined behavior. However, once you hit the undefined behavior level in any aspect of what you are doing, it's game over. Your compiler is then allowed to emit code to format your hard-drive instead. – cmaster - reinstate monica Nov 10 '17 at 14:28
  • @cmaster reading an uninitialized value is UB if the value can have a trap representation, https://stackoverflow.com/a/11965368/7076153. – Stargateur Nov 10 '17 at 17:48
  • @Stargateur True, if it can have a trap representation, it's UB. No doubt about that. However, if the value cannot have a trap representation, I think it's not immediately UB. The question that remains for me is, whether it's sufficient for the implementation to define a type trap-free, or whether the type must be defined trap-free by the standard. Unfortunately, I don't know which types are actually defined to be trap-free by the standard; I only know that modern CPUs like the X86-64 architecture do not use trap representations for integers and pointers. – cmaster - reinstate monica Nov 10 '17 at 19:08
  • @cmaster the "trap representation" means that the value is invalid. For example bytes in memory that constitute a pointer value that is not properly aligned for the type. – Antti Haapala -- Слава Україні Nov 11 '17 at 06:29
  • @cmaster also, reading the value of an uninitialized automatic variable whose address was never taken has undefined behaviour, no matter the type. – Antti Haapala -- Слава Україні Nov 11 '17 at 06:30
1

The C standard guarantees, in §6.7.2.1/15:

[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

&(p->a) is equivalent to (int *)p, so &(p->a) + 1 will be the address of the member of a second struct. In this case the struct has only one member and there will be no padding in the structure, so this will work; where there is padding, this code will break and lead to undefined behaviour.
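A hedged sketch of how the two layout facts involved here can be checked on a given implementation. The function names `trailing_padding` and `first_member_shares_address` are assumptions made for this example. Note that these checks only describe layout; they do not by themselves settle whether an access through `&(p->a) + 1` is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S {
    int a;
};

/* Guaranteed by 6.7.2.1p15: there is no padding before the first member. */
_Static_assert(offsetof(struct S, a) == 0, "first member is at offset 0");

/* Number of padding bytes after the only member. The standard allows
   this to be nonzero, although typical implementations give zero. */
size_t trailing_padding(void)
{
    return sizeof(struct S) - sizeof(int);
}

/* Returns 1 if the struct and its first member share an address,
   which the standard guarantees; -1 if the allocation fails. */
int first_member_shares_address(void)
{
    struct S *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    int same = ((void *)p == (void *)&p->a);
    free(p);
    return same;
}
```

Code that relies on `trailing_padding()` being zero could state that assumption explicitly with `_Static_assert(sizeof(struct S) == sizeof(int), "...")` so that a surprising implementation is rejected at compile time.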

haccks
  • 4
    `&(p->a)` is equivalent to `(int*)p`. – Daniel Fischer Nov 10 '17 at 14:54
  • @DanielFischer; Agree. `sizeof(struct S) == sizeof(int)` in this particular case. – haccks Nov 10 '17 at 14:55
  • 2
    The address of struct's 1st member is guaranteed by the C Standard to be the same as the struct's address. – alk Nov 10 '17 at 14:57
  • Usually. But DS 9000 adds padding to all structs just to trap people. – Daniel Fischer Nov 10 '17 at 14:57
  • 1
    @DanielFischer; Not sure about DS 9000. – haccks Nov 10 '17 at 14:58
  • 2
    @DanielFischer -- [The address of `struct`'s 1st member is _guaranteed_ by the C Standard to be the same as the `struct`'s address.](https://stackoverflow.com/questions/47224138/is-it-ok-to-access-past-the-size-of-a-structure-via-member-address-with-enough#comment81401081_47224798) [Always](http://port70.net/~nsz/c/c11/n1570.html#6.7.2.1p15). – ad absurdum Nov 10 '17 at 15:04
  • @DavidBowling Yes. But `&(p->a)` is an `int*`, not a `struct S*`. – Daniel Fischer Nov 10 '17 at 15:06
  • 1
    @DanielFischer; You missed **"suitably converted"** then. – haccks Nov 10 '17 at 15:07
  • @DanielFischer -- I will agree with you that `&(p->a)` is not a pointer to a `struct`, but a pointer to `int`. But, there can never be padding at the beginning of a `struct`. – ad absurdum Nov 10 '17 at 15:09
  • 2
    @haccks No, I didn't miss that. The "suitably converted" here means that `(int*)p` is a pointer to the first (only) member of the struct, and `(struct S*)&(p->a)` is a pointer to the struct. – Daniel Fischer Nov 10 '17 at 15:11
  • 1
    @DavidBowling Yes, padding at the start of a struct is forbidden. But an implementation is free to add padding at the end (or, if there is more than one member, between members) as it sees fit. So `sizeof (struct S) == sizeof (int)` is not guaranteed by the standard. It will be true in all sensible implementations, but that's coincidental. – Daniel Fischer Nov 10 '17 at 15:13
  • @DanielFischer -- sure. FWIW, I think that this pointer issue is the key point; that `&(p->a)` is a pointer to `int`, and that OP code has undefined behavior [for the reasons given by Antti](https://stackoverflow.com/a/47224596/6879826). – ad absurdum Nov 10 '17 at 15:17
  • @DanielFischer; Is there any relevant section of standard where it says that padding can be added at the last even if struct contains a single (`int`) element? – haccks Nov 10 '17 at 15:19
  • Sounds like arguing about whether "*is PHP the best language in the world?*", lol. – iBug Nov 10 '17 at 15:24
  • 2
    6.7.2.1, point 15, last sentence: "There may be unnamed padding within a structure object, but not at its beginning." And point 17: "There may be unnamed padding at the end of a structure or union." – Daniel Fischer Nov 10 '17 at 15:28
  • 5
    @iBug: It seems you are not getting the point. You tagged *your* question "*language-lawyer*", now you get language lawyers. – alk Nov 10 '17 at 15:38
  • 1
    @alk Fine. I thought I was just joking. – iBug Nov 10 '17 at 15:41