34

Suppose I want to get the last element of an automatic array whose size is unknown. I know that I can make use of the sizeof operator to get the size of the array and get the last element accordingly.
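For reference, the sizeof-based approach mentioned above can be sketched roughly as follows. The names ARRAY_LEN, last_int and demo_last are hypothetical and exist only for this illustration; the macro works only on true arrays, not on pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: returns the last element of a non-empty int buffer. */
static int last_int(const int *data, size_t len)
{
    assert(len > 0);              /* an empty buffer has no last element */
    return data[len - 1];
}

/* Demo: the array length is never spelled out by hand. */
static int demo_last(void)
{
    int array[] = { 3, 1, 4, 1, 5 };
    return last_int(array, ARRAY_LEN(array));
}
```

Here demo_last() returns the final initializer, and no pointer past the end of the array is ever formed or dereferenced.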

Is using *((*(&array + 1)) - 1) safe?

Like:

char array[SOME_SIZE] = { ... };
printf("Last element = %c", *((*(&array + 1)) - 1));
int array[SOME_SIZE] = { ... };
printf("Last element = %d", *((*(&array + 1)) - 1));

etc

Spikatrix
  • 1
    Let's simplify it at first. `(*(&array+1))` is equal to `array[1]` –  Sep 12 '15 at 09:57
  • 1
    We're left with `*(array[1] - 1)`. That makes no sense. –  Sep 12 '15 at 09:58
  • 4
    I don't think so. `*(array+1)` is equal to `array[1]`. – Spikatrix Sep 12 '15 at 09:59
  • Oh, right, my bad. Sorry. I forgot it was an array. –  Sep 12 '15 at 09:59
  • 1
    It's not safe if you can't ensure `SOME_SIZE>0` – mvds Sep 12 '15 at 10:01
  • 1
    @mvds: And where in the standards it says that's something valid at all? – 3442 Sep 12 '15 at 10:01
  • Won't it risk accessing unknown memory? I mean by dereferencing, as in `*(&array+1)`. – ameyCU Sep 12 '15 at 10:03
  • 1
    @KemyLand it compiles, and people do this: http://stackoverflow.com/questions/9722632/what-happens-if-i-define-a-0-size-array-in-c-c, so who cares what the standard says? It's not safe. – mvds Sep 12 '15 at 10:03
  • 7
    I think `&array+1` is still OK but you're hitting UB at `*(&array+1)` because that's dereferencing a pointer one past the end of the array variable (but I'm not sure about this). – melpomene Sep 12 '15 at 10:04
  • I suppose `&array + 1` is equal to `arrayofarrays[1]`. –  Sep 12 '15 at 10:04
  • 4
    In bracket notation this is `(&array)[1][-1]`. So (as a quick assessment) I don't think it is valid - the past-the-end pointer is dereferenced. – M.M Sep 12 '15 at 10:05
  • 1
    @Xis88 I don't think they are same . – ameyCU Sep 12 '15 at 10:05
  • @melpomene What about `malloc(sizeof(*ptr))` ? –  Sep 12 '15 at 10:07
  • 5
    @Xis88 Safe because `sizeof` (mostly) doesn't evaluate its operand, it only looks at its type (exception: VLAs). – melpomene Sep 12 '15 at 10:08
  • @CoolGuy Please take a look at my answer . – ameyCU Sep 12 '15 at 10:22
  • Interesting, but because C++ rather than pure C is almost ubiquitous nowadays, I'd use `template <typename T, size_t N> T& back(T (&arr)[N]) { static_assert(N>0, "Empty"); return arr[N-1]; }` – nodakai Sep 12 '15 at 11:38
  • 1
    I think `((T *)(&array + 1))[-1]` would be correct , where `T` is the element type – M.M Sep 12 '15 at 11:48
  • 1
    I do not understand why anyone would write code that is difficult to read. Have you thought about maintainability? – Ed Heal Sep 12 '15 at 12:01
  • Most answers here seem to be missing 6.5.6.7, which I think is pretty important in this discussion. See my answer for further details. – Filipe Gonçalves Sep 12 '15 at 13:57
  • 10
    I don't know if it's legal or not, but I do know that code maintainers *will* kill you because of this code. – milleniumbug Sep 12 '15 at 14:03
  • 1
    @mvds: If `SOME_SIZE <= 0`, then the array declaration itself is already an error. –  Sep 12 '15 at 20:14
  • @M.M Is `(T *)(&array + 1)` guaranteed to be the same as `*(&array + 1)`? – Spikatrix Sep 13 '15 at 06:25
  • @CoolGuy no, since the latter is UB – M.M Sep 13 '15 at 09:06
  • @CoolGuy, no since both are undefined behavior. – David Hammen Sep 13 '15 at 09:31
  • @DavidHammen Ok. Is `(int*)(&array)` the same as `&array[0]`? – Spikatrix Sep 13 '15 at 09:37
  • @CoolGuy - `&array` is of type `int (*)[SOME_SIZE]`, so casting that to `int*` doesn't make sense. `&array[0]` is of type `int*`, so casting that to `int*` is a no-op. – David Hammen Sep 13 '15 at 12:37
  • @Hurkyl not an error according to gcc. You can define SOME_SIZE to 0 and only get a warning that "zero size arrays are an extension". So, answering the question: the proposed construct is not safe if you don't guarantee that SOME_SIZE>0. – mvds Sep 13 '15 at 14:24

6 Answers

21

No, it is not.

&array is of type pointer to char[SOME_SIZE] (in the first example given). This means &array + 1 points to memory immediately past the end of array. Dereferencing that, as in *(&array + 1), gives undefined behaviour.

No need to analyse further. Once there is any part of an expression that gives undefined behaviour, the whole expression does.
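A minimal sketch of where &array + 1 points, under the assumption of a typical implementation where converting both pointers to void * preserves the address. The value of SOME_SIZE and the function name one_past_matches are made up for illustration. Both pointers are only formed and compared; neither is dereferenced.

```c
#include <stddef.h>

#define SOME_SIZE 8   /* assumed size, for illustration only */

/* Returns 1 if &array + 1 and array + SOME_SIZE hold the same address.
   Neither pointer is dereferenced; both are only formed and compared. */
static int one_past_matches(void)
{
    char array[SOME_SIZE] = { 0 };
    char (*past_whole)[SOME_SIZE] = &array + 1;  /* one past the array object */
    char *past_elems = array + SOME_SIZE;        /* one past the last element */
    return (void *)past_whole == (void *)past_elems;
}
```

The disputed step in the question is not forming past_whole but applying unary * to it.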

Peter
  • 2
    But `array[sizeof(array)/sizeof(array[0])-1]` would be safe, right? –  Sep 12 '15 at 10:13
  • 10
    Sure, but that has nothing to do with the question. – Peter Sep 12 '15 at 10:14
  • 2
    It's related because it does the same thing safely. Sorry for my curiousity. –  Sep 12 '15 at 10:14
  • 1
    No it doesn't actually. It evaluates the value of the last element of `array`. That is not what the question is about. – Peter Sep 12 '15 at 10:16
  • 1
    I quote the question title: "...get the last element of an automatic array?". How is "get" different from "evaluate"? –  Sep 12 '15 at 10:19
  • 2
    Because the expression given does NOT get the last element of `array`. The question is mislabeled. – Peter Sep 12 '15 at 10:20
  • 1
    @Peter This is not UB. See my answer. – Enzo Ferber Sep 12 '15 at 10:31
  • 2
    @Peter You might want to rethink that. – Enzo Ferber Sep 12 '15 at 10:56
  • I'm curious, why the downvotes? This answer is clearly correct. – Filipe Gonçalves Sep 12 '15 at 11:47
  • 2
    The downvote came from someone who thought his post (claiming the behaviour is not undefined) was correct. Such is life on SO, sometimes. – Peter Sep 12 '15 at 11:53
  • @Peter Here it is opinion based . In their opinion they are right . – ameyCU Sep 12 '15 at 11:54
  • That is true, ameyCU. – Peter Sep 12 '15 at 11:55
  • 1
    @Peter "Such is the life on SO, sometimes". Well, I think you're wrong, and you think I'm wrong - discussions are like this. I suggest you **read** the comments on my answer and try to understand what I'm trying to say... And I still stand on "It's not UB", it's `type` + pointer math. – Enzo Ferber Sep 12 '15 at 12:44
  • 1
    @Peter [This](http://stackoverflow.com/questions/2528318/how-come-an-arrays-address-is-equal-to-its-value-in-c) answer also supports what I'm saying. – Enzo Ferber Sep 12 '15 at 13:09
  • 2
    I think what OP was trying to do was `*((char*)(&array+1)-1)`. I have no idea if that's UB or not. – BlueRaja - Danny Pflughoeft Sep 12 '15 at 16:14
  • @BlueRaja: I would suggest it is. Just as, if `x` was an `int`, evaluating `&x + 1` gives undefined behaviour, so a containing expression of `*((char *)(&x + 1)-1)` also gives undefined behaviour. The only difference is that `array` is of type `int (*)[SOME_SIZE]` (in the OP's second example) whereas `x` is of type `int` here. The rule about a pointer to "one past the end" doesn't apply, since `x` is not an array, and `array` is not an array of arrays (2D array). – Peter Sep 12 '15 at 23:20
  • @Peter I think the type of `&array` is `char(*)[SOME_SIZE]` – Ely Sep 13 '15 at 04:30
  • @Elyasin - the OP provided two examples. The first has `array` as an array of `char`. The second has `array` as an array of `int`. – Peter Sep 13 '15 at 05:04
  • 1
    @Peter: Non-array objects are treated as one-long arrays for the purposes of pointer arithmetic. ("array object" isn't explicitly defined anywhere, but I'm pretty sure that the intent is that `&array` is a pointer to a non-array object of type `char[SOME_SIZE]`) –  Sep 13 '15 at 06:21
  • Hurkl - Can you provide a reference to a section/clause in a standard which says that? I've seen that sort of claim plenty of times in discussion, and on various web pages, but never managed to find support for that in a standard (which would be definitive). – Peter Sep 13 '15 at 07:37
  • 3
    @Peter: the "1-array-long" object clause is under pointer arithmetic. For example in N3690 it's numbered 5.7/4, "For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type." If you want to find it in a different version of the standard, I *think* that section is always called "additive operators", but I might be wrong. – Steve Jessop Sep 13 '15 at 11:13
  • Thanks Steve. Given that (umm) pointer, I found the relevant clause in the 1999 C standard, in Section 6.5.6 para 7. It is also in the 1998 C++ standard, Section 5.7, para 4. [I assume similar clauses in later standards, but don't have versions of those handy on my current machine]. – Peter Sep 13 '15 at 11:43
  • @Peter Please take a look at my 3rd edit on my answer. I added Standard references that support it. – Enzo Ferber Sep 14 '15 at 13:09
  • 3
    @Enzo: What about the text "If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated." which you have highlighted in bold font do you not understand? The wording "shall not" in a standard means that the result of doing otherwise is undefined. – Peter Sep 14 '15 at 13:56
  • @Peter But it does not point to one element past the end of the array. It **points to a pointer to the last element**. Effectively, it's a pointer to pointer, or in this case a `T(*)[10]`, which falls into the usage of the `*` operator I've highlighted. – Enzo Ferber Sep 14 '15 at 14:02
18

I don't think it is safe.

From the standard as @dasblinkenlight quoted in his answer (now removed) there is also something I would like to add:

C99 Section 6.5.6.8 -

[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So, as it says, we should not write *(&array + 1): &array + 1 points one past the end of array, and so the unary * operator should not be applied to it.

It is also well known that dereferencing a pointer to a memory location the program does not own leads to undefined behaviour.
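As a contrast, here is a small sketch of the usual well-defined use of a one-past-the-end pointer: it is formed and compared, but never used as the operand of unary *. The function name sum_ints is hypothetical.

```c
#include <stddef.h>

/* Sums n ints. `end` is one past the last element and is only compared,
   never dereferenced. */
static int sum_ints(const int *first, size_t n)
{
    const int *end = first + n;   /* valid to form when n is the array length */
    int total = 0;
    for (const int *p = first; p != end; ++p)
        total += *p;              /* p always points at a real element here */
    return total;
}
```

With a non-empty array, end[-1] would likewise designate the last element without ever dereferencing end itself.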

ameyCU
  • 1
    Ah. So I guess it is UB. Thanks! – Spikatrix Sep 12 '15 at 11:01
  • I somewhat doubt it. It hangs a bit on what 'evaluated unary * operator' means. I don't think the inner `*` is evaluated in that way. – MicroVirus Sep 12 '15 at 11:03
  • I don't know if this reference is official, but from http://c0x.coding-guidelines.com/6.5.3.2.html the 1092 clause says: "If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue." – MicroVirus Sep 12 '15 at 11:09
  • @CoolGuy It's not UB. See my answer. Also see the comments on it. – Enzo Ferber Sep 12 '15 at 11:10
  • @MicroVirus can using expression's value in any arithmetic operation be done without evaluating it ? – ameyCU Sep 12 '15 at 11:17
  • @EnzoFerber Ok. I think I'll accept an answer tomorrow as some users say that it is UB while other say it isn't. I'll see to it tomorrow. – Spikatrix Sep 12 '15 at 11:17
  • @MicroVirus [Here is the standard](http://port70.net/~nsz/c/c11/n1570.html#6.5.6) .Refer link for that paragraph. – ameyCU Sep 12 '15 at 11:30
  • 1
    What if he has an int[3][2] and his array is actually the first element? What if he has an int[6] which he pointer-casted to int[3]? Uhm, these wonderful committees... :-( – peterh Sep 12 '15 at 11:45
  • @peterh the standard is unclear about what happens when pointer casting an array of one size to another size – M.M Sep 12 '15 at 11:46
  • 1
    @peterh C is defined by a standards document, not some particular assembly. Generally, casting pointers may change their size and value. Just because you have only worked on a flat memory model system, don't assume that is the only type of system. – M.M Sep 12 '15 at 11:51
  • @M.M Well, I always forget this :-( C must be supported everywhere, that is clear. You have right. – peterh Sep 12 '15 at 11:54
  • I think you are right and the expression is evaluated, therefore it is undefined behaviour – MicroVirus Sep 12 '15 at 12:07
  • 2
    The only way out is if the unary `*` operator here isn't evaluated. I believe that there's always been some rumbling about whether `&*foo` and other similar constructions including this one, should or should not evaluate the unary `*` (that's "should" in the sense of "what do we want to write in the standard"). The C standard added some special language about `&*`, and I don't *think* C++ has, but (a) I may have misremembered and (b) I'm really not very familiar with C++14 at all. – Steve Jessop Sep 13 '15 at 11:17
  • @SteveJessop: I wonder if any useful purpose is served by not having the Standard clarify that `*x` is only said to evaluate its operand in cases where the compiler would be allowed to dereference the pointer even if it were `volatile` [by my understanding, given `int volatile (*foo)[10];` the expression `*foo` would be required to yield a volatile pointer, but any access to the target of `*foo` would be a forbidden side-effect]. Clarifying that "evaluate" essentially means "dereference" would resolve this question. – supercat Sep 15 '15 at 17:40
  • @supercat: "dereference" usually means an lvalue to rvalue conversion on `*foo` (which doesn't take place in the case you only apply address-of). So I guess the machinery is already there in the standard to specify what's needed, and for whatever reasons they've chosen for the condition to be "evaluated" (which absent any special-case wording the sub-expression `*foo` certainly is) and not "converted to rvalue" (which it isn't). – Steve Jessop Sep 15 '15 at 17:43
  • @supercat I agree the whole question is that "**is it evaluated or not ?**" But wouldn't it make sense that expression is used in pointer arithmetic so it would be evaluated ? – ameyCU Sep 15 '15 at 17:43
  • @ameyCU: The C Standard makes clear that there exist cases where, at the *syntactic* level, the `*` serves to modify the meaning of other parts of the expression without being evaluated per se; given `int *foo,*bar;`, the `*` in `foo = &*bar;` will serve to cancel out the outer `&` without being evaluated itself. It's clear that given `int (*x)[2][3][4];`, `(int*)x`, `(int*)*x`, `(int*)**x` are all the same pointer, just interpreted as different types. The question is whether `*` is "evaluated" in such contexts, or merely alters the meaning of other parts of the expression. – supercat Sep 15 '15 at 18:24
13

I believe it's undefined behavior for the reasons Peter mentions in his answer.

There is a huge debate going on about *(&array + 1). On the one hand, dereferencing &array + 1 seems to be legal because it's only changing the type from T (*)[] back to T [], but on the other hand, it's still a pointer to uninitialized, unused and unallocated memory.

My answer relies on the following:

C99 6.5.6.7 (Semantics of additive operators)

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Since &array is not a pointer to an object that is an element of an array, then according to this, it means that the code is equivalent to:

char array_equiv[1][SOME_SIZE] = { ... };
/* ... */
printf("Last element = %c", *((*(&array_equiv[0] + 1)) - 1));

That is, &array is a pointer to an array of SOME_SIZE chars, so it behaves the same as a pointer to the first element of an array of length 1 whose single element is an array of SOME_SIZE chars.

Now, that together with the clause that follows (already mentioned in other answers; this exact excerpt is blatantly stolen from ameyCU's answer):

C99 Section 6.5.6.8 -

[...]
if the expression P points to the last element of an array object, the expression (P)+1 points [...]
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Makes it pretty clear that it is UB: it's equivalent to dereferencing a pointer that points one past the last element of array_equiv.

Yes, in the real world it probably works, since in practice the original code doesn't really read from a memory location; it's mostly a type conversion from T (*)[] to T []. But I'm pretty sure that, from a strict standard-compliance point of view, it is undefined behavior.
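The "array of length one" rule from 6.5.6.7 can be illustrated with a plain scalar. This is only a sketch; the function name roundtrip_single_object is invented for the example. Forming the one-past pointer and stepping back is allowed, while evaluating *past would not be.

```c
/* A lone int behaves like int[1] for pointer arithmetic (C99 6.5.6 p7). */
static int roundtrip_single_object(void)
{
    int x = 42;
    int *past = &x + 1;    /* allowed: one past the "array" of length one */
    int *back = past - 1;  /* allowed: back to the only element */
    return back == &x ? *back : -1;   /* *past itself is never evaluated */
}
```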

Filipe Gonçalves
  • What I disagree is that what you're trying to say is that **making calculations** on address that are not allocated is UB. That's not right. You can do calculations on anything you want! If you do a code like this: `int *p = (int*)1;` and never use `p`, it will **never** crash your program. However, if you do try to print the contents of the memory address `1`, it will cause seg. fault. – Enzo Ferber Sep 12 '15 at 14:07
  • 9
    @EnzoFerber That's exactly the problem: you somehow think it's ok to do computation on invalid addresses. On the contrary; in some (most?) cases, computations involving unallocated addresses *are* UB. For example, just the simple fact of *computing* a pointer address that is not a pointer to an element in an array or one past the end is UB. It doesn't matter if you're not going to use it; it's UB (although it will probably work in most platforms). This example is not relevant for this question, but it shows that your assumption is wrong to begin with. – Filipe Gonçalves Sep 12 '15 at 14:21
  • 4
    @EnzoFerber That is to say: in general, you must be really careful about your assumptions if you want strict standard conformance. I'm not saying I disagree with your answer. I fully understand your point of view, and I agree that in practice it's probably harmless to do `(*(*(&a + 1)-1))`, and I understand why you say so, but if we're here to be technically correct and we want to play by the standard rules, sometimes we need to back off a little bit and really play by the book. The question is mostly about: from a standard-compliant point of view, is this guaranteed to work? – Filipe Gonçalves Sep 12 '15 at 14:27
  • @EnzoFerber In the code `int *p = (int*)1;` there is no * operator that is evaluated. – John Hammond Sep 12 '15 at 18:37
  • @FilipeGonçalves The statement **it shall not be used as the operand of a unary * operator that is evaluated.** makes also clear that therefore it can be used with any other operand, otherwise it would just state that the use of all operators is forbidden. – John Hammond Sep 12 '15 at 18:38
  • @LarsFriedrich I know. And that's what I said. If you never try to access the wrong address you'll never have trouble. That gives you the ability to do anything you want with the value. – Enzo Ferber Sep 12 '15 at 18:44
  • "it's still a pointer to uninitialized, unused and unallocated memory" - So what? The same is true for `ptr = NULL;` yet nobody would mind to use `&ptr` or `dblptr = &ptr; dblptr++; ptr = *dblptr;` The question is, what does ptr point to when you do *ptr. – John Hammond Sep 12 '15 at 20:01
  • 2
    @LarsFriedrich that's also UB. `dblptr++` is ok, but dereferencing it is UB. It's the exact same thing. – Filipe Gonçalves Sep 12 '15 at 20:28
  • @FilipeGonçalves The question is whether it points at something legal **before** the dereferencing. Whether the ++ results in something legal or not, cannot be seen in such a short snippet. – John Hammond Sep 12 '15 at 20:35
  • 1
    Based on the votes of answers, I think that it is not safe to use it. Your answer good and includes explanations as well as quotes from the standard. So, I accept it. Thanks! – Spikatrix Sep 13 '15 at 07:18
  • 1
    Given `int foo[5][4]; int *bar = *foo;` is the unary operator "evaluated" in the sense relevant to 6.5.6.8, or does it merely change the assumed type of the pointer from `(int[4])*` to `int*`? – supercat Sep 13 '15 at 18:55
  • @EnzoFerber: Does anything in the Standard mandate that `(int*)1` cannot yield a trap representation which will launch nuclear missiles as soon as anyone tries to store it anyplace? – supercat Sep 13 '15 at 18:57
  • @CoolGuy I added the Standard References in my answer. Please take a look. This extends to Filipe, supercat, Lars, Peter and ameyCu. – Enzo Ferber Sep 14 '15 at 12:32
  • Disagree with "dereferencing `&array + 1` ... but OTOH it's still a pointer to ... memory". Dereferencing `&array + 1` is UB, not necessarily even a pointer to anything. – chux - Reinstate Monica Sep 14 '15 at 15:30
  • @chux No, `&array + 1` has **type** `T(*)[10]` and using the `*` operator on it will return `T[10]` as the Standard says in **§6.5.3.2 - 3 and 4** (See my answer for the complete paragraphs. It never dereferences the address, only the type. You can come to the same conclusion using Formal Logic: if `&T` returns `pointer to T`, then `&T[10]` returns `pointer to T[10]`. The same for the `*` operator. if `*(pointer to T)` return `T`, then `*(pointer to T[10])` return `T[10]`. – Enzo Ferber Sep 14 '15 at 15:57
  • @chux Yes, I agree with you - that's what my answer says. Maybe I wasn't clear, but I was precisely saying that dereferencing it *is* UB. – Filipe Gonçalves Sep 14 '15 at 16:15
  • @EnzoFerber That paragraph of the standard does not say that you can dereference past-the-end pointers. It merely defines the effect of the `*` operator on types. – Filipe Gonçalves Sep 14 '15 at 16:16
  • @FilipeGonçalves Again, it's not dereferencing a pointer past the last element. It's dereferencing a **pointer to pointer past the last element**, which will result in a **pointer past the last element**. The address past the last element never got evaluated. – Enzo Ferber Sep 14 '15 at 16:17
  • @EnzoFerber No, it's not dereferencing a pointer-to-pointer. If `array` holds elements of type `T`, a pointer-to-pointer to the last element has type `T **`. However, `&array + 1` has type `T (*)[SOME_SIZE]`. That's not a pointer-to-pointer, it's a pointer to an array. Maybe your confusion stems from failing to understand the difference here, but it's crucial. – Filipe Gonçalves Sep 14 '15 at 16:30
  • @FilipeGonçalves The _difference that matters_ for this topic between `T**` and `T(*)[10]` is in how the compiler will calculate the offsets. But an array without any subscripts yields an address, just like a pointer. The `T[10]` helps calculate the offset. I'm saying "pointer to pointer" because that's easier to understand than "pointer to array of 10 ints". But it holds the same principle. If you use the `*` on a "pointer to array of 10 ints", you get back an "array of 10 ints", which, used without subscripts, will give you an address. – Enzo Ferber Sep 14 '15 at 16:34
  • @FilipeGonçalves You're not getting the conceptual way in which the `*` and `&` operators work. Try to visualize the types changing as you evaluate the expression. – Enzo Ferber Sep 14 '15 at 16:36
  • @FilipeGonçalves Furthermore, this hack `*(*(&aray + 1)-1)` only works on arrays with defined size. It won't work on anything else precisely because of the `T[SOME_SIZE]`, which helps in calculating offsets. – Enzo Ferber Sep 14 '15 at 16:39
  • 5
    @EnzoFerber I get it. I really do. As I said, I fully understand your point of view and your answer. I know that dereferencing `&array + 1` *looks* harmless precisely because it's a type-conversion operation. And I'm pretty sure it's ok in most, if not all, machines out there. But the way I see it, there's nothing in the standard that guarantees this behavior. I honestly think it's a glitch in the standard, which leads different people to different conclusions - I think it's nonsense to keep discussing it, as clearly none of us will agree in the near future. Formally, I classify it as UB. – Filipe Gonçalves Sep 14 '15 at 16:40
  • 1
    @FilipeGonçalves: A non-defective standard should make clear that the answer to at least one of the following questions is no: (1) May code with a past-one pointer of array type use the `*` operator on that pointer to receive a past-one pointer for the last element thereof? (2) May a compiler legitimately assume that a program will ever receive input that would cause the `*` operator to be used on a past-one pointer of an array type? If there is nothing in the C Standard which would unambiguously imply that the answer to either question is "no", that should be recognized as a defect. – supercat Sep 15 '15 at 17:29
  • So, if the pointer manipulation is the issue, can we cast to `uintptr_t` first to avoid that problem? E.g., `(T *)((uintptr_t)(&array + 1) - sizeof(*array))`? – jxh Nov 11 '19 at 20:35
2

It is probably safe, but there are some caveats.

Suppose we have

T array[LEN];

Then &array is of type T(*)[LEN].

Next, &array + 1 is again of type T(*)[LEN], pointing just past the end of the original array.

Next, *(&array + 1) is of type T[LEN], which may be implicitly converted to T*, still pointing just past the end of the original array. (So we did NOT dereference an invalid memory location: the * operator is not evaluated).

Next, *(&array + 1) - 1 is of type T*, pointing at the last array location.

Finally, we dereference this (which is legitimate if the array length is not zero): *(*(&array + 1) - 1) gives the last array element, a value of type T.

Note that the only time we actually dereference a pointer is in this last step.
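The types named in the steps above can be checked without evaluating anything, because sizeof does not evaluate its operand for non-VLA types. This is a sketch only: LEN is given an arbitrary value and the two function names are made up. It confirms the types, and says nothing about whether evaluating the full expression is defined.

```c
#include <stddef.h>

#define LEN 6   /* assumed length, for illustration only */

/* sizeof only inspects the type of its operand here; nothing is evaluated. */
static size_t size_of_deref_step(void)
{
    int array[LEN];
    return sizeof *(&array + 1);        /* type int[LEN] */
}

static size_t size_of_minus_one_step(void)
{
    int array[LEN];
    return sizeof(*(&array + 1) - 1);   /* type int *, after array decay */
}
```

The first result equals LEN * sizeof(int), the second equals sizeof(int *).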

Now, the potential caveats.

First, *(&array + 1) formally looks like an attempt to dereference a pointer that points to an invalid memory location. But it really isn't. That's the nature of array pointers: this formal dereference only changes the type of the pointer and does not actually result in an attempt to retrieve a value from the referenced location. That is, array is of type T[LEN] but may be implicitly converted to type T*, pointing to the first element of the array; &array is a pointer to type T[LEN], pointing at the beginning of the array; *(&array + 1) is again of type T[LEN], which may be implicitly converted to type T*. At no point is a pointer actually dereferenced.

Second, &array + 1 may look like an invalid address, but it really isn't: my C++11 reference manual tells me explicitly that "Taking a pointer to the element one beyond the end of an array is guaranteed to work", and a similar statement is also made in K&R, so I believe it has always been standard behavior.

Finally, in the case of a zero-length array, the expression dereferences the memory location just before the array, which may be unallocated/invalid. But this issue would also arise if one used the more conventional sizeof approach without testing for nonzero length first.

In short, I do not believe there is anything undefined or implementation-dependent about this expression's behavior.

Viktor Toth
  • Correction: `*(&array + 1)` should be of type `T(&)[LEN]`, not `T*`. –  Sep 12 '15 at 19:33
  • Thanks. I missed that one when I was composing my reply. – Viktor Toth Sep 12 '15 at 19:45
  • 1
    This answer looks more like "If the compiler does what I think it should do, this won't emit any assembly instructions that access invalid memory" rather than "The standard guarantees that this does what I think it should do". –  Sep 12 '15 at 20:04
  • Since no pointer is dereferenced, it is my opinion that the standard guarantees the expected behavior. Given `T array[LEN]`, the expression `*(&array+1)` represents an entire array (one past the allocated array), and it may be automatically typecast to a pointer to the first element of the array, but never to array data; under no circumstances are any data retrieved from the array by this expression. – Viktor Toth Sep 12 '15 at 20:10
  • 1
    `(&array+1)` is a pointer that is dereferenced (in the sense of what the C and C++ languages mean by "dereferenced", not the sense that "the compiler has emitted a load-from-memory assembly instruction") –  Sep 12 '15 at 20:18
  • OK, let's go back a step. `*(&array)` is the same as `array`: the whole array. If you were to do an assignment, say, `x=*(&array)`, which is the same as `x=array`, then `x` better be of type `T*` and what is assigned is the address of the first array element (after an implicit conversion), not any array data. So no pointer is dereferenced. Now someone else made the point that in C++, `*(&array+1)` might be seen as a reference to a non-existent array, which may be technically forbidden, but I am not sure I am buying this as the construct is valid C code and in C there are no reference types. – Viktor Toth Sep 12 '15 at 20:30
  • 1
    In C, `*&array` is an "lvalue of type `int[LEN]`". The C++ standard also has the same wording. (also, `&array` is a pointer, and you apply unary `*` to it: that's what dereferencing means) (editing: errors in this comment) –  Sep 12 '15 at 20:45
  • Please forgive me but I beg to disagree. The fact that the array name is implicitly converted to a pointer to the first array element on assignment is well documented. And `array` and `*(&array)` are equivalent but neither can be used as lvalues. – Viktor Toth Sep 12 '15 at 20:52
  • 1
    In C11's exact wording is `if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’` –  Sep 12 '15 at 20:54
  • Except for arrays: From Stroustrup, 4th ed., "There is no array assignment, and the name of an array implicitly converts to a pointer to its first element at the slightest provocation". – Viktor Toth Sep 12 '15 at 21:02
  • 1
    You have to provoke it first. `*&array` really can't hope to be `T*` itself; otherwise you'd get totally the wrong thing with `&*&array` (which is a pointer-to-array-of-T) or `sizeof *&array` (which is the size of array-of-T). Also, lvalue is not a synonym for "can be assigned to": some relevant prhases are `A modifiable lvalue is an lvalue that does not have array type...` and `An assignment operator shall have a modifiable lvalue as its left operand` –  Sep 12 '15 at 21:04
1

IMHO that might work, but it is probably unwise. You should carefully review your software design and ask yourself why you want the last entry of the array. Is the content of the array completely unknown to you, or is it possible to define the structure in terms of C structs and unions? If it is, stay away from complex pointer operations on, for example, a char array, and define the data properly in your C code, in structs and unions wherever possible.

So instead of :

 printf("Last element = %c", *((*(&array + 1)) - 1));

It could be :

 printf("Checksum = %c", myStruct.MyUnion.Checksum);

This clarifies your code. The last letter in your array means nothing to a person who is not familiar with what's in the array, whereas myStruct.MyUnion.Checksum makes sense to anyone. Studying the myStruct structure could explain the whole data structure to anyone. Please use something like that if the data can be declared in such a way. If you are in the rare situation where you cannot, study the answers above; they make good sense, I think.
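A rough sketch of what such a declaration might look like. The layout, the struct Message type, its field sizes and the XOR checksum are all assumptions made for illustration; the original data format is not given.

```c
#include <stddef.h>

/* Hypothetical message layout; field names and sizes are assumptions. */
struct Message {
    char payload[15];
    char checksum;     /* named field instead of "last byte of a char array" */
};

/* Illustrative checksum: XOR of all payload bytes. */
static char compute_checksum(const struct Message *m)
{
    char acc = 0;
    for (size_t i = 0; i < sizeof m->payload; ++i)
        acc ^= m->payload[i];
    return acc;
}
```

Code that needs the trailing byte then reads m.checksum by name, with no pointer arithmetic at all.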

ameyCU
-2

a)

If both the pointer operand and the result [of P + N] point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow;
[...]
if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)−(P) has the same value as ((Q)−(P))+1 and as −((P)−((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.

This states that computations using a pointer one past the last element are actually completely fine. Since some people here have written that using non-existent objects in computations is already illegal, I thought I would include that part.
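The quoted identities can be checked on a concrete array. This is a small sketch; the function name identities_hold and the array length are chosen only for illustration. P is one past the end and is used only in arithmetic, never dereferenced.

```c
#include <stddef.h>

/* Checks the quoted identities for a 10-element array.
   P is one past the end, Q points at the last element. */
static int identities_hold(void)
{
    int a[10] = { 0 };
    int *Q = &a[9];
    int *P = a + 10;                 /* one past the last element */
    ptrdiff_t lhs = (Q + 1) - P;     /* ((Q)+1)-(P)   */
    ptrdiff_t mid = (Q - P) + 1;     /* ((Q)-(P))+1   */
    ptrdiff_t rhs = -(P - (Q + 1));  /* -((P)-((Q)+1)) */
    return lhs == mid && mid == rhs && lhs == 0;
}
```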

Then we need to take care about this part:

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

There is one important part that the other answers omitted and that is:

If the pointer operand points to an element of an array object

This is not the case here. The pointer operand we dereference is not a pointer to an element of an array object, it is a pointer to a pointer. So this whole clause is completely irrelevant. But the standard also states:

For the purposes of these [additive] operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

What does this mean?

It means our pointer to a pointer is actually again a pointer to an array, one of length 1. And now we can close the loop: as the first paragraph states, we are allowed to make calculations with one past the end of the array, so we are allowed to make calculations with the array as if it were an array of length 2!

In a more graphical way:

ptr -> (ptr to int[10])[0] -> int[10]
    -> (ptr to int[10])[1]

So we are allowed to make calculations with (ptr to int[10])[1], even though it is technically outside the array of length 1.

b)

The steps that happen are:

array ptr of type int[SOME_SIZE] to the first element of array

&array ptr to a ptr of type int[SOME_SIZE] to the first element of array

+ 1 ptr, one more than the ptr of type int[SOME_SIZE] to the first element of array, to a ptr of type int

This is NOT yet a pointer to int[SOME_SIZE+1], according to C99 Section 6.5.6.8. This is NOT yet ptr + SOME_SIZE + 1

* We dereference the pointer to the pointer. NOW, after the dereferencing, we have a pointer according to C99 Section 6.5.6.8, which is past the element of the array and which is not allowed to be dereferenced. This pointer is allowed to exist and we are allowed to use operators on it, except the unary * operator. But we don't use that one on that pointer yet.

-1 Now we subtract one from the ptr of type int to one after the last element of the array, letting ptr point to the last element of the array.

* dereferencing a ptr to int to the last element of the array, which is legal.

c)

And last, but not least:

If it were illegal, then the offsetof macro would be illegal too, which is defined as:
((size_t)(&((st *)0)->m))

John Hammond
  • 467
  • 1
  • 4
  • 14
  • Individual objects count as length-one arrays for the purpose of this arithmetic. `array` is an object of type `int[SOME_SIZE]`, and `&array + 1` is a past-the-end pointer, thus dereferencing it is UB. –  Sep 12 '15 at 19:20
  • &array+1 is not a pointer past-the-end of the array, it is a pointer to a pointer. We do not dereference a pointer past-the-end of the array. Did you actually read my answer? – John Hammond Sep 12 '15 at 19:23
  • Mostly; I overlooked that your error starts before where I thought it did: you have the type of `array` wrong, because its type is `int[SOME_SIZE]`, not `int*`. –  Sep 12 '15 at 19:28
  • Ah, okay, forgot the SOME_SIZE in one line. Fixed. Anyway, this is not the part that is of importance. – John Hammond Sep 12 '15 at 19:29
  • @Hurkyl I didn't say that it's a pointer, it evaluates as a pointer in the expression `array`. I know this: "Hey, I hate your answer, so I nitpick about semantics instead of talking about the core issue." Please refrain from this. Thank you. – John Hammond Sep 12 '15 at 20:18
  • 1
    I'm pretty sure these "nitpicks" contribute directly into your error, otherwise I wouldn't have bothered. But I have no problem with ceasing to try and correct it. –  Sep 12 '15 at 21:02
  • 2
    `offsetof` is not defined as `((size_t)(&((st *)0)->m))`. That is how some compilers implement it, on some computers. That implementation is undefined behavior, but that's OK in an implementation of the standard library. The implementors of that library know exactly which compiler the library will be used with (theirs), and they know exactly what that compiler will do in response to this construct. The authors of the standard library can get away with invoking undefined behavior. You cannot. – David Hammen Sep 12 '15 at 21:35
  • A pointer to the value 0 is not undefined behavior. You are not telling me that a header file of a standard library does not have to conform to the C language, because it's a header file of a library, are you? – John Hammond Sep 12 '15 at 21:54
  • 1
    @LarsFriedrich: A `NULL` pointer is not undefined behavior, true, but using `->` on a `NULL` pointer, even if you are only doing so to take the address of the result, *is* undefined behavior. And yes, I believe they are saying that the header file of the standard library need not conform to standard C; as far as I know, as far as the standard is concerned, it doesn’t need to be a real file or be written in C at all as long as the `#include` introduces the correct prototypes/declarations/macros/types/etc. into the namespace. – icktoofay Sep 12 '15 at 23:36
  • @LarsFriedrich - The header files in the standard library can get away with invoking undefined behavior because when you buy or download a C compiler, you are getting not only the compiler proper but also the implementation of the standard library that goes hand-in-hand with that compiler. The developers of the library can (and do) use undefined behavior because they know the response to that UB. You cannot get away with that in your code because that response is not guaranteed across toolsets, or even across different versions of the same toolset. – David Hammen Sep 13 '15 at 03:31
  • @David: Maybe a better way to say it is that behaviors allowed by "undefined behavior" include producing a standard-conforming implementation of library functions. –  Sep 13 '15 at 06:31
  • The reason the code is legal, is because the standard states:"The unary & operator yields the address of its operand. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted. Thus, &*E is equivalent to E _(even if E is a null pointer)_." – John Hammond Sep 13 '15 at 07:58
  • @LarsFriedrich: The construct `&(*foo)` may legally equal `foo` even when `foo` is a null pointer, but `(char*)&(foo->bar) - (char*)foo` does not legally equal the offset of `foo` when `foo` is null because it is equivalent to `((char*)foo+offset)-(char*)foo`, and the act of adding a non-zero offset to a null pointer is UB. Many compilers will allow the construct as a courtesy even when `foo` is null, since there's no useful reason for them to do otherwise, but I wouldn't be surprised if some compiler authors would rather implement `__offsetof` as a compiler intrinsic... – supercat Sep 13 '15 at 19:03
  • ...and then assume that a program will never receive any inputs that would cause `&foo->bar` to be evaluated when `foo` is null even if the only purpose of the evaluation would be to compute the offset of `bar` (according to the Standard, such an assumption would be legal, and making it could allow compilers to eliminate huge amounts of code which might be considered necessary by the programmers that wrote it, but is considered unnecessary by the Standard). – supercat Sep 13 '15 at 19:05
  • @supercat `&(foo->bar) = &((*foo).bar)` - you don't add anything to a null pointer. Otherwise you would end up with (`typesize of pointer * offset`) bytes, which is why you needed to add a typecast to your wrong example, yet there is none in the macro. The dot operator is not the add operator. – John Hammond Sep 13 '15 at 19:25
  • @LarsFriedrich: Given `struct s {uint16_t foo,bar;}; struct s *p;`, if `p` points to address 0x12345678, how would one compute `&(s.bar)` without adding 2 to `p`? A good compiler *should* trap if, at runtime, code attempts to add an offset to a null pointer, even if such action is part of a `->` operator [much of the damage null-pointers cause stems from the fact that many compilers generate code where adding an offset to a null pointer to yield a seemingly-valid non-null pointer]. The only reason to expect that such a thing shouldn't happen here would be that a good compiler will... – supercat Sep 13 '15 at 19:38
  • ...recognize that the same base pointer is being added and subtracted, and thus elide the use of the base pointer altogether. Given `int *q = &(s->p);` I would expect that a good compiler should trap if `s` is null even though no effort is made to dereference a pointer. – supercat Sep 13 '15 at 19:40
  • @supercat You add 2 to the register value of 0x12345678. You don't add 2 to p, because if you would add 2 to p, it would add 2 * sizeof(struct s) to p. That's the whole point - it is not _pointer addition_ according to the standard that happens, when you use the dot operator. And because it is not pointer addition, ALL rules for pointer addition are irrelevant, the rule to add the size of the base type is irrelevant and the rule to not add to null pointers. – John Hammond Sep 13 '15 at 20:13
  • @LarsFriedrich: If `p` is null, what should the status of `q` be? I would suggest that there is more value in allowing a compiler to trap at the attempt to perform an address computation yielding a non-null invalid address than in requiring that execution proceed normally but with `q` holding a value that is neither valid nor recognizable as `null`. – supercat Sep 13 '15 at 21:37
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/89542/discussion-between-lars-friedrich-and-supercat). – John Hammond Sep 14 '15 at 05:14