
Does the following provoke undefined behavior in line 4 and/or 5:

#include <stdio.h>
int main(void)
{
  char s[] = "foo";
  char * p = s - 1;      /* line 4 */
  printf("%s\n", p + 1); /* line 5 */
  return 0;
}
Shafik Yaghmour
alk
  • It has been a while but the duplicate although related does not actually seem to be a duplicate of this question. I could reopen but since I am the accepted answer I will let someone else do that. – Shafik Yaghmour Oct 01 '14 at 12:19
  • @ShafikYaghmour: "*... the duplicate [...] does not actually seem to be a duplicate ...*" For what reason(s), please? From what you (among others) answer line 4 actually provokes UB, same does the `array - 1` in the linked question. – alk Oct 01 '14 at 14:45
  • Although the topics are similar, they are not really the same question. I find myself more skeptical of duplicate closures after [this](http://meta.stackoverflow.com/q/266364/1708801) meta discussion, but there is a seemingly wide divergence of opinion on this topic. – Shafik Yaghmour Oct 04 '14 at 01:52
  • I found a situation in which this undefined behavior actually makes the calculation wrong (on a normal x86): http://stackoverflow.com/questions/23683029/is-gccs-option-o2-breaking-this-small-program-or-do-i-have-undefined-behavior – Bernd Elkemann Jan 28 '15 at 20:13

3 Answers


Decrementing the pointer outside the array bounds is undefined.

Section 6.5.6, paragraph 8 of the C99 standard says, in part:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So your line 4 invokes undefined behaviour, since the result of `s - 1` points neither within the array nor one past the end of it.

Nigel Harper
  • The phrasing "the *evaluation* shall not produce" makes me curious whether the pointer is actually *evaluated*. And the "p + 1" is for sure inside the array. – Captain Giraffe Aug 12 '13 at 13:31
  • @CaptainGiraffe: The pointer is not evaluated - the expression is, and it results in a value of pointer type. In this case you have the expression `s - 1`, whose evaluation in the given context results in some value. For example for `C[s -> 00000001]` the `s - 1` evaluates to `00000000`. My reading is that _evaluation_ is abstract here and is in mathematical space (you shall not write an expression which, if evaluated, would result in ....) rather than a part of small-step execution. – Maciej Piechotka Aug 12 '13 at 13:34
  • @MaciejPiechotka Yes your reading of the text seems to coincide with the other answers. I'm not so sure. I am under the impression that evaluation means looking into that spot; UB of course. After all it should just be regular integer arithmetic, behaving regularly (anecdotal reasoning, I'm aware). – Captain Giraffe Aug 12 '13 at 13:40
  • @CaptainGiraffe "I am under the impression that evaluation means looking into that spot" -- your impression is wrong -- that's *dereferencing*, not evaluation. Maciej is wrong too -- the evaluation is not abstract, it refers to the calculation of the value resulting from the pointer arithmetic. – Jim Balter Aug 12 '13 at 13:43
  • @JimBalter Along with your other comments. Yours, "the (pointer) calculation" is the only explanation that makes sense. Thanks – Captain Giraffe Aug 12 '13 at 13:55
  • @CaptainGiraffe "it should just be regular integer arithmetic" - not necessarily. That is true on the modern systems most of us are familiar with but it's not required to be true by the standard, and it isn't necessarily true on more exotic hardware. AIUI the standard puts such tight restrictions on pointer arithmetic to make life easier for compiler writers on such systems. – Nigel Harper Aug 12 '13 at 14:35
  • @CaptainGiraffe: The evaluation of `s - 1` on line 4, and of `p` on line 5, has undefined behavior. – Keith Thompson Aug 12 '13 at 15:06
  • @MaciejPiechotka: Your interpretation is somewhat incorrect. For example, the subexpression `s-1` in `0&&(s-1)` is never evaluated. – R.. GitHub STOP HELPING ICE Aug 12 '13 at 16:28
  • @R: Yes. I realized that I worded it incorrectly after Jim's comment (`0 && s - 1`) - my meaning was with contexts, so in such a case there would need to be a context where `C[0] != 0`. – Maciej Piechotka Aug 12 '13 at 17:43
  • Note, that the "undefined behaviour" may be avoided by simply separating the array definition `char s[] = "foo"` and the pointer arithmetic `s - 1` into separate .c files. Because the undefined behaviour only arises due to the compiler being able to prove that the pointer arithmetic leaves the defined range; if s is just a (decayed) `char*`, `s - 1` does not invoke any undefined behaviour. – cmaster - reinstate monica Aug 12 '13 at 19:51
  • @cmaster No, that's not how it works. Compilers don't have to go looking for undefined behaviour. Nothing in 6.5.6 paragraph 8 says the array definition has to be visible to invoke undefined behaviour. – Nigel Harper Aug 12 '13 at 20:57
  • @NigelHarper Yet it is perfectly defined what happens when I do `(s - 1) + 1` to a `char* s`. This has to yield identity, because it's just pointer arithmetic, and there is nothing about the pointer that could tell the processor to let orange elephants appear when it subtracts one from it. The danger is, that the compiler may make some elephants disappear (i. e. your code) when it proves that they commit the crime of being undefined behaviour. This is why it is relevant what the compiler sees and is able to prove, i. e. how the code is distributed among compilation units. – cmaster - reinstate monica Aug 13 '13 at 21:16

Yes, line 4 is undefined behavior!

C99 6.5.6 Additive operators, paragraph 8

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P) + N (equivalently, N + (P)) and (P) - N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P) + 1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q) - 1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Community
Yu Hao

Does the following provoke undefined behavior in line 4 and/or 5:

Yes, line 4 is undefined behavior, since the resulting pointer points neither within the array nor one past its end. Although it is valid to point one past the end of the array, you cannot dereference that pointer.

The relevant section in the C99 draft standard is 6.5.6 Additive operators, paragraph 8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

The end of the paragraph says that you shall not dereference a pointer one past the last element:

[...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated

Shafik Yaghmour
  • Took me too long to get to the spec, but yup. It's worth noting that in practice this always going to work because it's more effort to make it not work, but theoretically it's not guaranteed. – chrylis -cautiouslyoptimistic- Aug 12 '13 at 12:36
  • @chrylis There are C interpreters that will halt execution at line 4. – Jim Balter Aug 12 '13 at 12:42
  • @JimBalter I've already had to think about CINT once this week... *holds head* – chrylis -cautiouslyoptimistic- Aug 12 '13 at 12:44
  • @chrylis: *"in practice this always going to work"* -- That's a bad assumption. An optimizing compiler can assume that the code's behavior is well defined, and transform the code based on that assumption. – Keith Thompson Aug 12 '13 at 15:04
  • @KeithThompson: As a special case, if the compiler can determine statically that a code path leads unconditionally to undefined behavior, it can simply remove that entire code path from the output, since the only way the program could avoid having undefined behavior is by never reaching that code path. – R.. GitHub STOP HELPING ICE Aug 13 '13 at 02:56
  • The spec uses the phrase "array object". Does this work the same way when using memory allocated with `malloc`? – Charles Holbrow Jan 12 '17 at 04:04