After hunting to no avail for a related or duplicate question (I can do only marginal justice to the sheer number of pointer-arithmetic and post-decrement questions tagged with C; suffice it to say "boatloads" does a grave injustice to that result count), I toss this in the ring in hopes of clarification or a referral to a duplicate that eluded me.

If the post-decrement operator is applied to a pointer such as below, a simple reverse-iteration of an array sequence, does the following code invoke undefined behavior?

#include <stdio.h>
#include <string.h>

int main()
{
    char s[] = "some string";
    const char *t = s + strlen(s);

    while(t-->s)
        fputc(*t, stdout);
    fputc('\n', stdout);

    return 0;
}

It was recently proposed to me that 6.5.6p8, Additive operators, in conjunction with 6.5.2.4, Postfix increment and decrement operators, specifies that even performing a post-decrement upon t when it already contains the base address of s invokes undefined behavior, regardless of whether the resulting value of t (not the result of the t-- expression) is ever evaluated. I simply want to know if that is indeed the case.

The cited portions of the standard were:

6.5.6 Additive Operators

  8. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

and its tightly coupled relationship with...

6.5.2.4 Postfix increment and decrement operators

Constraints

  1. The operand of the postfix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.

Semantics

  2. The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics.

  3. The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).

Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).

The very reason for using the post-decrement operator in the posted sample is to avoid evaluating an eventually-invalid address value against the base address of the array. For example, the code above was a refactor of the following:

#include <stdio.h>
#include <string.h>

int main() 
{
    char s[] = "some string";

    size_t len = strlen(s);    
    char *t = s + len - 1;
    while(t >= s) 
    {
        fputc(*t, stdout);
        t = t - 1;
    }
    fputc('\n', stdout);
}

Forgetting for a moment that s here is a non-zero-length string, this general algorithm clearly has issues (perhaps not as clearly to some). If s[] were instead "", then t would be assigned a value of s-1, which is not in the valid range of s through its one-past address, and the ensuing comparison against s is no good. If s has non-zero length, that addresses the initial s-1 problem, but only temporarily: the loop still counts on that value (whatever it is) being valid for comparison against s in order to terminate. It could be worse. It could have naively been:

    size_t len = strlen(s) - 1;
    char *t = s + len;

This has disaster written all over it if s were a zero-length string. The refactored code this question opened with was intended to address all of these issues. But...
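
To make the zero-length case concrete, here is a minimal sketch. The helper names `last_index_naive` and `last_index_checked` are hypothetical and exist only for illustration. Because strlen returns the unsigned type size_t, subtracting 1 from a length of 0 wraps around to SIZE_MAX rather than yielding -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the naive "index of the last character".
 * For an empty string, 0 - 1 wraps to SIZE_MAX because size_t is unsigned. */
size_t last_index_naive(const char *s)
{
    assert(s != NULL);
    return strlen(s) - 1;
}

/* Hypothetical checked variant: returns 1 and stores the index in *out
 * only when the string actually has a last character; returns 0 otherwise. */
int last_index_checked(const char *s, size_t *out)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s);
    if (len == 0)
        return 0;          /* no last character, nothing to index */
    *out = len - 1;
    return 1;
}
```

With the naive version, `s + last_index_naive("")` would attempt pointer arithmetic far outside the array, so the checked form (or the pointer loop the question opens with) sidesteps the wraparound entirely.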

My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?

WhozCraig
  • @ShafikYaghmour *Please* point me there and it shall certainly receive accolades. I apologize for not discovering it if you indeed have done so. – WhozCraig May 28 '15 at 17:00
  • It is a long question and I have to read it again, but [is this what you are looking for](http://stackoverflow.com/q/18186987/1708801)? – Shafik Yaghmour May 28 '15 at 17:01
  • @ShafikYaghmour; Those answers are not satisfactory. To me when traversing an array in reverse then `t--` is pointing to one past the last element of the array and is valid. – haccks May 28 '15 at 17:03
  • @ShafikYaghmour aww....alas it is not. Great answer, however (love deeply informative brevity), and still gets up-props. This question isn't about evaluating a one-before or one-past address, but rather whether the very act of performing post-decrement on a pointer already loaded with the sequence base address in-itself invokes UB. – WhozCraig May 28 '15 at 17:05
  • Why should it? How can any part of the system possibly "know" that your `t` has anything to do with `s`, or moved below `s` on the last iteration, when you don't subsequently use it? – Weather Vane May 28 '15 at 17:11
  • @WhozCraig well, that question has two parts; line 4 is about pointing before the array, which (maybe I am being slow) seems to be what this is about as well. rici's answer below is basically saying what I said in my answer to that question. – Shafik Yaghmour May 28 '15 at 17:12
  • @ShafikYaghmour I suspect you're both correct. Where I'm continually struggling is the simple fact that the value of `t` after the decrement is never actually used. Rather, the prior value is the subject of the conditional evaluation, and the prior value was indeed valid. Both you and rici seem to be of the same mind on this, now that I have reread your answer a half-dozen times. – WhozCraig May 28 '15 at 17:22

1 Answer

I am pretty certain that the result of the post-decrement in this case is indeed undefined behaviour. The post-decrement clearly subtracts one from a pointer to the beginning of an object, so the result does not point to an element of the same array, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that's undefined behaviour. The fact that you never use the resulting pointer is irrelevant.

What's wrong with:

char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);
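
As a sketch of how that loop behaves as a complete unit, the version below writes into a caller-supplied buffer instead of stdout so the result is easy to check. The function name `reverse_into` is hypothetical, and the caller is assumed to provide room for strlen(s) + 1 bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies s, reversed, into out.
 * out is assumed to have room for strlen(s) + 1 bytes. */
void reverse_into(const char *s, char *out)
{
    assert(s != NULL && out != NULL);
    const char *t = s + strlen(s);  /* one past the last character: a valid pointer */
    while (t > s)                   /* compare first ... */
        *out++ = *--t;              /* ... then decrement, so t never goes below s */
    *out = '\0';
}
```

Because the comparison happens before the decrement, t only ever takes values from s through s + strlen(s). For an empty string the loop body never runs and the decrement never happens.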

Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.
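
The same idea can be sketched in C. This is only an illustration of the "store one past the target" technique, not the layout of any particular C++ library; the `rcursor` type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reverse cursor over a char array. It stores a pointer one
 * past the element it designates, so the stored pointer only ever ranges
 * over [arr, arr + n] and never reaches "one before the beginning". */
struct rcursor {
    const char *base;
};

/* Cursor designating the last element: stores arr + n. */
struct rcursor rcursor_begin(const char *arr, size_t n)
{
    assert(arr != NULL);
    return (struct rcursor){ arr + n };
}

/* End-of-iteration sentinel: stores arr itself and is never dereferenced. */
struct rcursor rcursor_end(const char *arr)
{
    assert(arr != NULL);
    return (struct rcursor){ arr };
}

/* Reads the designated element, which lives just before the stored pointer. */
char rcursor_get(struct rcursor c)
{
    return c.base[-1];
}

/* Advances toward the front of the array. */
struct rcursor rcursor_next(struct rcursor c)
{
    c.base--;
    return c;
}

int rcursor_equal(struct rcursor a, struct rcursor b)
{
    return a.base == b.base;
}
```

A loop such as `for (c = rcursor_begin(a, n); !rcursor_equal(c, rcursor_end(a)); c = rcursor_next(c))` visits the elements from last to first, and the final rcursor_next leaves the stored pointer equal to the array base rather than below it.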

rici
  • I suspect you are indeed correct on this. Thanks to both you and Shafik for taking the time. Originally I had thought (likely the alcohol had something to do with it; it was late) that, if anything, such an operation would be more suited to something akin to a trap, but the following morning I shook that off and reality set back in =P. Thanks again. – WhozCraig May 28 '15 at 17:26
  • @WhozCraig: While it seems pedantic in practice, consider the case where a pointer is a pair of segment + offset. In that case, `s` might have offset 0, and decrementing it might be a trap because the decrement pointer operation doesn't allow wraparound on offsets. – rici May 28 '15 at 17:30
  • Oh I understood exactly why the *eval* of such a "thing" could be disastrous (the situation you cited being one), but the very act of performing the decrement itself, whether used or not, constituting UB eluded me. – WhozCraig May 28 '15 at 17:32