Is using sentinel pointer values near the maximum underlying integer value safe?

Question

I was going through some code, that in addition to null pointers is using some special values like (T*)-1, generally as return values if some "create" function fails.

Where the type being pointed to is a type large enough such that ((T*)-n) + sizeof(T) will overflow, meaning that address could never actually be allocated for an instance of type T, is this OK? Could a compiler see something like if (ptr == (T*)-1), decide that is impossible and optimise it out?

`ptrdiff_t` seems to always be defined as a larger type than `size_t`. Related: https://stackoverflow.com/questions/42574890/why-is-the-maximum-size-of-an-array-too-large — Lundin, Apr 10 '19 at 15:03
@Lundin, the actual type or in terms of allowed range? I'd expect them to be the same size as a pointer on most platforms (e.g. 4 or 8 bytes), since `ptr1 - ptr2` is only OK if they are from the same object/allocation. The actual pointer range doesn't have to be limited, e.g. 3GB usable application memory on some 32-bit systems (meaning the high bit of pointers must be used) but still a 32-bit `ptrdiff_t` with only 2GB range. — Fire Lancer, Apr 10 '19 at 15:12
@Lundin I can't agree with that. On my not-so-exotic 64-bit Linux system, they're both 64 bits. — Christian Gibbons, Apr 10 '19 at 15:17
Per C 2018 6.3.2.3 5, the result of `(T *) -1` is implementation-defined. So the question of whether it can be used as a sentinel in the first place, before even being concerned about the compiler recognizing it could not be the starting address of a `T`, is dependent on the implementation. — Eric Postpischil, Apr 10 '19 at 15:42
Who says there's an "underlying integer value" at all? There *might* be (and typically is) something that could be matched to that description in any given implementation, but C certainly doesn't promise it. — John Bollinger, Apr 10 '19 at 15:42
Well, since it is used only for equality, what the value is doesn't matter, as long as it is given some unique value? I'm not clear on the unique part from the wording. I understand `(T*)(uintptr_t)(T*)p` should round-trip, so the implementation can't, for example, make every integer into a null pointer. Is "implementation defined...might a...might b..." limited to what is listed? Can be the accepted answer. Personally I would be happier using the address of, say, a global or, better yet, an output parameter, but I am only reviewing this code. — Fire Lancer, Apr 10 '19 at 15:54

John Bollinger · Accepted Answer · 2019-04-11T12:06:50.213

TL;DR: (T*)-1 will likely work as intended in practice, but for safety, portability, and future-proofing, you should use null pointers as sentinels instead.

I was going through some code, that in addition to null pointers is using some special values like (T*)-1, generally as return values if some "create" function fails.

In fact, some POSIX interfaces, such as shmat(), behave similarly, returning (void *)-1 to indicate an error. For them, this is the equivalent of many other standard functions returning the int value -1. It is a value that is intended never to be a valid return value for a successful call. That must therefore work on every POSIX-conforming implementation, and I think other POSIX requirements have the combined effect of requiring the same to hold for pointer types other than void *, too.
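As an illustration of that convention, here is a minimal sketch of checking the result of shmat() on a POSIX system. The helper name attach_fails is hypothetical and exists only for this example; the point is that the failure test is against (void *)-1 rather than against a null pointer.

```c
#include <stdio.h>
#include <sys/shm.h>

/* Hypothetical helper: returns 1 if attaching the segment fails, 0 otherwise.
 * shmat() reports failure by returning (void *)-1, not NULL. */
int attach_fails(int shmid)
{
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) {
        perror("shmat");
        return 1;
    }
    shmdt(addr);   /* detach again; this sketch only probes the call */
    return 0;
}
```

Calling attach_fails(-1) should take the error branch, since -1 is not a valid shared memory identifier.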

More generally, C explicitly permits integers to be converted to pointers without restriction, but with the caveat that

Except [for null pointer constants], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

(C2011, 6.3.2.3/5). The main concerns with such a conversion, then, are

1. that the result of (T*)-1 is a trap representation, in which case the scheme you describe produces undefined behavior.
2. that the result of (T*)-1 could be a valid pointer to a T, in which case using it as a sentinel is unsafe.

To the best of my knowledge, the first of those is not an issue for any C implementation you're likely to meet. I think the second is unlikely to be an issue for you in practice, either, but if you are targeting non-POSIX systems then I am less confident about that one.
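One way to sidestep both concerns, raised in the comments on the question, is to use the address of a dedicated object as the sentinel. Such an address is a valid pointer and cannot coincide with a separately allocated T. The sketch below is only an illustration: the type conn, the function names, and the mode parameter are all hypothetical.

```c
#include <stdlib.h>

typedef struct conn { int fd; } conn;   /* hypothetical T */

/* Dedicated objects whose addresses serve as distinct error sentinels. */
static conn conn_err_refused;
static conn conn_err_timeout;
#define CONN_REFUSED (&conn_err_refused)
#define CONN_TIMEOUT (&conn_err_timeout)

/* mode is a hypothetical stand-in for whatever makes a real create fail. */
conn *conn_create(int mode)
{
    if (mode == 1) return CONN_REFUSED;
    if (mode == 2) return CONN_TIMEOUT;
    conn *c = malloc(sizeof *c);   /* NULL still signals out-of-memory */
    if (c != NULL) c->fd = mode;
    return c;
}

/* True for NULL and for either sentinel; every comparison here involves
 * only null pointers or pointers to real objects. */
int conn_is_error(const conn *c)
{
    return c == NULL || c == CONN_REFUSED || c == CONN_TIMEOUT;
}
```

This keeps the "several failure codes in one return value" style while comparing only pointers to actual objects.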

You go on to ask,

Where the type being pointed to is a type large enough such that ((T*)-n) + sizeof(T) will overflow, meaning that address could never actually be allocated for an instance of type T, is this OK? Could a compiler see something like if (ptr == (T*)-1), decide that is impossible and optimise it out?

This is an interesting question. Supposing that (T*)-1 does not produce a trap representation, this provision applies:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

(C2011, 6.5.9/6)

Unfortunately, however, this is a bit of a mess.

Although the standard places constraints on the types of pointer operands of an == expression, it does not require their values to be valid pointers. Lest there be any doubt about this, it is required for internal consistency with the provisions of section 6.3.2.3, which specify results of equality comparisons involving null pointers (not limited to null pointer constants).

If at least one of the operands of x == y is an invalid pointer other than a null pointer, such as, we may presume, (T *)-1, then none of the alternatives given by 6.5.9/6 holds, so the expression should evaluate to 0. A compiler might use that to justify optimizing out the test and branch.

In practice, however, implementations often fail to conform in this regard. Instead, they take their cue from historic behavior, perhaps justifying themselves by the fleeting reference to the address space in 6.5.9/6, or maybe taking a liberal view of what an object is. For implementations that afford a flat view of the address space, this manifests as == being evaluated in terms of whether the addresses to which the pointer values correspond are the same, regardless of those addresses' relationships to any object. An implementation such as that must not optimize out the == test, because it cannot safely assume that it will always fail.
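To make the address-based comparison explicit rather than leaving it to how == treats pointers, one option is to compare the values as integers. This is a sketch under the assumption that the implementation provides uintptr_t and that converting (T*)-1 does not trap; widget, WIDGET_FAILED, and is_failed are hypothetical names, and the pointer-to-integer conversion itself remains implementation-defined.

```c
#include <stdint.h>

typedef struct widget { char payload[4096]; } widget;   /* hypothetical T */

#define WIDGET_FAILED ((widget *)-1)   /* the sentinel under discussion */

/* Compare by integer value: once both sides are uintptr_t, == is ordinary
 * integer equality and the wording of 6.5.9/6 no longer applies. */
int is_failed(const widget *p)
{
    return (uintptr_t)p == (uintptr_t)WIDGET_FAILED;
}
```

On a flat-address-space implementation this should behave the same as the plain pointer comparison, just with the intent spelled out.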

The bottom line, then, is that although the compiler is unlikely to optimize away the test, you cannot rely on the standard for assurance that it will not do so. You are on safer ground if you use null pointers as your sentinels, for notwithstanding the inconsistency I called out, in practice, null pointers of the same type do compare equal in all implementations, pursuant to 6.3.2.3/4.
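If the goal is to report more than one kind of failure, a null pointer can be combined with a separate status code instead of extra sentinel pointers. The following is a minimal sketch of that out-parameter style; obj_t, obj_create, and the choice of errno-style codes are assumptions made for illustration, not taken from the code under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct obj { int id; } obj_t;   /* hypothetical object type */

/* Returns 0 on success or an errno-style code on failure.
 * *out is a valid object on success and NULL otherwise. */
int obj_create(int id, obj_t **out)
{
    assert(out != NULL);   /* caller must supply somewhere to store the result */
    *out = NULL;
    if (id < 0)
        return EINVAL;     /* hypothetical validation failure */
    obj_t *o = malloc(sizeof *o);
    if (o == NULL)
        return ENOMEM;
    o->id = id;
    *out = o;
    return 0;
}
```

The caller then tests the integer result for the failure reason and never has to compare against a non-null sentinel pointer.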

Given the wording of 6.5.9/6, I don't think it's safe to assume `( T * ) -1 == ( T * ) -1`. Since `( T * ) -1` is likely not a null pointer constant, the compiler could well assume under 6.5.9/6 that a comparison to that literal value is always false. (Agreed, though, that POSIX seems to mandate "proper" behavior here.) So a test such as `if ( ptr == ( T * ) -1 )...` could very well be optimized away. — Andrew Henle, Apr 10 '19 at 18:14
I agree in principle, @AndrewHenle, and indeed this answer already speaks to that issue. I guess you're saying you have less confidence than I do about the comparison working as intended in practice, which is fair. Ultimately, I think the take-home is "don't do that", and I'll tweak my wording a bit to emphasize that. — John Bollinger, Apr 10 '19 at 19:07
@JohnBollinger Thanks for the in-depth explanation of the standard. The POSIX note is interesting, as that extends to a lot of platforms. I generally agree with "use null pointers as sentinels instead", and I'm surprised that `shmat` seems not to. In this case the special values were used in addition to null, in order to cram more data into the return value, in an `if (null) else if (-1) else if (-2) else success` style, rather than using, say, `int create(obj_t **out)` or `errno` or such, which I see far more commonly. — Fire Lancer, Apr 11 '19 at 09:33
