31

AFAIK, we cannot create a 0-sized static array, but we can with a dynamic one:

int a[0]{}; // Compile-time error
int* p = new int[0]; // Is well-defined

As I've read, p acts like a one-past-the-end pointer. I can print the address that p holds:

if(p)
    cout << p << endl;
  • Although I am sure we cannot dereference that pointer (just as we cannot dereference a past-the-end iterator), what I am not sure about is whether incrementing p is undefined behaviour (UB), as it is with iterators:

    p++; // UB?
    
Itachi Uchiwa
  • 4
    UB _"...Any other situations (that is, attempts to generate a pointer that isn't pointing at an element of the same array or one past the end) invoke undefined behavior...."_ from: https://en.cppreference.com/w/cpp/language/operator_arithmetic – Richard Critten Nov 03 '19 at 22:10
  • 3
    Well, this is similar to a `std::vector` with 0 item in it. `begin()` is already equal to `end()` so you cannot increment an iterator that is pointing at the beginning. – Phil1970 Nov 03 '19 at 23:51
  • 1
    @PeterMortensen I think your edit changed the meaning of the last sentence ("What I am sure of -> I am not sure why"), could you please double check? – Fabio says Reinstate Monica Nov 04 '19 at 15:10
  • @PeterMortensen: The last paragraph you've edited has become a bit less readable. – Itachi Uchiwa Nov 04 '19 at 18:10

3 Answers

33

Pointers to elements of arrays are allowed to point to a valid element, or one past the end. If you increment a pointer in a way that goes more than one past the end, the behavior is undefined.

For your 0-sized array, p is already pointing one past the end, so incrementing it is not allowed.

See C++17 8.7/4 regarding the + operator (++ has the same restrictions):

If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.

interjay
  • 2
    So the only case `x[i]` is the same as `x[i + j]` is when both `i` and `j` have the value 0? – Rami Yen Nov 03 '19 at 22:22
  • 8
    @RamiYen `x[i]` is the same element as `x[i+j]` if `j==0`. – interjay Nov 03 '19 at 22:27
  • 1
    Ugh, I hate the "twilight zone" of C++ semantics... +1 though. – einpoklum Nov 04 '19 at 08:37
  • 4
    @einpoklum-reinstateMonica: There's no twilight zone really. It's just C++ being consistent even for the N=0 case. For an array of N elements, there are N+1 valid pointer values because you can point behind the array. That means that you can start at the begin of the array and increment the pointer N times to get to the end. – MSalters Nov 04 '19 at 11:26
  • @MSalters: UB pointer behavior past array ends are kind of the "dark region"; the one-past-the end is twilight. Something strange, mysterious, not obvious, that you might not have guessed. – einpoklum Nov 04 '19 at 12:26
  • It has been said many times elsewhere, one would not be able to build a working C++ compiler by following the C++ standard to the letter because too many important behaviours are left unspecified or undefined. So, while your quote may come directly from the standard, its utility is close to 0, IMO. – Maxim Egorushkin Nov 04 '19 at 16:27
  • @MaximEgorushkin And what's your alternative? To blindly guess what is allowed? I've been bitten before by UB that "seemed" like it shouldn't cause harm but ended up causing bugs. What value is there in incrementing a pointer more than one past the end of an array anyway? Can you not write a C++ compiler without doing so? – interjay Nov 04 '19 at 16:40
  • Pointer arithmetics are well defined in flat memory models. That is a property of the architecture which would be useful for the compiler to expose to the code. – Maxim Egorushkin Nov 04 '19 at 16:51
  • 1
    @MaximEgorushkin My answer is about what the language currently allows. Discussion about you would like it to allow instead is off-topic. – interjay Nov 04 '19 at 17:25
  • The language allows for more. It is poorly worded in the standard and your answer doesn't provide any insight, unfortunately. – Maxim Egorushkin Nov 04 '19 at 17:33
3

I guess you already have the answer, but let's look a bit deeper: you've said that incrementing an off-the-end iterator is UB, so the answer lies in what an iterator is.

An iterator is just an object that holds a pointer, and incrementing the iterator really increments the pointer it holds. Thus, in many respects, an iterator is handled in terms of a pointer.

int arr[] = {0,1,2,3,4,5,6,7,8,9};

int *p = arr; // p points to the first element in arr

++p; // p points to arr[1]

Just as we can use iterators to traverse the elements in a vector, we can use pointers to traverse the elements in an array. Of course, to do so, we need to obtain pointers to the first and one past the last element. As we’ve just seen, we can obtain a pointer to the first element by using the array itself or by taking the address-of the first element. We can obtain an off-the-end pointer by using another special property of arrays. We can take the address of the nonexistent element one past the last element of an array:

int *e = &arr[10]; // pointer just past the last element in arr

Here we used the subscript operator to index a nonexistent element; arr has ten elements, so the last element in arr is at index position 9. The only thing we can do with this element is take its address, which we do to initialize e. Like an off-the-end iterator (§ 3.4.1, p. 106), an off-the-end pointer does not point to an element. As a result, we may not dereference or increment an off-the-end pointer.

This is from C++ Primer, 5th Edition, by Lippman.

So it is UB; don't do it.

Raindrop7
-4

In the strictest sense, this is not Undefined Behavior, but implementation-defined. So, although inadvisable if you plan to support non-mainstream architectures, you can probably do it.

The standard quote given by interjay is a good one, indicating UB, but it is only the second best hit in my opinion, since it deals with pointer-pointer arithmetic (funnily, one is explicitly UB, while the other isn't). There is a paragraph dealing with the operation in the question directly:

[expr.post.incr] / [expr.pre.incr]
The operand shall be [...] or a pointer to a completely-defined object type.

Oh, wait a moment, a completely-defined object type? That's all? I mean, really, type? So you don't need an object at all?
It takes quite a bit of reading to actually find a hint that something in there might not be quite so well-defined. Because so far, it reads as if you are perfectly allowed to do it, no restrictions.

[basic.compound] 3 makes a statement about what type of pointer one may have, and being none of the other three, the result of your operation would clearly fall under 3.4: invalid pointer.
It however doesn't say that you aren't allowed to have an invalid pointer. On the contrary, it lists some very common, normal conditions (e.g. end of storage duration) where pointers regularly become invalid. So that's apparently an allowable thing to happen. And indeed:

[basic.stc] 4
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

We are doing an "any other" there, so it's not Undefined Behavior, but implementation-defined, thus generally allowable (unless the implementation explicitly says something different).

Unluckily, that's not the end of the story. Although the net result doesn't change any more from here on, it gets more confusing, the longer you search for "pointer":

[basic.compound]
A valid value of an object pointer type represents either the address of a byte in memory or a null pointer. If an object of type T is located at an address A [...] is said to point to that object, regardless of how the value was obtained.
[ Note: For instance, the address one past the end of an array would be considered to point to an unrelated object of the array's element type that might be located at that address. [...]].

Read as: OK, who cares! As long as a pointer points somewhere in memory, I'm good?

[basic.stc.dynamic.safety] A pointer value is a safely-derived pointer [blah blah]

Read as: OK, safely-derived, whatever. It doesn't explain what this is, nor does it say I actually need it. Safely-derived-the-heck. Apparently I can still have non-safely-derived pointers just fine. I'm guessing that dereferencing them would probably not be such a good idea, but it's perfectly allowable to have them. It doesn't say otherwise.

An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value.

Oh, so it may not matter, just what I thought. But wait... "may not"? That means, it may as well. How do I know?

Alternatively, an implementation may have strict pointer safety, in which case a pointer value that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object is of dynamic storage duration and has previously been declared reachable

Wait, so it's even possible that I need to call declare_reachable() on every pointer? How do I know?

Now, you can convert to intptr_t, which is well-defined, giving an integer representation of a safely-derived pointer. For which, of course, being an integer, it is perfectly legitimate and well-defined to increment it as you please.
And yes, you can convert the intptr_t back to a pointer, which is also well-defined. Only just, not being the original value, it is no longer guaranteed that you have a safely-derived pointer (obviously). Still, all in all, to the letter of the standard, while being implementation-defined, this is a 100% legitimate thing to do:

[expr.reinterpret.cast] 5
A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size [...] and back to the same pointer type [...] original value; mappings between pointers and integers are otherwise implementation-defined.

The catch

Pointers are just ordinary integers, only you happen to use them as pointers. Oh, if only that were true!
Unluckily, there exist architectures where that isn't true at all, and merely generating an invalid pointer (not dereferencing it, just having it in a pointer register) will cause a trap.

So that's the basis of "implementation-defined". That, and the fact that incrementing a pointer as you please could of course cause overflow, which the standard doesn't want to deal with. The end of the application address space may not coincide with the location of overflow, and you do not even know whether there is any such thing as overflow for pointers on a particular architecture. All in all, it's a nightmarish mess in no relation to the possible benefits.

Dealing with the one-past-object condition, on the other hand, is easy: the implementation must simply make sure no object is ever allocated such that the last byte in the address space is occupied. So that's well-defined, as it's useful and trivial to guarantee.

Damon
  • 1
    Your logic is flawed. "So you don't need an object at all?" misinterprets the Standard by focusing on a single rule. That rule is about compile time, whether your program is well-formed. There's another rule about run time. Only at run time can you actually talk about the existence of objects at a certain address. your program needs to meet **all** rules; the compile-time rules at compile time and the run-time rules at run time. – MSalters Nov 04 '19 at 13:50
  • 5
    You have similar logic flaws with "OK, who cares! As long as a pointer points somewhere in memory, I'm good?". No. You have to follow all rules. The difficult language about "end of one array being begin of another array" just gives the **implementation** permission to allocate memory contiguously; it doesn't need to keep free space between allocations. That does mean **your code** might have the same value A both as the end of one array object and the start of another. – MSalters Nov 04 '19 at 13:57
  • 1
    "A trap" is not something that can be described by "implementation defined" behaviour. Note that interjay has found the restriction on the `+` operator (from which `++` flows) which means that pointing after "one-after-the-end" is undefined. – Martin Bonner supports Monica Nov 04 '19 at 14:36
  • The quote in my answer is for pointer+integer arithmetic, not pointer-pointer as you say. As for the `++` operator, The standard says "The expression ++x is equivalent to x+=1". And `+=` is defined in terms of `+`. So the restriction from `+` applies also to `++` and this is UB. – interjay Nov 04 '19 at 14:42
  • @Damon: In the ISO C++ standard, it's clearly *Undefined Behaviour*. The fact that some implementations (typically on flat memory models) do provide happens-to-work behaviour means *nothing* for that, not even that it's implementation-defined. Yes implementations are *allowed* to define behaviour that the standard leaves undefined (e.g. `gcc -fwrapv` for signed-integer overflow). Most C++ implementations do in practice define the behaviour of forming an invalid pointer (under aligned or outside an object), but being defined by many (but not all) implementations != "implementation defined" – Peter Cordes Nov 04 '19 at 14:44
  • "implementation defined" in standardese has a specific meaning, which is that all implementations are *required* to define the behaviour one way or another. Please don't confuse people by using that term for another meaning. – Peter Cordes Nov 04 '19 at 14:46
  • 1
    @PeterCordes: Please do read [basic.stc, paragraph 4](https://timsong-cpp.github.io/cppwp/n4659/basic.stc). It says _"Indirection [...] undefined behavior. **Any other use** of an invalid pointer value has **implementation-defined** behavior"_. I am not confusing people by using that term for another meaning. It is the exact wording. It is **not** undefined behavior. – Damon Nov 04 '19 at 15:45
  • Ah, I see what you're saying. That's true if you manage to get an invalid pointer without already causing UB, like a pointer to an object that goes out of scope. And you are using correct terminology for the point you're making. But as interjay's answer points out, incrementing a one-past-the-end pointer is UB itself. My understanding is that specific trumps general, so "any other use" should be read as "any other use that isn't specifically called-out as UB". – Peter Cordes Nov 04 '19 at 15:51
  • Also, this isn't exactly an invalid pointer. You can't deref it, but it is the pointer to one-past-the-end of an object, sort of. You can `delete` it, unlike an invalid pointer. It's a return value from `new` that hasn't yet been deleted. – Peter Cordes Nov 04 '19 at 15:52
  • It is high time for the C++ standard to just require well-defined pointer arithmetics for [flat memory models](https://en.wikipedia.org/wiki/Flat_memory_model) without resorting to vague and controversial _implementation defined_ language. +1 – Maxim Egorushkin Nov 04 '19 at 16:21
  • @PeterCordes: As for invoking UB _doing the post-increment_ because it is (allegedly) equivalent to `p += 1`, that statement would be correct if it was pre-increment, for which this is indeed the case. As it happens, the question contains post-increment. For post-increment, the wording says no such thing. It merely says the value is modified by adding 1 to it (which is _not_ the same thing, and does not pull in any implied meaning from any other sections). Although the standard is probably _intended_ to say that (consistency?!), factually it _doesn't_ say so. Thus, no UB. – Damon Nov 04 '19 at 17:47
  • 2
    It's barely possible you've found a loophole for post-increment but you don't quote the full section on what post-increment does. I'm not going to look into that myself right now. Agreed that if there is one, it's unintended. Anyway, as nice as it would be if ISO C++ defined more things for flat memory models, @MaximEgorushkin, there are other reasons (like pointer wrap-around) for not allowing arbitrary stuff. See comments on [Should pointer comparisons be signed or unsigned in 64-bit x86?](//stackoverflow.com/posts/comments/82363942) – Peter Cordes Nov 04 '19 at 17:58
  • `p` is not an invalid pointer value. (C++17 basic.compound/3.2 -- it is "a pointer past the end of an object"). So `p++` does not come under "use of an invalid pointer value" . – M.M Nov 04 '19 at 21:16
  • @M.M: In that case, it would be well-defined, but no. It is a pointer past no object, i.e. basic.compound/3.4 rather than 3.2. Allocating an array of size zero gives an address that you can pass back to `delete[]`, sure, but _zero_ objects. So `p` does not point to an object, and thus `p++` doesn't move `p` one past the end of an object. – Damon Nov 04 '19 at 22:04
  • @Damon `new int[0]` allocates a zero-sized array (the standard says this). The result of `new` can never be an invalid pointer value . For an array of size `n`, there can be a pointer to hypothetical element `n`, even though there is no such element, this is called the "past the end" pointer and is referred to by 3.2 – M.M Nov 04 '19 at 22:14
  • Also, your logic would make `delete p;` be undefined behaviour (since passing an invalid pointer value to a deallocation function causes undefined behaviour). – M.M Nov 04 '19 at 22:21
  • Please read the wording carefully before making such claims (also read what I wrote). There is no such thing as "past the end of array", only "past the end of an object" (basic.compound) and "past the last _element_ of an array" (in expr:add). In other words, no object, no "past end". A zero-sized array contains no objects (it has zero elements, so there is no last element). Further, `delete[] p;` is not UB, I did not claim that, I said the exact opposite. This is one of the few well-defined things that you're allowed to do with it. – Damon Nov 04 '19 at 22:41
  • You are claiming that the result of `new int[0]` is an invalid pointer value . According to the standard, deleting an invalid pointer value causes undefined behaviour. – M.M Nov 04 '19 at 22:54
  • BTW your position that an array of zero elements cannot have a past-the-end pointer would also make `v.end()` be an invalid pointer value for an empty vector (and therefore common methods of iterating over the vector would not be well-defined) – M.M Nov 04 '19 at 23:10
  • @PeterCordes That's a lot of words to say that `uintptr_t` is more robust than `intptr_t`, +1 anyway. – Maxim Egorushkin Nov 05 '19 at 13:03
  • @MaximEgorushkin: The conclusion in my linked answer is that `intptr_t` is more resistant of wraparound-induced errors because the signed-overflow boundary is in the middle of the non-canonical hole on x86-64. I should add a TL:DR – Peter Cordes Nov 05 '19 at 13:06