47

In C, it is perfectly fine to form a pointer that points one past the last element of an array and to use it in pointer arithmetic, as long as you don't dereference it:

int a[5], *p = a+5, diff = p-a; // Well-defined

However, both of the following are undefined behavior (UB):

p = a+6;
int b = *(a+5), diff = p-a; // Dereferencing and pointer arithmetic

Now I have a question: does this also apply to dynamically allocated memory? Assume I only use the one-past-the-last pointer in pointer arithmetic, without dereferencing it, and that malloc() succeeds.

int *a = malloc(5 * sizeof(*a));
assert(a != NULL && "Memory allocation failed"); // assert() takes a single expression
// Question:
int *p = a+5;
int diff = p-a; // Use in pointer arithmetic?
iBug
  • Good question. It's made more interesting by the fact that the memory doesn't have an effective type before you write to it. – StoryTeller - Unslander Monica Dec 20 '17 at 07:05
  • @StoryTeller This is interesting because in C++ when you use `new` then it's perfectly fine. – iBug Dec 20 '17 at 07:05
  • 2
    Well, you tagged C. And C++'s `new` is a different beast. It's more than mere memory allocation. Plus, C++ language lawyers would say that just writing to the memory returned by malloc doesn't create an object there, let alone make the memory have an effective type. – StoryTeller - Unslander Monica Dec 20 '17 at 07:06
  • 10
    You can actually have a pointer to *anywhere*, as long as you don't dereference it. You can even use it for comparison with other pointers, even though it might make no sense. – Some programmer dude Dec 20 '17 at 07:10
  • 3
    @Someprogrammerdude That's too wild. Isn't that UB? – iBug Dec 20 '17 at 07:11
  • 6
    @Someprogrammerdude - But I don't think you can obtain that pointer to anywhere in every way. For instance, you can't do pointer arithmetic like **iBug** pointed out. That's UB by itself. You may cast an integral constant to a pointer, but there's no guarantee it would be the same address as `a + 6` for instance. – StoryTeller - Unslander Monica Dec 20 '17 at 07:11
  • 1
    Too late to update my comment, but it should be added that you can't dereference it *or use it for pointer arithmetic*. Doing e.g. `int *some_variable = (int *) 0x1234` is perfectly valid, and often used on small embedded systems for memory-mapped registers. Having a pointer to anywhere is not a problem as long as you don't attempt to do anything with it. It's *using* the pointer that can lead to UB, if it doesn't point anywhere valid. – Some programmer dude Dec 20 '17 at 07:18
  • And to answer the question, "Is it well-defined to point to one-past-malloc?", then *yes* it is valid and works just the same as for one-past an array. – Some programmer dude Dec 20 '17 at 07:20
  • Guess that's not what I meant. Now I explicitly ask for pointer arithmetic. – iBug Dec 20 '17 at 07:21
  • 2
    ISO/IEC 9899:2011 §7.22.3 **Memory management functions** ¶1 _The order and contiguity of storage allocated by successive calls to the `aligned_alloc`, `calloc`, `malloc`, and `realloc` functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)._ It says "an array of such objects"—it's OK for arrays; therefore it's OK here. – Jonathan Leffler Dec 20 '17 at 07:59
  • 1
    @JonathanLeffler Sounds like a valid answer. What about posting it as an answer? – iBug Dec 20 '17 at 08:02
  • I'm hoping one of the answers quoting the relevant parts of the language definition (§6.x.y) will pick up on it, maybe including links to [N1570](http://port70.net/~nsz/c/c11/n1570.html) which is an online copy of a late draft of the C11 standard. I'm mildly puzzled by the reference to N4296 in one answer; the latest mailing of the [WG14 (C standard committee)](http://www.open-std.org/jtc1/sc22/wg14/www/docs/PreAlbuquerque2017.htm) references document numbers in the n21xx range. – Jonathan Leffler Dec 20 '17 at 08:26
  • The only thing that makes `malloc` a special case is that the allocated data has no _effective type_. The type of the allocated data is determined upon access, as specified in 6.5/6. When writing `int* ptr = malloc(n*sizeof *ptr); ...ptr[0] = x;` you actually never get an array type, but each chunk of data accessed gets the effective type `int`. Not an array of int, but a whole bunch of individual `int`. The C standard doesn't make much sense here. – Lundin Dec 20 '17 at 08:39
  • One more thing to note: `int diff` might be too small for the difference between the first and last element of *an* array. And `ptrdiff_t` *as well* - if that happens, behaviour is undefined. – Antti Haapala -- Слава Україні Dec 20 '17 at 08:42
  • @JonathanLeffler: N4296 is probably a mixup of filenames, N4296 is post-C++ 14, but the section numbers appear to be from some C draft. – ninjalj Dec 20 '17 at 10:43
  • 1
    Note that pointer subtraction like `p-a` returns a type `ptrdiff_t` whose range can exceed `int`. – chux - Reinstate Monica Dec 20 '17 at 12:59

4 Answers

26

The draft n4296 for C11 is explicit that pointing one past an array is perfectly defined: 6.5.6 Language / Expressions / Additive operators:

§ 8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. ... Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object... If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

As the subclause never specifies the kind of memory involved, it applies to any kind of memory, including allocated memory.

That clearly means that after:

int *a = malloc(5 * sizeof(*a));
assert(a != NULL && "Memory allocation failed");

both

int *p = a+5;
int diff = p-a;

are perfectly defined and as the usual pointer arithmetic rules apply, diff shall receive the value 5.
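As a minimal sketch of the snippet above in compilable form: the helper name `one_past_distance` is hypothetical (not from the question), the caller is assumed to pass `n > 0`, and the difference is stored in `ptrdiff_t` rather than `int`, since that is the type pointer subtraction yields.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only.
   Allocates n ints (n > 0 assumed), forms the one-past-the-end pointer,
   and returns the element distance back to the start.
   Returns -1 if the allocation fails. */
ptrdiff_t one_past_distance(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;

    int *p = a + n;          /* one past the last element; never dereferenced */
    ptrdiff_t diff = p - a;  /* both operands belong to the same allocation */

    free(a);
    return diff;
}
```

Calling `one_past_distance(5)` mirrors the question's `a+5` case.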

Serge Ballesta
  • If I write `p = a+6` then I can't expect `p - a == 6` according to the standard, right? – iBug Dec 20 '17 at 07:28
  • 3
    @iBug Yes, you cannot expect it to work. *" If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; **otherwise, the behavior is undefined**"* – user694733 Dec 20 '17 at 07:33
  • @iBug the standard mandates defined behaviour only up to *one past the last element of the array object*. If you go further (two past the last element), nothing is specified by the standard, which is enough to make it Undefined Behaviour. – Serge Ballesta Dec 20 '17 at 07:41
  • @iBug A particular concern that your example raises is that overflows in pointer arithmetic are undefined behavior in C++. Thus the rules basically state that malloc will never allocate the last byte of memory *unless* that compiler also simultaneously defines overflow in a way that makes these overflow issues invisible. – Cort Ammon Dec 20 '17 at 20:47
  • 1
    The published WG14 paper with the highest N-number is currently N2184. Where did you get N4296 from? – T.C. Dec 21 '17 at 01:31
  • @T.C. N4296 sounds like an early draft for C++17. – iBug Jan 01 '18 at 10:14
23

Is it well-defined to use a pointer pointing to one-past-malloc?

It is well defined if p is pointing to one past the allocated memory and it is not dereferenced.

n1570 - §6.5.6 (p8):

[...] If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Subtracting two pointers is valid only when both point to elements of the same array object or one past its last element; otherwise the behavior is undefined.

(p9):

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object [...]

The above quotes apply equally to dynamically and statically allocated memory.

int a[5];
ptrdiff_t diff = &a[5] - &a[0]; // Well-defined

int *d = malloc(5 * sizeof(*d));
assert(d != NULL && "Memory allocation failed");
diff = &d[5] - &d[0];        // Well-defined

Another reason that this is valid for dynamically allocated memory, as pointed out by Jonathan Leffler in a comment, is:

§7.22.3 (p1):

The order and contiguity of storage allocated by successive calls to the aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).

The pointer returned by malloc in the above snippet is assigned to d and the memory allocated is an array of 5 int objects.
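A common practical use of this guarantee is treating the one-past-the-end pointer as a loop bound. The following is only a sketch under assumptions: `sum_ones` is a hypothetical name, `n > 0` is assumed, and the end pointer is compared against but never dereferenced.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only.
   Fills a malloc'd block of n ints (n > 0 assumed) with 1 and sums it,
   using the one-past-the-end pointer purely as a comparison bound.
   Returns -1 if the allocation fails. */
long sum_ones(size_t n)
{
    int *d = malloc(n * sizeof *d);
    if (d == NULL)
        return -1;

    int *end = d + n;                       /* one past the last element */

    for (int *it = d; it != end; ++it)      /* 'end' is compared, not read */
        *it = 1;

    long total = 0;
    for (const int *it = d; it != end; ++it)
        total += *it;

    free(d);
    return total;
}
```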

iBug
haccks
  • 2
    Formally, how does the data pointed at by `d` end up as an array? According to the C standard, the effective type of the malloc:ed data is that used for lvalue access. Which is `int`, not `int[5]`. – Lundin Dec 20 '17 at 08:44
  • 2
    @Lundin; No, it doesn't. `d` is a pointer that points to the first block of the memory chunk allocated by `malloc`. – haccks Dec 20 '17 at 08:49
  • 1
    The cited text only shows that allocated storage _can be used_ to store arrays, not how the data _becomes_ an array. Suppose I do `int(*ptr)[5] = malloc_chunk; memcpy(something, ptr, 5*sizeof(int));` Then I make the effective type an array type. But without such code, the "chunk" is not formally an array type. I don't think there is any text in the standard that makes sense to cite here, the rules about effective type (and strict aliasing) are simply poor. – Lundin Dec 20 '17 at 09:29
  • The word "until" is ambiguous (or even wrong) here: *It is well defined **until** the pointer pointing to one past the allocated memory.* According to your answer it is still true when the pointer points to one past, but "until" means "when it happens it's no longer true", so you'd better find a better wording. – iBug Dec 20 '17 at 10:40
  • @iBug: What use of "until" is ambiguous or wrong? In the Standard, it applies to the clause "the space is explicitly deallocated". Once the space is freed, pointers to it cease to be valid. – supercat Dec 20 '17 at 17:44
  • @Lundin: Under C89's concept of "object", one could regard the result from `malloc` as returning a pointer to a union containing all possible combinations of types that could fit in the indicated storage. There would be no need for that storage to ever "become" anything else. C99's Effective Type rule is an abomination that requires completely changing the notion of what an "object" is in ways that can't be consistent with usages of the term elsewhere in the Standard, and invents an unnecessary new concept of runtime state. A simpler and better rule would have simply said... – supercat Dec 20 '17 at 18:01
  • 1
    ...that a compiler may regard two uses of an lvalue will be unsequenced relative to anything between them absent certain evidence of outside access, and that a compiler may hoist or defer accesses to the beginning/end of a function or loop if there is no evidence of outside access in the code it's moved across. Given `void test(int *ip, float *fp, int mode) { *ip=1; *fp=2; if (mode) *ip=1;};` if `ip` and `fp` alias, the Effective Type rule would require that the Effective Type of the storage be left as either `int` or `float`, depending upon `mode`, but there's no evidence that should matter. – supercat Dec 20 '17 at 18:11
  • @haccks Good correction. My note on the word "until" is somewhat a concern about English, not the C standard. It looks certainly better now. – iBug Dec 20 '17 at 23:58
7

Yes, the same rules apply to variables with dynamic and automatic storage duration. It even applies to a malloc request for a single element (a scalar is equivalent to a one-element array in this respect).

Pointer arithmetic is only valid within arrays, including one past the end of an array.

On dereferencing, it's important to note one consideration: with respect to the initialisation int a[5] = {0};, the compiler must not attempt to dereference a[5] in the expression int* p = &a[5]; it must compile this as int* p = a + 5; Again, the same thing applies to dynamic storage.
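To illustrate the single-element case and the `&s[1]` form together, here is a small sketch. The function name `single_element_distance` is hypothetical; the comment about `&s[1]` relies on C11 §6.5.3.2 ¶3, which says that neither the `&` nor the `*` implied by `[]` is evaluated in such an expression.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper for illustration only.
   Allocates a single int, which behaves like a one-element array for
   pointer arithmetic, and measures the distance to one past it.
   Returns -1 if the allocation fails. */
ptrdiff_t single_element_distance(void)
{
    int *s = malloc(sizeof *s);
    if (s == NULL)
        return -1;

    int *end = &s[1];          /* treated as s + 1; s[1] itself is not accessed */
    ptrdiff_t diff = end - s;  /* distance in elements */

    free(s);
    return diff;
}
```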

Bathsheba
  • In `int* p = &a[5];` `a[5]` is not dereferenced. It is equivalent to `int* p = a + 5;`, or maybe I am misreading that paragraph. – haccks Dec 20 '17 at 08:16
  • 4
    I'm trying to say that there is no UB with the expression &a[5] since the compiler must treat it as a + 5. Does it not read well? I have a cold following a weekend of implementing this: https://meta.stackexchange.com/questions/303920/winter-bash-2017-counting-down-page-whats-with-the-fence/303921#303921 – Bathsheba Dec 20 '17 at 08:17
7

Is it well-defined to use a pointer pointing to one-past-malloc?

Yes, yet a corner case exists where this is not well defined:

void foo(size_t n) {
  int *a = malloc(n * sizeof *a);
  assert((a != NULL || n == 0) && "Memory allocation failed");
  int *p = a+n;
  ptrdiff_t diff = p-a;  // pointer subtraction yields ptrdiff_t
  ...
}

Memory management functions ... If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. C11dr §7.22.3 1

foo(0) --> malloc(0) may return NULL or a non-NULL pointer. In the first case, a NULL return is not a "Memory allocation failure". The code then evaluates int *p = a+n; as int *p = NULL + 0;, which falls outside the guarantees about pointer arithmetic, or at least brings such code into question.

Portable code benefits from avoiding zero-size allocations.

void bar(size_t n) {
  ptrdiff_t diff;
  int *a;
  int *p;
  if (n > 0) {
    a = malloc(n * sizeof *a);
    assert(a != NULL && "Memory allocation failed");
    p = a+n;
    diff = p-a;
  } else {
    a = p = NULL;
    diff = 0;
  }
  ...
}
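The same idea can be packaged so that callers never see the zero-size case. This is a sketch only: `struct int_range`, `int_range_alloc`, and `int_range_len` are hypothetical names, and the overflow check on `n * sizeof(int)` is an addition that the answer does not discuss.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type for illustration: a half-open range [begin, end). */
struct int_range {
    int *begin;
    int *end;   /* one past the last element, or NULL for an empty range */
};

/* Returns 0 on success, -1 on failure.
   n == 0 produces an empty range without ever calling malloc(0). */
int int_range_alloc(struct int_range *r, size_t n)
{
    r->begin = r->end = NULL;
    if (n == 0)
        return 0;
    if (n > SIZE_MAX / sizeof(int))   /* assumption: reject size overflow */
        return -1;

    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;

    r->begin = a;
    r->end = a + n;                   /* valid: n > 0 and a is non-null */
    return 0;
}

/* Element count; avoids subtracting null pointers for the empty range. */
ptrdiff_t int_range_len(const struct int_range *r)
{
    return r->begin != NULL ? r->end - r->begin : 0;
}
```

The caller remains responsible for `free(r.begin)`, which is harmless when `begin` is NULL.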
chux - Reinstate Monica
  • I really wonder why the standard does not necessitate the return of NULL pointer in case `0` was passed to `malloc()`. Why the standard goes through the trouble of stating: "either a null pointer is returned, or the behavior is as if the size were some nonzero value".? – machine_1 Dec 20 '17 at 14:34
  • 2
    @machine_1 - I'd guess that two alternative implementations already existed by the time the (first) standard was written. – Oliver Charlesworth Dec 20 '17 at 14:41