26

As answered elsewhere, calling functions like memcpy with invalid or NULL pointers is undefined behaviour, even if the length argument is zero. In the context of such a function, especially memcpy and memmove, is a pointer just past the end of the array a valid pointer?

I'm asking this question because a pointer just past the end of an array is legal to obtain (as opposed to, e.g., a pointer two elements past the end of an array), but you are not allowed to dereference it. Yet footnote 106 of ISO 9899:2011 indicates that such a pointer points into the address space of the program, a criterion required for a pointer to be valid according to §7.1.4.

Such usage occurs in code where I want to insert an item into the middle of an array, requiring me to move all items after the insertion point:

void make_space(type *array, size_t old_length, size_t index)
{
    memmove(array + index + 1, array + index, (old_length - index) * sizeof *array);
}

If we want to insert at the end of the array, index is equal to old_length and array + index + 1 points just past the end of the array, but the number of copied elements is zero.
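One way to sidestep the question is to skip the call whenever nothing has to be moved. The following is only a minimal sketch: `make_space_guarded` is a hypothetical name, `int` stands in for the unspecified `type`, and the caller is assumed to have reserved room for at least old_length + 1 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of make_space(); int stands in for `type`.
   Assumption: the caller has reserved capacity for old_length + 1 elements. */
void make_space_guarded(int *array, size_t old_length, size_t index)
{
    assert(index <= old_length);

    /* When appending there is nothing to move, so memmove is never
       called with a destination that may lie one past the end. */
    if (index == old_length)
        return;

    memmove(array + index + 1, array + index,
            (old_length - index) * sizeof *array);
}
```

With this guard, every pointer that reaches memmove points to an existing element, so the zero-length corner case never arises.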

fuz
  • 1
    It still seems UB to me. array+index+1 may or may not be a valid address. You probably won't have any problems with many implementations but you still can't depend on it. – holgac Apr 24 '15 at 09:57
  • 1
    @holgac The standard explicitly says that it's legal to create and use a pointer just past the end of an object, but you are not allowed to dereference it. The wording in the standard seems unclear in this regard. – fuz Apr 24 '15 at 10:00
  • @FUZxxl You can also point to NULL but not dereference. I think that sentence is meaningless. I mean, I can perfectly point to 0x12345678 and I won't have a single problem until I dereference it. Anyway, if `memcpy` or `memmove` implementation performs some cache-friendly operation before actually checking length, they would dereference your pointer and boom. – holgac Apr 24 '15 at 10:03
  • 2
    @holgac That is implementation defined behaviour in general. For the specific case where the value stored in the `long` is equal to a value obtained by casting a valid pointer into an `uintptr_t`, it is well-defined. – fuz Apr 24 '15 at 10:10
  • What does the sentence you quoted try to say? Is it "the result is implementation-defined *BECAUSE* it may not ..." or, "besides the fact that the pointer might not ..., the behaviour is also implementation defined"? If former, I fail to see the problem of pointing to 0x12345678 in machines with 32 bit address space. – holgac Apr 24 '15 at 10:16
  • 3
    @holgac I think I understand your confusion now. Casting the integer `0x12345678` is implementation defined behaviour. If the resulting pointer is invalid, your program exhibits undefined behaviour and may crash immediately, independently of whether you dereference the pointer or not. This is because some machines allow you to place only valid pointers in pointer registers. – fuz Apr 24 '15 at 10:48
  • Ah great, thanks! That also explains why some projects store addresses in integral types but not vice versa (linux kernel sometimes uses `ulong` to store a pointer) – holgac Apr 24 '15 at 10:51
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/76174/discussion-on-question-by-fuzxxl-is-it-legal-to-call-memcpy-with-zero-length-on). – Taryn Apr 24 '15 at 12:36
  • @holgac you'd save memory over using two variables. But we're going off-topic right now. – fuz Apr 24 '15 at 12:36

3 Answers

11

Passing the past-the-end pointer as the first argument of memmove has several pitfalls, probably resulting in a nasal demon attack. Strictly speaking, there is no airtight guarantee that this is well defined.

(Unfortunately, there is not much information about the "past the last element" concept in the standard.)

Note: Sorry for now arguing in the other direction...

The question basically is whether the "one past the end" pointer is a valid first function argument for memmove if 0 bytes are moved:

T array[length];
memmove(array + length, array + length - 1u, 0u);

The requirement in question is the validity of the first argument.

N1570, 7.1.4, 1

If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.

If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.

This makes the argument valid if the pointer

  1. is not outside the address space,
  2. is not a null pointer,
  3. is not a pointer to const memory

and if the argument

  4. is not described as being an array.

1. Address space

N1570, 6.5.6, 8

Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

N1570, 6.5.6, 9

Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.106

106 Another way to approach pointer arithmetic is first to convert the pointer(s) to character pointer(s): In this scheme the integer expression added to or subtracted from the converted pointer is first multiplied by the size of the object originally pointed to, and the resulting pointer is converted back to the original type. For pointer subtraction, the result of the difference between the character pointers is similarly divided by the size of the object originally pointed to.

When viewed in this way, an implementation need only provide one extra byte (which may overlap another object in the program) just after the end of the object in order to satisfy the "one past the last element" requirements.

Even though the footnote is not normative (as pointed out by Lundin), it explains that "an implementation need only provide one extra byte". Although I can't prove it by quoting, I suspect this hints that the standard means to require the implementation to include memory inside the program's address space at the location the past-the-end pointer points to.
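The character-pointer view described in the footnote can be sketched roughly as follows. `advance_via_char` is a hypothetical helper, `int` is an assumed element type, and the sketch relies on the common reading that converting the resulting pointer back is fine as long as it stays within the array or one past its end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes base + n the way footnote 106 describes,
   by going through a character pointer. `length` is the number of
   elements in the array that `base` points to. */
int *advance_via_char(int *base, size_t n, size_t length)
{
    assert(n <= length);            /* stay within [base, base + length] */

    char *bytes = (char *)base;     /* convert to a character pointer */
    bytes += n * sizeof *base;      /* scale the offset by the element size */
    return (int *)bytes;            /* convert back; not dereferenced here */
}
```

For n equal to length, the result is the past-the-end pointer; the helper only computes it and never accesses the byte it refers to.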

2. Null Pointer

The past the end pointer is not a null pointer.

3. Pointing to const memory

The standard imposes no further requirements on the past-the-end pointer beyond specifying the result of several operations, and the (again non-normative ;)) footnote clarifies that the byte it points to can overlap with another object. Thus, there is no guarantee that the memory the past-the-end pointer points at is non-constant. Since the first argument of memmove is a pointer to non-constant memory, passing the past-the-end pointer is not guaranteed to be valid and is potentially undefined behaviour.

4. Validity of array arguments

Section 7.21.1 (C99 numbering; 7.24.1 in N1570) describes the string handling header <string.h>, and its first paragraph states:

The header <string.h> declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.

I don't think the standard is very clear here about whether "objects treated as arrays of character type" refers to the functions or to the macro only. If this sentence actually implies that memmove treats the first argument as an array of characters, passing the past-the-end pointer to memmove is undefined behaviour as per 7.1.4 (which requires a pointer to a valid object).

Pixelchemist
  • It would be useful to also quote the text " If an argument to a function has an invalid value (such as a value outside the domain of the function, **or a pointer outside the address space of the program**,or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified)". The start of your answer obviously refers to that section but without showing the text so it is a bit confusing. – M.M Apr 24 '15 at 11:54
  • 1
    Foot notes in ISO standards are never normative, so you can completely disregard note 106. – Lundin Apr 24 '15 at 11:54
  • 2
    @Lundin they can be used as a clue to the intent of the authors when the standard has unclear or defective text – M.M Apr 24 '15 at 11:56
  • @Pixelchemist `array + length` could theoretically point to constant memory. – alain Apr 24 '15 at 11:56
  • @MattMcNabb Yes but in the case where the foot note and normative text are contradicting, you should disregard the foot note completely. – Lundin Apr 24 '15 at 11:59
  • I think the normative text is defective here, it doesn't define *address space* properly or explain what that requirement is about. For example if an object is fully at the end of the address space, and the one-past-the-end pointer is outside the space, that *could* still be conforming if the implementation also makes some other choices to support it. Except perhaps for this unclear text in 7.1.4 – M.M Apr 24 '15 at 12:02
  • 2
    One could argue that this extra byte is not writeable since dereferencing it is undefined behaviour, ergo writing to it is undefined behaviour. – fuz Apr 24 '15 at 12:11
  • @FUZxxl Oh, yeah, that's correct ... There is probably too little evidence to state that the standard guarantees defined behaviour here. Updated accordingly. :) – Pixelchemist Apr 24 '15 at 13:12
  • 1
    @Pixelchemist: I find the analysis of such minutiae interesting, given that the only plausible scenario where such a thing could matter would be if a compiler went to the trouble of inferring that such a thing could occur for the express purpose of being able to ignore its normal obligations (and--with 99% certainty--programmer intent!) in such a case. I hope a group with some clout is able to spin off a variation of the C standard which would largely mirror the old one but would define silly corner cases like this so programmers wouldn't have to worry about them. – supercat Apr 24 '15 at 16:21
9

3.15 object

  1. object region of data storage in the execution environment, the contents of which can represent values

The memory that a pointer one past the last element of an array object (or one past a single object) points to cannot represent values, since that pointer cannot be dereferenced (6.5.6 Additive operators, paragraph 8).

7.24.2.1 The memcpy function

  1. The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

Pointers passed to memcpy must point to an object.

6.5.3.4 The sizeof and _Alignof operators

  1. When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1. When applied to an operand that has array type, the result is the total number of bytes in the array. When applied to an operand that has structure or union type, the result is the total number of bytes in such an object, including internal and trailing padding.

The sizeof operator doesn't treat the one-past element as part of the object, since that element doesn't count towards the object's size. Yet sizeof clearly gives the size of the entire object.

6.3.2.1 Lvalues, arrays, and function designators

  1. An lvalue is an expression (with an object type other than void) that potentially designates an object; 64) if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

I argue that a pointer one past an array object or one past a single object, both of which are otherwise valid to form, does not point to an object.

int a;
int *p = &a + 1;

p is defined, but it does not point to an object: it cannot be dereferenced, the memory it points to cannot represent a value, and sizeof doesn't count that memory as part of the object. memcpy requires a pointer to an object.

Therefore, passing a one-past-the-end pointer to memcpy causes undefined behavior.

Update:

This part also supports the conclusion:

6.5.9 Equality operators

  1. Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

This implies that a pointer to an object, once incremented to one past that object, can point to a different object. In that case it certainly cannot point to the object it originally pointed to, which shows that a pointer one past an object doesn't point to an object.
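The comparison that 6.5.9 talks about can be tried out with a small sketch. `struct pair` and `one_past_equals_next` are hypothetical names introduced only for illustration; whether the two members are actually adjacent depends on the implementation's padding, so the result is not guaranteed either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: two arrays that may or may not be adjacent in memory. */
struct pair {
    int first[2];
    int second[2];
};

/* Returns 1 if the one-past-the-end pointer of `first` compares equal to
   the start of `second`, 0 otherwise. The pointer is only compared,
   never dereferenced. */
int one_past_equals_next(const struct pair *p)
{
    assert(p != NULL);

    const int *one_past = p->first + 2;
    return one_past == p->second;
}
```

If there is no padding between the members, the function is expected to return 1, even though `one_past` was derived from `first` and may not be used to access `second`.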

2501
6

If we look at the C99 standard, there is this:

7.21.1.p2

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters. ...

There is no explicit statement in the description of memcpy in 7.21.2.1
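As an illustration of the uncontroversial case, here is a minimal sketch of a zero-length copy where both pointers refer to existing objects. `copy_checked` is a hypothetical wrapper, and the assertion only covers the null-pointer part of 7.1.4; it cannot detect other invalid values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper around memcpy: insists on non-null pointers
   even when n is zero, mirroring the requirement quoted above. */
void *copy_checked(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);  /* a null pointer is an invalid value */
    return memcpy(dst, src, n);          /* copies zero characters for n == 0 */
}
```

Called with n equal to zero and two ordinary buffers, the wrapper leaves the destination unchanged and returns dst.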

7.1.4.p1

... If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.

Emphasis added. It seems the pointers have to point to valid locations (in the sense that they can be dereferenced), and the paragraphs on pointer arithmetic that permit pointing one past the end do not apply here.

There remains the question of whether the arguments to memcpy are arrays or not. Of course they are not declared as arrays, but

7.21.1.p1 says

The header string.h declares one type and several functions, and defines one macro useful for manipulating arrays of character type and other objects treated as arrays of character type.

and memcpy is in string.h.
So I would assume that memcpy does treat its arguments as arrays of characters. Because the macro mentioned is NULL, the "useful for..." part of the sentence clearly applies to the functions.

alain
  • 2
    The phrase “all address computations and accesses to objects” seems to pertain the address computations and accesses done by the function you pass the pointer to. Since `memmove` with a zero argument is not allowed to dereference the pointer (as it moves zero bytes), the valid accesses are an empty set. The `NULL` pointer is still excluded by the previous sentence. I might be wrong with this notion though. – fuz Apr 24 '15 at 11:01
  • I'm not sure that `memmove` is not allowed to dereference the pointer when the size is 0. Is it described like this somewhere? – alain Apr 24 '15 at 11:05
  • Is the first argument of `memmove` explicitly described as "being an array"? – Pixelchemist Apr 24 '15 at 11:33
  • 1
    All of the cited text refers to the case of `void func (size_t n, int [n])` so I don't see how this is relevant or answers the question. – Lundin Apr 24 '15 at 11:42
  • @Lundin The cited text pertains functions with a size argument and pointer arguments. There may be more than one pointer argument. – fuz Apr 24 '15 at 12:06
  • @alain Well, the standard defines an array as “a contiguously allocated nonempty set of objects with a particular member object type, called the element type,” (cf. §6.2.5/20) explicitly precluding arrays of length zero. I'm even more confused now. – fuz Apr 24 '15 at 12:08
  • 1
    @Lundin That's not right... The preamble to C11 draft 1570's section on `string.h` states (7.24.1.1): _The header `<string.h>` declares one type and **several functions**, and defines one macro useful for manipulating **arrays of character type and other objects treated as arrays of character type**. 307) The type is `size_t` and the macro is `NULL` (both described in 7.19). Various methods are used for determining the lengths of the arrays, but in all cases a `char *` or `void *` argument points to the initial (lowest addressed) character of the array._ – Iwillnotexist Idonotexist Apr 24 '15 at 12:17
  • 1
    @Lundin Furthermore, §7.24.1/2 says: “Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. [...] On such a call, [...] a function that copies characters copies zero characters.” Furthermore, footnote 189 affirms that `memcpy` is not allowed to copy bytes beyond the specified length even if it undoes the effect afterwards with respect to concurrency. – fuz Apr 24 '15 at 12:20
  • @FUZxxl I think the portion _"(that would be valid if the pointer did point to the first element of such an array)"_ is operative here. `n=0`, so we have _"..., and a function that copies characters copies zero characters."_. Therefore no accesses are made, and we cannot speak of the validity of an access if it is not made. Moreover the address computations _are_ valid (one-past-the-end), so I contend the answer to OP is "yes". – Iwillnotexist Idonotexist Apr 24 '15 at 12:21
  • @IwillnotexistIdonotexist Possibly. Also, the standard does not preclude reading access to the source argument of `memcpy` with a 0 length argument, for instance, if the implementation made sure that all obtainable non-NULL pointers were dereferencable. – fuz Apr 24 '15 at 12:23
  • @IwillnotexistIdonotexist None of this proves anything about the behavior of memcpy though. – Lundin Apr 24 '15 at 12:54
  • @FUZxxl Well there you go then, that's the answer to the question. The reference in 7.24.1 to 7.1.4 most likely refers to the first sentence of 7.1.4. The second sentence as cited in this answer, seems to be concerned about pointer arithmetic inside the functions remaining reliable, since all pointer arithmetic must refer to memory locations within the same array, or it will invoke UB. – Lundin Apr 24 '15 at 12:59
  • @Lundin What comment do you refer to with “well there you got then?” If you mean my last comment, then you err: If an implementation were to allow the dereferentiation of one byte past the end of an object, this wouldn't change anything with respect to my question. – fuz Apr 24 '15 at 13:12
  • @FUZxxl Since 7.24.1 states that n may be 0 but that "pointer arguments on such a call shall still have valid values, as described in 7.1.4.". And 7.1.4 in turn says that the pointer must be within address range of the program. Then see the answer by Pixelchemist about whether the array+1 is to be regarded as "within address range" or not. – Lundin Apr 24 '15 at 13:27