
This question follows this previous question about the definedness of memcpy(0, 0, 0), which has been conclusively determined to be undefined behavior.

As the linked question shows, the answer hinges on the contents of C11's clause 7.1.4:1

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, […]) […] the behavior is undefined. […]

The standard function memcpy() takes pointers to void and const void, like so:

void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

The question is worth asking at all only because there are two notions of “valid” pointers in the standard. There are pointers that can validly be obtained through pointer arithmetic and can validly be compared with < and > to other pointers inside the same object, and there are pointers that are valid for dereferencing. The former class includes “one-past” pointers such as &a + 1 and &b + 1 in the following snippet, whereas the latter class does not.

char a;
const char b = '7';
memcpy(&a + 1, &b + 1, 0);

Should the above snippet be considered defined behavior, given that the arguments of memcpy() are typed as pointers to void anyway, so that the question of their validity cannot be about dereferencing them? Or should &a + 1 and &b + 1 be considered “outside the address space of the program”?
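To make the two notions of validity concrete, here is a minimal sketch. The helper names span_length and is_dereferenceable are hypothetical and exist only for illustration; the sketch assumes all pointers refer to the same array object, where a single char counts as an array of length one.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two pointers into the same array object.
   Either pointer may be the one-past pointer; neither is dereferenced. */
ptrdiff_t span_length(const char *first, const char *last)
{
    return last - first;
}

/* Returns 1 if p may be dereferenced as an element of [base, base + len),
   and 0 if p is the one-past pointer, which is valid only for arithmetic
   and comparison. Assumes p lies within [base, base + len]. */
int is_dereferenceable(const char *base, size_t len, const char *p)
{
    assert(p >= base && p <= base + len);
    return p < base + len;
}
```

With char a; the pointer &a + 1 passes the first notion (it can be computed, subtracted and compared) but fails the second.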

This matters to me because I am in the process of formalizing the effects of standard C functions. I had written one pre-condition of memcpy() as requires \valid(s1+(0 .. n-1)); until it was brought to my attention that GCC 4.9 had started to aggressively optimize such library function calls beyond what is expressed in the formula above (indeed). The formula \valid(s1+(0 .. n-1)) in this particular specification language is equivalent to true when n is 0, and does not capture the undefined behavior that GCC 4.9 relies on to optimize.

Pascal Cuoq
  • Your example (I'm sure someone will flag this with the appropriate 'asked before' link) invokes undefined behavior because you are accessing memory that is not allocated or defined by your program. – JohnH Aug 19 '14 at 18:36
  • I do believe it's **not** UB, because the expressions `&a + 1` and `&b + 1` *are* valid (C11 sec 6.5.6) as long as they are not dereferenced. However, if `memcpy(0, 0, 0)` is considered UB, then this would be UB as well. – Drew McGowen Aug 19 '14 at 18:37
  • @JohnH: you are probably right (and I tend to believe like you), but could you cite the C11 standard to defend your claim? Pascal Cuoq is very knowledgeable about C .... – Basile Starynkevitch Aug 19 '14 at 18:37
  • @JohnH Well the specification of `memcpy` says it copies `n` characters, with `n` being `0` in my example, so I am not sure that “accessing” is the right word here. – Pascal Cuoq Aug 19 '14 at 18:42
  • @DrewMcGowen `memcpy(0, 0, 0)` is considered UB because 7.1.4:1 says that a standard function cannot be passed a null pointer as argument unless explicitly stated otherwise (I almost reported a GCC bug building an argument on the fact that `snprintf(0, 0, …)` was a C idiom, but indeed the specification of `snprintf` explicitly says that it can receive a null pointer together with a size of 0). – Pascal Cuoq Aug 19 '14 at 18:47
  • Not an expert, but my reading of the "one past rule" is that such pointers should only be considered valid in the context of a wider expression referring back somehow to the base object (i.e. where a "sufficiently smart compiler" could always optimise them out or identify them for what they are). This is not such a context. – Alex Celeste Aug 19 '14 at 18:55
  • @Leushenko: A "one past the end" pointer is perfectly valid in at least some contexts that don't refer back to the base object. Example: `int i = 42; int *p = &i + 1; int *q = p;`. The reference to `p` in the initializer for `q` is valid and harmless, as long as neither `p` nor `q` is dereferenced. – Keith Thompson Aug 19 '14 at 19:00
  • Note that your pastebin isn't actually printing the pointer value. You can't use `%x` with pointers, only `%p`. – Ben Voigt Aug 20 '14 at 05:52
  • @BenVoigt Yes, it is my great shame that I intended %p and wrote %x. Pastebin does not allow editing but I will move the example to another URL. Ideally I would put it on ideone but it is not ideal to demonstrate the behavior of specific recent versions. – Pascal Cuoq Aug 20 '14 at 06:17
  • @Leushenko: If `memcpy` is called with a length of zero, is there anything which it would be "authorized" to do with a pointer that would not be legal with a "one-past" pointer? Passing null for `src` or `dest` is forbidden even when `length` is zero because a legal memcpy would be allowed to use a top-down copy operation, which would in turn require computing `src+length` and `dest+length`. Such computations would be UB if `src` or `dest` is null, but legal if `src` and `dest` are "one-past" pointers and `length` is zero. – supercat Jun 23 '15 at 18:59

2 Answers


C11 says:

(C11, 7.24.2.1p2) "The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1."

&a + 1 is itself a valid pointer-plus-integer addition, but its result is not a pointer to an object, so the call invokes undefined behavior.
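Under this reading, portable code has to special-case the empty copy. A minimal sketch of such a guard follows; the wrapper name copy_bytes is hypothetical, and the assert is only a debug-time check rather than anything the standard requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: forwards to memcpy only when there is something
   to copy, so a zero-length call never hands a one-past (or null)
   pointer to the library function. */
void *copy_bytes(void *restrict dst, const void *restrict src, size_t n)
{
    if (n == 0) {
        return dst;  /* nothing to copy; memcpy is never called */
    }
    assert(dst != NULL && src != NULL);  /* debug-time sanity check */
    return memcpy(dst, src, n);
}
```

With this wrapper, copy_bytes(&a + 1, &b + 1, 0) returns without calling memcpy, so the question of whether the one-past pointers are valid arguments does not arise.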

ouah
  • See also C11 7.24.1p2: "... **`n`** can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call **shall still have valid values**, as described in 7.1.4." (emphasis added). Feel free to add this to your answer if you like. – Keith Thompson Aug 19 '14 at 19:04
  • This is particularly nasty since it means that you can't for example write `void cut_and_shut(const char *restrict src, size_t n, size_t a, size_t b, char *restrict dst) { memcpy(dst, src, a); memcpy(dst+a, src+b, n-b); }`, and then use it for general range manipulations including the possibility that `b == n` and the range runs right to the end of an object such that `src+n` is a one-past-the-end pointer. You need to special-case the empty second "half". – Steve Jessop Aug 19 '14 at 21:46
  • ... but there's no contesting it, I suppose. The quoted text seems quite clear that the arguments to `memcpy` must point to objects regardless of the value of `n`, so I only have myself to blame if I consider the need for a special case in my `cut_and_shut` function to be counter-intuitive ;-) – Steve Jessop Aug 19 '14 at 21:52
  • I'm not convinced. The pointers here are not invalid values, so the reasoning that leads to `memcpy(0, 0, 0)` being UB does not apply. The quoted text doesn't say that s2 must be pointing to an object; "pointed to by s2" is a predicate to describe where characters are taken from, but if no characters are taken then there is no read of memory that does not contain a value. – M.M Aug 19 '14 at 22:03
  • @Keith Thompson Which part of 7.1.4, if any, applies to `&b + 1`? – chux - Reinstate Monica Aug 19 '14 at 22:31
  • @chux: Hmm. Now that I actually read 7.1.4, I'm less sure. "If a function has an invalid value (*such as* ...)"; the "such as" implies it's not exhaustive. "If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid." -- but with `n == 0`, no elements are accessed. But we can still fall back to the "object pointed to" wording (quoted in this answer) to infer UB. – Keith Thompson Aug 19 '14 at 22:54
  • @chux: On the other hand, I can't think of any good reason for an implementation to misbehave given `memcpy(&a + 1, &b + 1, 0)` (where by "misbehave" I mean treating it as something other than a no-op). – Keith Thompson Aug 19 '14 at 22:55
  • @Keith Thompson Agree - it should be [ain't misbehavin](http://en.wikipedia.org/wiki/Ain't_Misbehavin'_(musical)). UB seems to hide in that the pointer `&a + 1` passed to `memcpy()` cannot have +1 added to it, as that is UB. `memcpy()` parameter `s1`, as a valid pointer, should be able to have +1 added to it. Not that `memcpy()` really _needs_ to add 1, when `n==0`, to its `s1` to do its job. – chux - Reinstate Monica Aug 19 '14 at 23:02
  • @KeithThompson Treating the `memcpy(…, …, 0)` as a no-op should not be difficult to obtain from compilers. Instead, I am worried that when faced with `p = c2 ? &a : &a + 1; memcpy(p, …, c1);` the compiler will infer that `p` can only be `&a`, even if `c1` can be `0`. This is what GCC 4.9 does, with `NULL` instead of `&a + 1`, in my now fixed example at http://pastebin.com/raw.php?i=fRbGfQ6p – Pascal Cuoq Aug 20 '14 at 07:50
  • Perhaps someone should submit a DR; the fact that we are having this discussion indicates that the standard is unclear. – M.M Aug 21 '14 at 09:02
  • @MattMcNabb I have wanted to submit a DR before for an unrelated C standard question (surprise, surprise) and it was not clear that you could do this without being a paying member of some sort of national organization. I think I'll keep submitting my DRs as SO questions and hope that someone with the right privileges picks up on them. – Pascal Cuoq Aug 21 '14 at 19:52
  • How do you read "s2 points to an object" from this wording? `memcpy` is copying no bytes from no object here. – tmyklebu Sep 09 '14 at 11:12

While the "correct" answer according to the standard appears to disagree, I can only find it disingenuous that after char a[6]; char b[6]; all of

memcpy(a+0, b+0, 6);
memcpy(a+1, b+1, 5);
memcpy(a+2, b+2, 4);
memcpy(a+3, b+3, 3);
memcpy(a+4, b+4, 2);
memcpy(a+5, b+5, 1);

should be valid (and copy an area ending at the end of the arrays) while

memcpy(a+6, b+6, 0);

is valid in light of the count but not of the addresses. It's the same end of the copied area!

Personally, I'd lean towards defining memcpy(0,0,0) as valid as well (with the rationale of demanding only valid pointers, not objects), but at least that is a singular case, while the "end of array" case is an actual exception to an otherwise regular pattern for copying an area at the end of an array.
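To show how the exception breaks the regular pattern, here is a minimal sketch of a tail copy over every offset from 0 to the array length. The names LEN and copy_tail are hypothetical, and the arrays are assumed to hold LEN bytes each; the guard is there only because of the strict reading of the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { LEN = 6 };  /* assumed array length, matching the example above */

/* Copies the tail of src that starts at offset i to the same offset in dst.
   i may equal LEN; the tail is then empty and memcpy is skipped, so
   the one-past pointers dst + LEN and src + LEN are never passed to it. */
void copy_tail(char dst[LEN], const char src[LEN], size_t i)
{
    assert(i <= LEN);
    if (i < LEN) {
        memcpy(dst + i, src + i, LEN - i);
    }
}
```

Without the if, the body would be a single uniform memcpy call for every i in 0..LEN, which is the regularity this answer argues for.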

  • Reason why `a+6` might cause UB is that functions expect "valid" pointers to memory & may do pointer math on them before examining `n==0`. The pointer should be allowed to have a valid result when +1 is added. It is known `&a[0]` to `&a[5]` are valid pointers to memory and each one of them can have +1 resulting in valid pointers for pointer math purposes. `a+6` results in a valid pointer for pointer math, but `a+6+1` (which `memcpy()` might do) appears to push things too far. Although it sounds silly that `memcpy()` would _need_ to do that calculation, it must be allowed. +1 for your idea. – chux - Reinstate Monica Aug 19 '14 at 22:40
  • It's important to handle limiting cases sensibly. – Pete Becker Aug 19 '14 at 23:45
  • @chux: `a+6+1` is bad, but so is `a+1+6` and that doesn't make the second line of this answer invalid. `memcpy` shouldn't be computing pointers outside the designated range. – Ben Voigt Aug 20 '14 at 03:06
  • @Ben Voigt Agree 100% `memcpy(s1=a+6,..., n=0)` should not _need_ to compute `s1+1`. But `memcpy()` expects to receive `s1` as a valid pointer to an object (C11dr §7.24.2.1), and is not every valid pointer to an object + 1 also valid for pointer math? IMO `memcpy(any_value, any_value, 0)` is invalid per spec, but the spec should change to allow it. – chux - Reinstate Monica Aug 20 '14 at 05:44
  • @chux: On some systems, it may be reasonable to implement `void memcpy(void const *src, void *dest, size_t length) { char const *s = (char*)src; char *d = (char*)dest; char const *e = s+length; while (s != e) { *d++ = *s++; } }`. Such an implementation would cause UB with `memcpy(0,0,0)`, but would be just fine when given "one-past" pointers. – supercat Jun 23 '15 at 19:06
  • @Ben Voigt Agreed. Note: suspect you meant `void memcpy(void *dest, void const *src, size_t length)` – chux - Reinstate Monica Jun 23 '15 at 19:16
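
The implementation sketched in the last two comments can be written out as follows, using the parameter order of the standard memcpy. This is only an illustration under stated assumptions: the name my_memcpy is hypothetical (chosen to avoid clashing with the library function), and the assert is an added debug check, not part of the comment's code.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-by-byte copy that computes the end pointer before looping.
   For one-past pointers and length 0, src + length is a defined
   computation and the loop body never runs. For null pointers the
   same computation would be undefined, hence the assert. */
void *my_memcpy(void *restrict dest, const void *restrict src, size_t length)
{
    assert(dest != NULL && src != NULL);
    const unsigned char *s = src;
    unsigned char *d = dest;
    const unsigned char *e = s + length;  /* end of the source range */
    while (s != e) {
        *d++ = *s++;
    }
    return dest;
}
```

Called as my_memcpy(&a + 1, &b + 1, 0), this version performs only pointer arithmetic that 6.5.6 permits on one-past pointers and touches no memory.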