
I am not so well-versed in the C standard, so please bear with me.

I would like to know if it is guaranteed, by the standard, that memcpy(0,0,0) is safe.

The only restriction I could find is that if the memory regions overlap, then the behavior is undefined...

But can we consider that the memory regions overlap here?

templatetypedef
Matthieu M.
  • Mathematically the intersection of two empty sets is empty. – Benoit Mar 09 '11 at 08:22
  • I wanted to check what (x)libC does for you, but as [it's asm](http://www.eglibc.org/cgi-bin/viewcvs.cgi/fsf/trunk/libc/sysdeps/i386/i686/memcpy.S?rev=9649&view=markup) (eglibc/glibc here), it's a bit too complicated for an early morning :) – Kevin Mar 09 '11 at 08:25
  • Why *would* you do that? By the way, overlapping memory regions are not the only reason for UB with `memcpy`. – eq- Mar 09 '11 at 08:27
  • +1 I love this question both because it's such a strange edge case and because I think `memcpy(0,0,0)` is one of the weirdest pieces of C code I've seen. – templatetypedef Mar 09 '11 at 08:29
  • @eq Do you really want to know, or are you implying that there are no situations when you would want it? Have you considered that the actual call might be, say, `memcpy(outp, inp, len)`? And that this could occur in code where `outp` and `inp` are dynamically allocated and are initially `0`? This works, e.g., with `p = realloc(p, len+n)` when `p` and `len` are `0`. I myself have used such a `memcpy` call -- while it is technically UB, I've never encountered an implementation where it isn't a no-op and don't ever expect to. – Jim Balter Mar 09 '11 at 08:50
  • @templatetypedef `memcpy(0, 0, 0)` is most likely intended to represent a dynamic, not static invocation ... i.e., those parameter values need not be literals. – Jim Balter Mar 09 '11 at 08:53
  • @eq, @templatetypedef: They are not literal but dynamic values... in a 3rd party software :/ – Matthieu M. Mar 09 '11 at 09:12
  • @Jim Balter: Of course I have (thought about that). Whenever I use C, I try to stick to writing portable (i.e. well-defined) C, even if I have to write a few more conditionals - they are unlikely to be a performance bottleneck (and I would only remove them if they were, writing them isn't that hard), and the only way to find out is .. well, finding out. Whether they will one day 'save the day' or not is, really, irrelevant. – eq- Mar 09 '11 at 15:43
  • Unlike C++, C doesn't treat null as a pointer to an array. (This is very strange.) – curiousguy Aug 09 '15 at 03:41
  • Related: https://youtu.be/I8QJLGI0GOE?t=2100 – Evg Oct 24 '22 at 20:00
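To make the dynamic case from the comments concrete: a growable buffer that starts out as a null pointer with length 0 (so `p = realloc(p, len + n)` works from the very first call) can end up handing a `{NULL, 0}` pair to `memcpy`. Below is a minimal sketch of a portable version; `struct buf` and `buf_append()` are hypothetical names for illustration, not code from the question. It simply returns before `realloc`/`memcpy` when there is nothing to copy.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* realloc */
#include <string.h>  /* memcpy */

/* Hypothetical growable byte buffer; an empty one is { NULL, 0 }. */
struct buf {
    char  *data;
    size_t len;
};

/* Hypothetical helper: append n bytes from src to b.
 * Returns 0 on success, -1 on size overflow or allocation failure.
 * When n == 0, src (and b->data) may be NULL: we return before
 * calling realloc or memcpy, so no null pointer reaches memcpy. */
static int buf_append(struct buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->len)
        return -1;                    /* b->len + n would overflow */

    /* realloc(NULL, size) behaves like malloc(size), so an empty buffer works. */
    char *p = realloc(b->data, b->len + n);
    if (p == NULL)
        return -1;                    /* b is left unchanged */

    memcpy(p + b->len, src, n);
    b->data = p;
    b->len += n;
    return 0;
}
```

Appending an empty buffer, e.g. `buf_append(&a, empty.data, empty.len)`, is then a plain no-op instead of a `memcpy(..., NULL, 0)` call.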

3 Answers


I have a draft version of the C standard (ISO/IEC 9899:1999), and it has some fun things to say about that call. For starters, it says (§7.21.1/2), in a paragraph that covers `memcpy` along with the other string-handling functions, that

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

The reference indicated here points to this:

If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined.

So it looks like according to the C spec, calling

```c
memcpy(0, 0, 0)
```

results in undefined behavior, because null pointers are considered "invalid values."

That said, I would be utterly astonished if any actual implementation of memcpy broke if you did this, since most of the intuitive implementations I can think of would do nothing at all if you said to copy zero bytes.
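If you need a guarantee rather than an expectation, the usual workaround is to avoid the call entirely when the length is zero. A minimal sketch, using a hypothetical wrapper name `memcpy_or_nop` (not a standard function):

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* memcpy */

/* Hypothetical wrapper: same contract as memcpy, except that
 * dest and src may be null (or otherwise invalid) when n == 0,
 * because memcpy is never called in that case. */
static void *memcpy_or_nop(void *dest, const void *src, size_t n)
{
    if (n != 0)
        memcpy(dest, src, n);
    return dest;
}
```

For nonzero lengths the pointers must still be valid and the regions must not overlap, exactly as with `memcpy` itself.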

templatetypedef
  • +1 I missed that paragraph and went straight for `memcpy`. Silly me. I should have known better than to assume that programmers would repeat that kind of information for every single function description. – Chris Lutz Mar 09 '11 at 08:27
  • I can affirm that the quoted parts from the draft standard are identical in the final document. There shouldn't be any troubles with such a call, but it would still be undefined behaviour you're relying on. So the answer to "is it guaranteed" is "no". – DevSolar Mar 09 '11 at 08:33
  • @DevSolar- Thanks for confirming this! It would have been really embarrassing if everything I said was completely wrong. :-) – templatetypedef Mar 09 '11 at 08:38
  • No implementation that you will ever use in production will produce anything other than a no-op for such a call, but implementations that do otherwise are allowed and are reasonable ... e.g., a C interpreter or augmented compiler with error checking that rejects the call because it's non-conforming. Of course that wouldn't be reasonable if the Standard did allow the call, as it does for `realloc(0, 0)`. The use cases are similar, and I've used them both (see my comment under the question). It's pointless and unfortunate that the Standard makes this UB. – Jim Balter Mar 09 '11 at 09:01
  • @Chris Many programmers might, but not Dave Prosser, who was primarily responsible for the text. – Jim Balter Mar 09 '11 at 09:08
  • @templatetypedef: Well, with regard to real-world implementations that break on this, I think that SUSE 10sp3 libc.so.6 does... or so it seems. – Matthieu M. Mar 09 '11 at 09:16
  • @Matthieu M.- Cool! What does it do that breaks? – templatetypedef Mar 09 '11 at 09:22
  • @templatetypedef: it produces a memory coredump, through a call to `abort` and a nice backtrace. – Matthieu M. Mar 09 '11 at 09:29
  • @Matthieu Does the backtrace show abort being called by memcpy? A call to abort sounds like an assert failing. It's possible that the memcpy library code has such an assert, although that would be quite silly. – Jim Balter Mar 09 '11 at 09:45
  • @Jim: That was an assumption actually, the coredump is produced by a segfault, but the backtrace is nearly unexploitable (apart from the `memcpy` call) as the debug symbols have been stripped. And I don't really want to look at the assembly. – Matthieu M. Mar 09 '11 at 10:06
  • @Matthieu Ok, the segfault makes a lot more sense but is still shocking. I don't know the x86 hardware well, but apparently the movsb instruction is accessing the addressed memory even though the count is 0. – Jim Balter Mar 09 '11 at 10:15
  • "I would be utterly astonished if any actual implementation of memcpy broke if you did this" - I've used one that would; in fact if you passed length 0 with valid pointers, it actually copied 65536 bytes. (Its loop decremented the length and then tested). – M.M Jul 12 '14 at 06:01
  • @MattMcNabb That implementation is broken. – Jim Balter Jul 13 '14 at 22:16
  • @MattMcNabb: Add "correct" to "actual", maybe. I think we all have not-so-fond memories of old, ghetto C libraries and I'm not sure how many of us appreciate those memories being recalled. :) – tmyklebu Sep 09 '14 at 11:20
  • Does one-past-the-end of an array count as a valid value? – M.M Aug 30 '15 at 05:34
  • memcpy can ask the CPU to prefetch the source memory, before doing any checks on the size. – Martin C. Martin Oct 24 '18 at 03:20
  • There doesn't appear to be a clear definition of *invalid value* in the C11 standard. I don't think the parenthesized example listing in 7.1.4 implies that a null pointer is an invalid value in all contexts (note 102's expression of "invalid values for dereferencing" further reinforces the notion that invalid values are contextual). Furthermore, the standard mentions "shall not be a null pointer" a great many times but `memcpy` doesn't come with it. Must be because null pointers should be valid arguments to `memcpy` when the size argument is zero. – Petr Skocik Aug 14 '19 at 10:58
  • @PSkocik: The authors of the Standard make little effort to consider situations where all implementations would be expected to behave a certain way absent a clear and compelling reason to do otherwise, and few if any implementations would have any compelling reason to do otherwise. The only situation where such consideration would matter would be if some implementations had a compelling reason to do otherwise, and if the authors of the Standard don't know of any implementations that do anything unusual, they would have no way of judging whether their reasons for doing so were compelling. – supercat Jun 14 '20 at 18:59
  • @PSkocik: IMHO, the authors of the Standard didn't want to require programmers who were writing functions that receive pointer+size pairs on conventional target platforms to add explicit logic to prevent zero-byte reads or writes to/from null. On the other hand, they also likely didn't want to require that compilers targeting obscure platforms add extra code if their natural means of performing memcpy operations would otherwise have unwanted side effects. Most likely, they figured that the question was sufficiently unlikely to matter that there was no need to discuss it. – supercat Jun 14 '20 at 19:07
  • This answer makes no sense. The pointers have to be passed a valid value as opposed to nonsense because on certain non-flat architectures you will observe a fault as soon as you load the pointer into the pointer register. NULL does not fault though so that's not the issue it's talking about. – Joshua Oct 10 '22 at 16:31

Just for fun, the GCC 4.9 porting notes indicate that its optimizer makes use of these rules and can, for example, remove the conditional in

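```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* memmove */

int copy (int* dest, int* src, size_t nbytes) {
    memmove (dest, src, nbytes);  /* tells GCC that dest and src are non-null */
    if (src != NULL)              /* ...so GCC 4.9+ may delete this test */
        return *src;
    return 0;
}
```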

which then gives unexpected results when `copy(0,0,0)` is called: with the check gone, `*src` dereferences a null pointer (see https://gcc.gnu.org/gcc-4.9/porting_to.html).

I am somewhat ambivalent about the gcc-4.9 behaviour; the behaviour might be standards compliant, but being able to call memmove(0,0,0) is sometimes a useful extension to those standards.
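For code that has to stay correct under GCC 4.9 and later, one way to keep the `NULL` test meaningful is to guard the call itself. This is a sketch of my own adaptation of the example above, not code taken from the GCC page:

```c
#include <stddef.h>  /* size_t, NULL */
#include <string.h>  /* memmove */

/* Same as copy() above, but memmove is skipped when nbytes == 0.
 * On that path src may legitimately be NULL, so the compiler can no
 * longer assume src is non-null and has to keep the later check. */
int copy(int *dest, int *src, size_t nbytes)
{
    if (nbytes != 0)
        memmove(dest, src, nbytes);
    if (src != NULL)
        return *src;
    return 0;
}
```

The cost is one extra comparison per call, which the compiler generally cannot eliminate.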

Matthieu M.
user1998586
  • Interesting. I understand your ambivalence but this is the heart of optimizations in C: the compiler *assumes* that developers follow certain rules and thus *deduces* that some optimizations are valid (which they are if the rules are followed). – Matthieu M. Jul 19 '14 at 12:45
  • @MatthieuM.: Yes, but this is particularly stupid. A rough corner was added to the specification of `memcpy` that doesn't exist in any implementation I know of, not to increase uniformity, but to deliberately break from it. – tmyklebu Sep 09 '14 at 11:19
  • @tmyklebu: It would be reasonable on some platforms to implement `memcpy` as something like `char *end=src+count;while(src – supercat Dec 06 '14 at 19:03
  • @supercat: I do not see where the UB would be triggered if `count == 0`. – tmyklebu Dec 06 '14 at 21:12
  • @tmyklebu: Given `char *p = 0; int i=something;`, evaluation of the expression `(p+i)` will yield Undefined Behavior even when `i` is zero. – supercat Dec 06 '14 at 21:32
  • @supercat: That's bizarre. (And, in my opinion, stupid.) But 6.5.6 paragraph 8 does indeed imply that adding 0 to the null pointer is UB. – tmyklebu Dec 06 '14 at 23:24
  • @tmyklebu: Having implementations trap at attempts to add a non-zero value to a null pointer would generally be a good thing if it could be done cheaply; having such a trap *not* occur when adding zero would add to the cost. – supercat Dec 10 '14 at 05:04
  • @supercat: I would probably find such an implementation bizarre and stupid. Remember Clippy from MS Word? Remember how he'd get up in your face when you were minding your own business trying to be productive? Wasn't that annoying? That's basically how I feel about surprising sharp corners in language specs. – tmyklebu Dec 10 '14 at 05:15
  • @tmyklebu: Having all pointer arithmetic (other than comparisons) on a null pointer trap would IMHO be a good thing; whether `memcpy()` should be allowed to perform any pointer arithmetic on its arguments prior to ensuring a non-zero count is another question [if I were designing the standards, I would probably specify that if `p` is null, `p+0` could trap, but `memcpy(p,p,0)` would do nothing]. A much bigger problem, IMHO, is the open-endedness of most Undefined Behavior. While there are some things which really *should* represent Undefined Behavior (e.g. calling `free(p)`... – supercat Dec 10 '14 at 16:43
  • ...and subsequently performing `p[0]=1;`) there are a lot of things which should be specified as yielding indeterminate result (e.g. a relational comparison between unrelated pointers should not be specified as being consistent with any other comparison, but should be specified as yielding either a 0 or a 1), or should be specified as yielding a behavior slightly looser than implementation-defined (compilers should be required to document all possible consequences of e.g. integer overflow, but not specify which consequence would occur in any particular case). – supercat Dec 10 '14 at 16:52
  • @supercat: I agree with you on every one of your points except having arithmetic on null pointers trap. I just don't like the idea that *constructing* an invalid pointer, or even dereferencing it to produce an lvalue that doesn't refer to an object, is bad. (Lvalue-to-rvalue conversion when the lvalue doesn't refer to an object is bad, sure, but I don't see why we need to go beyond that at all.) – tmyklebu Dec 10 '14 at 17:00
  • @tmyklebu: I particularly dislike the notion that Undefined Behavior should be usable as a form of compiler-exploitable assertion exempt from the normal rules of causality--*that* to me is what represents "Clippy"-level annoyance. Given `uint16_t foo=f(); int bar=0; if (foo > 50000) bar=3; if (foo*foo > 16383) bar |= 1;`, what good can come from allowing a 32-bit compiler to omit the first `if`? – supercat Dec 10 '14 at 17:03
  • @tmyklebu: Having arithmetic on null-pointers trap would in many cases be cheaper than ensuring that such arithmetic will not yield something that could be mistaken for a valid pointer. Given `char *p; int flag;`, what should `char *q = p+123456789; int32_t d=q-p; if (flag) *q = 0;` do if `p` is null and flag is zero? If non-zero? If the assignment to `q` asserts that p is non-null, then `d` can be assigned the literal 123456789, but there may not exist any trap pointer representation from which one could subtract a null pointer to yield 123456789. – supercat Dec 10 '14 at 17:18
  • @tmyklebu: Given a choice between having the assignment to `q` trap, or overwriting memory location 123456789, I'd much prefer the former. Having arithmetic on a null-pointer value yield a null result might be an acceptable alternate behavior, but that would make the value of `d` dependent upon the value of `p`. – supercat Dec 10 '14 at 17:20
  • @supercat: I don't understand why you care whether `NULL + foo` is a valid pointer to such an extent that you want to ensure that it never is. Messing with null pointer arithmetic in the way you describe would ruin the identity `p + (q-p) == q` that I'd want to hold in any hypothetical safe dialect of C. – tmyklebu Dec 10 '14 at 17:57
  • @tmyklebu: Nasal demons are bad. If `*q=0;` can be executed without trapping when `q` holds an invalid pointer, it will cause nasal demons. I would posit that the value of preventing `*q=0;` from causing nasal demons whenever possible outweighs the value of being able to perform arithmetic on null pointers without it trapping. It's virtually impossible for a practical C-standard-compliant platform to guard against absolutely all conceivable nasal demons, but stores to null-derived invalid pointers are both common and easy to check for. – supercat Dec 10 '14 at 18:42
  • @tmyklebu: The only "use" I know of for arithmetic on null pointers is to do things which should really be done using standard macros like `offsetof`. Can you suggest any others? – supercat Dec 10 '14 at 18:49
  • @supercat: `offsetof` is one legitimate use. But I do not think that declaring things illegal in the absence of an argument for why they should be legal is sound language design policy. – tmyklebu Dec 10 '14 at 18:52
  • @supercat: I've seen "legitimate" uses of two-past-the-end and one-before-the-beginning pointers. For instance, if you want to advance through a string by doing `*++str`, then you want to start with a one-before-the-beginning pointer. `mmap` likes to use `(void *)-1` as an error return value. It just doesn't seem sensible to forbid computation of invalid pointers when they're never used to access memory. – tmyklebu Dec 10 '14 at 19:04
  • @tmyklebu: User code should not care how `offsetof` is implemented; an implementation where subtracting null from the address of a field of a null object happens to yield that field's offset may implement `offsetof` with such an expression, but an implementation would be free to implement it other ways. Although many implementations of C store pointers as linear memory addresses, nothing in the standard requires any such thing, and tricks such as you describe may fail on architectures which store pointers other ways [e.g. a system might store pointers using a pair of numbers, such that... – supercat Dec 10 '14 at 19:09
  • ...a statement like `p->q=123` is internally implemented as something analogous to `memory_regions[p_base][p_ofs+offsetof(p.q)] = 123;`; such a design would make it possible to ensure that something like `char *p=malloc(100); p[200]=0;` will trap rather than overwriting some arbitrary object, and will allow the compiler to relocate malloc'ed memory regions if free space becomes fragmented, but may break if code tries to manipulate pointers in ways that assume a linear address space. – supercat Dec 10 '14 at 19:13
  • @supercat: I agree that writing to or reading from unallocated storage is and should be UB. I have no problem with simply computing an invalid address. I think subtraction of unrelated pointers should be implementation-defined, or at least "undefined but with a strong encouragement to implementations to define sensibly," and I see no reason why computing invalid pointers should trap. I don't buy your argument that the current wording allows implementations to reorganise storage, and, even if true, I don't see that as a compelling advantage. – tmyklebu Dec 10 '14 at 19:18
  • @tmyklebu: Little is gained from allowing computations on invalid pointers to yield other pointers without being trapped; the only cases where such things would have any realistic hope of working are system-specific, and could be done in implementation-defined (rather than undefined) fashion by casting to `intptr_t`, performing the appropriate computations on the resulting value, and casting back. I would see great advantage to an extension to C which would specify p-q in three situations where it presently does not... – supercat Apr 24 '15 at 16:01
  • ...specifically saying that if p and q are valid and p is not contained within the first s bytes of object q, then p-q may be any value *not* in the range 0..s-1, saying that if p and q were part of the same formerly-valid object, p-q will have the same value as when the object was valid, and saying that given `p=realloc(q);` p-q will hold an arbitrary non-zero value if the object was moved. – supercat Apr 24 '15 at 16:04
  • @tmyklebu: The latter could be replaced by a rule allowing `==` to be used (which should also be valid) but all three represent useful operations which are essentially impossible in purely-standard C. People trying to turn C into something it never was may howl at the idea of extending the set of formally-defined behaviors to include things that 99% of compilers supported anyway, but I hope those who want the language to be useful can get them anyway. – supercat Apr 24 '15 at 16:09
  • someone please tell me, why don't I get a stackoverflow badge "started a flame war" :-) – user1998586 Jun 20 '15 at 08:28

You can also consider this usage of memmove seen in Git 2.14.x (Q3 2017)

See commit 168e635 (16 Jul 2017), and commit 1773664, commit f331ab9, commit 5783980 (15 Jul 2017) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 32f9025, 11 Aug 2017)

It uses a helper macro, MOVE_ARRAY, which calculates the size from the specified number of elements and supports NULL pointers when that number is zero.
Raw memmove(3) calls with NULL can cause the compiler to (over-eagerly) optimize out later NULL checks.

MOVE_ARRAY adds a safe and convenient helper for moving potentially overlapping ranges of array entries.
It infers the element size, multiplies automatically and safely to get the size in bytes, does a basic type safety check by comparing element sizes and unlike memmove(3) it supports NULL pointers iff 0 elements are to be moved.

```c
#define MOVE_ARRAY(dst, src, n) move_array((dst), (src), (n), sizeof(*(dst)) + \
    BUILD_ASSERT_OR_ZERO(sizeof(*(dst)) == sizeof(*(src))))
static inline void move_array(void *dst, const void *src, size_t n, size_t size)
{
    if (n)  /* no memmove at all for 0 elements, so NULL pointers are fine */
        memmove(dst, src, st_mult(size, n));  /* st_mult: Git's overflow-checked multiply */
}
```

Examples:

```diff
- memmove(dst, src, (n) * sizeof(*dst));
+ MOVE_ARRAY(dst, src, n);
```
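To experiment with the idea outside Git, here is a self-contained sketch. Git's `st_mult()` is swapped for a hypothetical `checked_mult()` that prints a message and calls `abort()` on overflow; that helper is my stand-in, not Git's code. `BUILD_ASSERT_OR_ZERO` and `MOVE_ARRAY` are copied from the definitions quoted in this answer.

```c
#include <stddef.h>  /* size_t, NULL */
#include <stdint.h>  /* SIZE_MAX */
#include <stdio.h>   /* fprintf */
#include <stdlib.h>  /* abort */
#include <string.h>  /* memmove */

/* Compile-time check: negative array size (a compile error) if cond is false. */
#define BUILD_ASSERT_OR_ZERO(cond) (sizeof(char [1 - 2*!(cond)]) - 1)

/* Hypothetical stand-in for Git's st_mult(): a * b, aborting on overflow. */
static size_t checked_mult(size_t a, size_t b)
{
    if (b != 0 && a > SIZE_MAX / b) {
        fprintf(stderr, "size_t overflow: %zu * %zu\n", a, b);
        abort();
    }
    return a * b;
}

static inline void move_array(void *dst, const void *src, size_t n, size_t size)
{
    if (n)  /* for n == 0, memmove is never called, so NULL is fine */
        memmove(dst, src, checked_mult(size, n));
}

/* Element size comes from *dst; sizes of *dst and *src must match. */
#define MOVE_ARRAY(dst, src, n) move_array((dst), (src), (n), sizeof(*(dst)) + \
    BUILD_ASSERT_OR_ZERO(sizeof(*(dst)) == sizeof(*(src))))
```

With `int a[5] = {1, 2, 3, 4, 5};`, `MOVE_ARRAY(a + 1, a, 4)` shifts the first four elements right, leaving `{1, 1, 2, 3, 4}`, while `MOVE_ARRAY(p, p, 0)` with a null `p` never reaches `memmove`.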

It uses the macro BUILD_ASSERT_OR_ZERO, which asserts a build-time dependency as an expression (`cond` being the compile-time condition that must be true).
The compilation will fail if the condition isn't true, or can't be evaluated by the compiler.

```c
#define BUILD_ASSERT_OR_ZERO(cond) \
(sizeof(char [1 - 2*!(cond)]) - 1)
```

Example:

```c
#define foo_to_char(foo)                \
     ((char *)(foo)                     \
      + BUILD_ASSERT_OR_ZERO(offsetof(struct foo, string) == 0))
```
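A compilable sketch of that example, assuming a hypothetical `struct foo` whose first member is `string`. The macro parameter is named `p` here rather than `foo`: with a parameter named `foo`, the preprocessor would also substitute the argument into `struct foo`, which only works if the argument happens to be spelled `foo`.

```c
#include <stddef.h>  /* offsetof */

#define BUILD_ASSERT_OR_ZERO(cond) \
    (sizeof(char [1 - 2*!(cond)]) - 1)

/* Hypothetical struct: `string` must stay the first member. */
struct foo {
    char string[16];
    int  count;
};

/* Evaluates to (char *)(p) + 0. If `string` ever moves away from
 * offset 0, the array size becomes 1 - 2 = -1 and compilation fails. */
#define foo_to_char(p) \
    ((char *)(p) + BUILD_ASSERT_OR_ZERO(offsetof(struct foo, string) == 0))
```

With `struct foo f = { "hello", 3 };`, `foo_to_char(&f)` points at `f.string`; reordering the members so that `count` comes first turns every use of the macro into a compile error.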
VonC
  • The existence of optimizers that think "clever" and "dumb" are antonyms makes the test for n necessary, but more efficient code would generally be possible on an implementation that guaranteed that memmove(any,any,0) would be a no-op. Unless a compiler can replace a call to memmove() with a call to memmoveAtLeastOneByte(), the workaround to guard against clever/stupid compilers' "optimization" will generally result in an extra comparison a compiler won't be able to eliminate. – supercat Aug 14 '17 at 23:34