
When calling memcpy(), I try to make sure the source/dest pointers do not overlap because the specification says that is undefined behavior. For the following simple example, though, GCC generates a memcpy() with source == dest.

#include <stdio.h>
#include <string.h>

struct my_struct {
  int a[5000] ;
} ;

int main(void)
{
  struct my_struct a ;
  struct my_struct *p_a = &a ;
  memset( &a, 0, sizeof(a) ) ;

  *p_a = a ;   /* p_a points to a, so source and destination are the same object */
  printf("%d\n",p_a->a[10] );

  return 0 ;
}

Using gcc -O0, both my local GCC (5.4.0) and godbolt.org GCC x86-64 10.2 produce the following line for *p_a = a:

call memcpy

(I'm not using -O2 because that would require a more complicated example to demonstrate the problem.)

I'm asking because we have an embedded system that generates a warning when memcpy() source/dest overlap. However, some people want to turn off the warning if source==dest because of the above example. What's reasonable?
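For the warning itself, one reasonable policy is to warn only on partial overlap and treat source == dest as a no-op. Below is a minimal sketch, assuming the check lives in a debug wrapper around memcpy() on a target with a flat address space. `classify_overlap` and `checked_memcpy` are hypothetical names, not part of any existing library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result of comparing two byte ranges. */
enum overlap_kind { OVERLAP_NONE, OVERLAP_EXACT, OVERLAP_PARTIAL };

/* Classify [dst, dst+n) against [src, src+n).
   Assumption: a flat address space, so comparing the addresses as
   uintptr_t values is meaningful (ISO C leaves this implementation-defined). */
static enum overlap_kind classify_overlap(const void *dst, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (n == 0)
        return OVERLAP_NONE;
    if (d == s)
        return OVERLAP_EXACT;
    if (d < s + n && s < d + n)
        return OVERLAP_PARTIAL;
    return OVERLAP_NONE;
}

/* Hypothetical debug wrapper: silent for src == dst (the copy would have
   no net effect, so it is skipped), warns and falls back to memmove()
   for partial overlap, and uses memcpy() otherwise. */
static void *checked_memcpy(void *dst, const void *src, size_t n)
{
    switch (classify_overlap(dst, src, n)) {
    case OVERLAP_EXACT:
        return dst;
    case OVERLAP_PARTIAL:
        fprintf(stderr, "warning: memcpy(%p, %p, %zu): buffers overlap\n",
                dst, (void *)src, n);
        return memmove(dst, src, n);
    default:
        return memcpy(dst, src, n);
    }
}
```

Skipping the call for exact overlap also avoids relying on the library's memcpy() handling src == dst, which the C standard does not promise; whether to still log that case is a policy choice for the project.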

Steve Friedl
Will
  • You presume that `p_a = &a; *p_a = a;` isn't undefined behaviour too, but I'm not so sure. [It is defined; see below] /// But let's assume it's defined. You gotta understand that undefined behaviour simply means that the spec offers no guidance as to what the compiler should do in a particular situation. The compiler is aware of its own implementation of `memcpy` and can use it in circumstances where it knows it can be used even if the spec doesn't. – ikegami Nov 18 '20 at 17:33
  • The fact that the C standard does not define the behavior of `memcpy` when the source and destination overlap does not mean a C compiler cannot require, know, and rely on the fact that the `memcpy` it uses behaves as desired when the source and destination overlap. Of course, it is inefficient to call `memcpy` when the source and destination are identical, but that is a separate matter. – Eric Postpischil Nov 18 '20 at 17:33
  • @ikegami: There is nothing undefined about that. The evaluations of the operands of `=` are sequenced before the side effect of storing the result (C 2018 6.5.16 3). (Otherwise, you could not do `x = x+1;`.) And the C standard discusses this case explicitly: “If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined” (C 2018 6.5.16.1 3). – Eric Postpischil Nov 18 '20 at 17:36
  • Related: https://stackoverflow.com/questions/5412132/memcpy-vs-assignment-in-c-should-be-memmove – Artyer Nov 18 '20 at 17:37
  • @Eric Postpischil, Thanks. I always assume that doing weird stuff is undefined unless I know otherwise :) – ikegami Nov 18 '20 at 17:37
  • @ikegami: Interestingly, that overlap requirement seems to mean `union { int i; char c; } x = {0}; x.c = x.i;` is undefined, since `c` overlaps `i` but the overlap is not exact. – Eric Postpischil Nov 18 '20 at 17:43
  • @EricPostpischil I'm not sure if that is undefined or not, since `x.i` is converted to `char` before being stored in `x.c`, so is it still the same object? – Ian Abbott Nov 18 '20 at 18:02
  • @IanAbbott: I have to think about what that statement is actually trying to deal with. I can imagine if we had `int` members in two different structures that were in the same union, they could be overlapping but not exact (if alignment requirements permitted), and then the `x.a.i = x.b.i` assignment is the same as a `memcpy` with overlapping but not identical buffers. Is that the case it is trying to cover? – Eric Postpischil Nov 18 '20 at 18:30
  • @IanAbbott: With `x.c = x.i`, consider a C implementation that implements `unsigned char c; unsigned int i;… c = i;` by copying the low byte of `i` to `c` and then examining the high bytes to see if they are zero (fine, value was representable, all done) or not (oops, overflow—want to trap, clamp, something else?). Then `x.c = x.i;` in a big-endian implementation would break because the copying of the low byte would alter the high byte of `x.i`. – Eric Postpischil Nov 18 '20 at 18:31
  • @EricPostpischil I assume you meant signed types rather than unsigned? In that case, it could be argued that the behavior you described would not conform to the sequencing requirement of the assignment operators. – Ian Abbott Nov 19 '20 at 11:09
  • @IanAbbott: I was thinking about signed types initially but switched to unsigned to avoid discussing the sign bit, but then the conversions between unsigned types are defined such that it avoids a problem if copying the first byte clobbers another byte, so unsigned does not serve as an example. I do not think the sequencing requirement is violated by `x.c = x.i`, whether signed or unsigned, anymore than it would be violated by `y = y;`. I expect I will enter a language-lawyer question about this. – Eric Postpischil Nov 19 '20 at 12:07
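If `x.c = x.i;` is indeed undefined under 6.5.16.1 3, as suggested in the comments above, one way to sidestep the question is to read the source into a separate object first, so the store never reads from storage that overlaps its destination. This is a minimal sketch; the union tag `int_or_char` and the helper `store_low_char` are hypothetical names standing in for the anonymous union in the comment.

```c
/* Hypothetical named version of the union from the comment:
   union { int i; char c; } x = {0}; */
union int_or_char {
    int i;
    char c;
};

/* Hypothetical helper: store the converted value of x->i into x->c
   without a single assignment whose source overlaps its destination. */
static void store_low_char(union int_or_char *x)
{
    int tmp = x->i;    /* the read completes into a distinct object */
    x->c = (char)tmp;  /* tmp does not overlap x->c; the result is
                          implementation-defined if tmp does not fit in char */
}
```

The two statements make the ordering explicit: the whole `int` value is read before any byte of the union is written.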

1 Answer


The fact that the C standard does not define the behavior of memcpy when the source and destination overlap does not mean a C compiler cannot require, know, and rely on the fact that the particular memcpy it uses behaves as desired when the source and destination overlap.

In this case, the only requirement upon memcpy (beyond that of the C standard) is that it work when the source and destination are the same. If a C compiler requires that property for the memcpy it uses, that is fine by the C standard—a C implementation may have whatever internal requirements for its software it desires. The C standard governs only the external behavior of the software, not how it works internally.

This is a fairly mild requirement. The reason for not defining the behavior when the source and destination overlap is that various implementations of memcpy might, in the course of writing to an overlapping destination, alter parts of the source before they have been read. That does not occur when the source and destination are identical, at least not in normal implementations of memcpy, since each location is simply rewritten with its own value. So it is an easy requirement to fulfill.
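To see why identical buffers are harmless while partial overlap is not, here is a minimal sketch of a naive forward byte copy, the kind of loop a simple memcpy() might contain. `naive_forward_copy` is a hypothetical name and not any real library's implementation; production memcpy() versions copy in larger chunks and may even run backwards.

```c
#include <stddef.h>

/* Hypothetical naive forward copy, one byte at a time. */
static void naive_forward_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t k = 0; k < n; k++)
        dst[k] = src[k];  /* if dst == src, each byte is rewritten with itself;
                             if dst == src + 1, this store overwrites the byte
                             that the next iteration reads */
}
```

Called with dst == src, the buffer is unchanged. Called with dst == src + 1 on "abcdef" and n = 5, the first byte is smeared forward and the result is "aaaaaa" instead of the "aabcde" that memmove() would produce.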

The memcpy generated by the compiler is inefficient, since it has no net effect, but that is of no concern since the code was generated with optimization disabled.

Eric Postpischil
  • Thanks - sounds like any C library used with GCC (e.g., musl, glibc, etc.) must work if source == dest? (side note: the actual memcpy() I'm facing is far away from the pointer assignment, so the compiler doesn't know source==dest even with "-O2".) – Will Nov 18 '20 at 22:06
  • @Will: It would appear so, although I do not expect that is explicitly stated anywhere in GCC documentation. – Eric Postpischil Nov 18 '20 at 22:16
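Will's situation, where the assignment is far from the code that sets up the pointers, can be sketched as a separate function. This is a minimal illustration reusing the question's `struct my_struct`; `copy_struct` is a hypothetical helper. Without inlining, the compiler cannot see whether `dst` and `src` are the same object, and it may lower the assignment to a memcpy() call, as GCC did for the question's example.

```c
/* Same type as in the question. */
struct my_struct {
    int a[5000];
};

/* Hypothetical helper: a plain struct assignment through pointers. */
static void copy_struct(struct my_struct *dst, const struct my_struct *src)
{
    /* Defined even when dst == src: the overlap is exact and the types
       are compatible (C 2018 6.5.16.1 3). The compiler may still
       implement it with a call to memcpy(). */
    *dst = *src;
}
```

Calling `copy_struct(&s, &s)` is valid C, so any memcpy() the compiler emits for it has to cope with source == dest.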