Safe use of memcpy on overlapping region

Question

Is it safe to use memcpy in the following scenario, where data is copied from a higher index in a block to a lower index in the same block? For example:

char buf[100];
// fill in the data ...
memcpy(&buf[10], &buf[15], 10);

In the above scenario I don't care about the data at locations 10 - 19 and am fine if it's overwritten. Is there some reason why this should be avoided and memmove used instead?

EDIT: Sorry, I have not communicated my intention properly. Let's say I have data at indices 10 - 19 and data at indices 15 - 24. I want to copy the data from 15 - 24 over 10 - 19, and I don't care about the data in 10 - 19. Is it safe to use memcpy even though the regions overlap?

it doesn't overlap - `memcpy(&buf[20], &buf[10], 20);` would — user3125280, Jan 18 '14 at 13:45
If A and B don't overlap, then B and A also don't overlap... — Oliver Charlesworth, Jan 18 '14 at 13:54
If there is overlap, memcpy()'s behaviour is undefined. For memmove() the correct behaviour is guaranteed. — wildplasser, Jan 18 '14 at 14:13
@wildplasser, is it because the order in which each byte is copied is not defined? — Abhas Saroha, Jan 18 '14 at 14:16
@user689046: Yes - writes to the destination buffer may modify "needed but not read yet" data in the source buffer if source and destination overlap — Brendan, Jan 18 '14 at 14:25
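The point made in the comments above can be illustrated with a small sketch. The two loops below are hypothetical byte-by-byte copies, not what any real memcpy does (an actual implementation may copy in either direction, use wide loads, or do something else entirely). The helper `mismatches` is likewise made up for this illustration: it performs the copy from the question (10 bytes from index 15 to index 10) and counts how many destination bytes differ from the intended result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-to-high byte copy. */
void copy_forward(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical high-to-low byte copy. */
void copy_backward(unsigned char *dst, const unsigned char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = n; i-- > 0; )
        dst[i] = src[i];
}

/* Fill buf so that buf[i] == i, copy indices 15..24 over 10..19 with the
   chosen loop, and count destination bytes that are not the intended value. */
int mismatches(int backward) {
    unsigned char buf[100];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)i;

    if (backward)
        copy_backward(&buf[10], &buf[15], 10);
    else
        copy_forward(&buf[10], &buf[15], 10);

    int bad = 0;
    for (size_t i = 0; i < 10; i++)
        bad += (buf[10 + i] != (unsigned char)(15 + i));
    return bad;
}
```

With the forward loop every source byte is read before it is overwritten; with the backward loop indices 15 - 19 are overwritten before they are read, so part of the destination ends up with the wrong values. Since memcpy promises neither order, the overlapping call cannot be relied on.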

rullof · Answer 1 · 2014-01-18T13:55:26.750

1

As you specified in the statement:

memcpy(&buf[20], &buf[10], 10);

The data from index 10 to 19 doesn't overlap with the data from index 20 to 29, so it's safe to use memcpy() even if you care about the data from index 10 to 19.

Note that if the regions overlap, it's not safe to use memcpy even if you don't care about the data being overwritten, since the direction in which memcpy copies is not specified.
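For the overlapping case in the edited question, a minimal sketch using memmove might look like the following. The helper name `shift_down` and its parameters are made up for illustration; the only library call is the standard `memmove`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: move `count` bytes starting at index `from`
   down to index `to` inside the same buffer of length `len`.
   The two ranges are allowed to overlap, so memmove is used. */
void shift_down(char *buf, size_t len, size_t to, size_t from, size_t count) {
    assert(to <= from && from + count <= len);
    memmove(&buf[to], &buf[from], count);
}
```

Calling `shift_down(buf, sizeof buf, 10, 15, 10)` on the question's `char buf[100]` copies indices 15 - 24 over 10 - 19 and leaves everything from index 20 upward unchanged.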

edited Jan 18 '14 at 13:55

answered Jan 18 '14 at 13:49

rullof


Sorry, I mixed up the destination and src, fixed it above! – Abhas Saroha Jan 18 '14 at 13:53
thanks the ordering of copy not being defined clears it up for me. – Abhas Saroha Jan 18 '14 at 14:28

Shahbaz · Accepted Answer · 2014-01-18T14:07:10.673

1

Edit:

Based on your edit, invert the following answer: the regions now overlap, so the call no longer satisfies the restrict constraints.

Old answer

Yes, it is safe. You are copying buf[10] through buf[19] onto buf[20] through buf[29]. (Note that the first parameter of memcpy is the destination, so buf[10] through buf[19] are not being overwritten.)

memcpy is defined in C11 as:

void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

Notice the restrict keyword. C11 at 6.7.3, paragraph 8 says (emphasis mine):

An object that is accessed through a restrict-qualified pointer has a special association with that pointer. This association, defined in 6.7.3.1 below, requires that all accesses to that object use, directly or indirectly, the value of that particular pointer. The intended use of the restrict qualifier (like the register storage class) is to promote optimization, and deleting all instances of the qualifier from all preprocessing translation units composing a conforming program does not change its meaning (i.e., observable behavior).

In your example, buf[10] through buf[19] are accessed only through the s2 pointer and buf[20] through buf[29] are accessed only through the s1 pointer. Therefore, your usage of memcpy is perfectly ok.

In simpler terms, as long as the arrays you give to memcpy don't overlap, it's ok.
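The "don't overlap" condition can be sketched as a check on the distance between the two start addresses. The names `regions_overlap` and `copy_bytes` are hypothetical, and the check assumes that converting pointers to `uintptr_t` yields comparable linear addresses, which is implementation-defined but holds on common flat-memory platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: do the n-byte regions starting at a and b share
   at least one byte? Assumes a flat address space (see lead-in). */
int regions_overlap(const void *a, const void *b, size_t n) {
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t gap = pa < pb ? pb - pa : pa - pb;
    return gap < n; /* start addresses closer than n bytes => overlap */
}

/* Hypothetical wrapper: memcpy when the regions are disjoint,
   memmove when they overlap. */
void *copy_bytes(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    return regions_overlap(dst, src, n) ? memmove(dst, src, n)
                                        : memcpy(dst, src, n);
}
```

For the question's buffer, `regions_overlap(&buf[10], &buf[15], 10)` is true (the starts are 5 bytes apart), while `regions_overlap(&buf[20], &buf[10], 10)` is false (the starts are exactly 10 bytes apart). In practice, calling memmove unconditionally is simpler and usually just as good.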

edited Jan 18 '14 at 14:07

answered Jan 18 '14 at 13:50

Shahbaz


Sorry, see the edit above. Is it safe if I don't care about earlier data in overlapping regions and am overwriting it with later data? – Abhas Saroha Jan 18 '14 at 14:07
In case of overlapping regions is it not safe because there is no guarantee the order in which data will be copied over? In case index j (j > i && j < i + data_length) is copied over to i before j + 1 to i + 1 this should be perfectly safe. – Abhas Saroha Jan 18 '14 at 14:15
@user689046, there is no guarantee which direction the data would be copied from, so you may get completely corrupt data. That's why `memmove` exists. – Shahbaz Jan 18 '14 at 14:33
Thank you that clarifies things for me. – Abhas Saroha Jan 18 '14 at 14:40
Would a read of a value using one restrict pointer followed by a write of that value using another invoke UB if there was no possible way the write could precede the read [e.g. the code `*p=*q;` with p and q equal]? There are cases where having `memcpy(p,p,n);` defined as copying any or all of the values to themselves in arbitrary order would allow code to be written more efficiently than would treating it as Undefined Behavior. – supercat Oct 31 '15 at 19:32
@supercat, yes that would be UB. That's just `restrict` and regardless of whether it's always a good idea or not, it's what it requires. I'm curious to know in what situation you may even want `memcpy(p, p, n)`! – Shahbaz Nov 01 '15 at 13:53
@Shahbaz: Some sorting, shuffling, or other permutation-related algorithms can be written most efficiently if they're allowed to swap items with themselves. If things to be copied are small (sometimes aliasing rules force the use of memcpy with 4-byte types) and only a tiny fraction of swaps are reflexive, it may be more efficient to blindly swap a few items with themselves than slow down the more common case to avoid such self-swaps. Given that the fundamental purpose of "restrict" was to allow pointers to be read eagerly and written lazily, I think it's unfortunate... – supercat Nov 01 '15 at 17:03
...that the Standard didn't specify behavior in cases where either (1) data being written matched what was already present, or (2) a data dependency between reads and writes would ensure that all reads by one pointer would precede any writes by the other. Ironically, I would think that in 1999 if someone had suggested that, I think the idea might have been rejected not on the grounds that it would impede optimization, but rather on the basis that there was no need to overcomplicate the spec to prevent compilers from doing something they'd have no reason to do anyway. – supercat Nov 01 '15 at 17:06
@supercat, assuming that in your examples it really does make a difference to be able to swap few and small elements of an array, it still isn't convincing to me that `restrict` should contain such special cases. In your examples, you could use `memmove` instead of `memcpy` by the way. Nevertheless, `restrict` has an important role for the optimizer by telling it that it can safely assume what is read/written through some pointer is not going to change through any other pointer. It's not necessarily about the order of read and writes. – Shahbaz Nov 01 '15 at 23:04
@Shahbaz: Code which expects that memcpy from an address to itself should have no side-effects has existed essentially forever, and I see nothing to be gained by breaking it, given the number of ways the Standard could accommodate it. For example, independent of what "restrict" means, the Standard could be changed to say that `memcpy(p,p,n);` may read and write back any portion of the range, in any order, using the source pointer, and may read and write back any disjoint portion of the range, in any order, using the destination pointer. Such behavior would make code such as I described... – supercat Nov 01 '15 at 23:14
...work, but would not limit any optimizations *except* those which would erroneously assume that source and destination cannot be equal. If an otherwise-useful optimization opportunity would be foregone, a programmer wishing to enable it could use a suitable compiler intrinsic to invite the compiler to assume the pointers cannot be equal. Many compilers do not optimize memmove as well as memcpy (in some implementations, a permutation function using memmove may run at half the speed of one using memcpy) and using memmove is not particularly attractive. – supercat Nov 01 '15 at 23:17
@supercat, why would the standard say that `memcpy(p, p, n)` can copy in any order? Wouldn't it make more sense to say `memcpy(p, p, n)` doesn't do anything at all? – Shahbaz Nov 01 '15 at 23:52
@Shahbaz: Saying it does nothing at all would imply that calling memcpy(p,p,n) in cases where reading and writing the same value to p could have adverse observable side-effects would not cause such side-effects to occur. Saying it may read and write back any portion of the source range, and any portion of the destination range, would allow for the possibility of it doing nothing, but would not require the compiler or library to do anything other than the same sequence of operations that would have happened absent the "restrict" qualifier. BTW, what forms of useful optimization... – supercat Nov 02 '15 at 02:23
...do you see `restrict` as having been designed to facilitate other than the ability to perform reads eagerly and writes lazily? Eager reads and lazy writes are useful, but requiring that self-memcpy work would in no way impede those since a write could not possibly occur before the latest point that a read could occur. BTW, an intrinsic which would actually be surprisingly useful in C would be something that behaved like memmove(p,p,n) but wasn't expected to generate any code but merely behave for type-aliasing purposes as though the memmove had happened, since... – supercat Nov 02 '15 at 02:27
...if such an intrinsic existed, a lot of code that would otherwise use silly little memcpy operations could use that intrinsic and then use pointer-based type punning in the way that was common before C89 effectively outlawed it. – supercat Nov 02 '15 at 02:28
_Re: calling memcpy(p,p,n) in cases where reading and writing the same value to p could have adverse observable side-effects would not cause such side-effects to occur_, I'm pretty sure if you write `*p = *p` it gets optimized out. The only way to prevent that is to use the `volatile` qualifier. You can't pass `volatile` pointers to `memcpy`, so this point is moot. – Shahbaz Nov 02 '15 at 15:37
_Re: what forms of useful optimization do you see ..._, Imagine this: `*p = x; *q = y; *(p+1) = *p + 1;` (Note that that's not necessarily the actual code, but a sequence of actions, perhaps for example taken out of a loop). If `p` is `restrict`, the compiler can increment the register that was holding `x` and write it to `*(p+1)`. If it's not `restrict`, then it has to read back `*p` because `*q = y` might have changed `*p`. – Shahbaz Nov 02 '15 at 15:40
_Re: an intrinsic which would actually be surprisingly useful in C would be something that behaved like `memmove(p,p,n)` but wasn't expected to generate any code but merely behave for type-aliasing purposes as though the `memmove` had happened_. I'm not sure I understood this, nor what _silly little `memcpy` operations_ it prevents nor what was common before C89. – Shahbaz Nov 02 '15 at 16:16
@Shahbaz: Before C89, if a number of structure types (possibly in different compilation units) had some initial members in common, a method which accepted an array of pointers to one such type and only accessed the aforementioned initial members could safely be given a suitably-typecast array of pointers to any such type. An (IMHO) overly-broad rule in C89 states that a program which uses type foo** to read memory written as type bar** invokes Undefined Behavior, even if all applicable storage layouts would be guaranteed compatible. The only portable workaround is to... – supercat Nov 02 '15 at 16:28
...use memcpy on the pointer. Historically, few programmers would have bothered with memcpy for such things because compilers would simply have translated the code in simple straightforward fashion. Some compilers today, however, take the view that compilers should go out of their way to remove code which only handles scenarios where the Standard imposes no requirements. Thus, using type foo** to read something that may have been written as some other possibly-unknown-but-layout-compatible pointer type is no longer safe and one must use silly little 4-byte memcpy/memmove operations instead. – supercat Nov 02 '15 at 16:32
An alternative to using memcpy/memmove to copy the pointer to a local variable would be to use memmove to copy the pointer to itself, which would force it to behave as though it was written using "unsigned char*" which is exempt from the normal aliasing rules, but for that scenario it shouldn't be necessary for the memmove to physically do anything--all it would need to do is prevent the compiler from making any inferences based on the types used to read and write the memory. – supercat Nov 02 '15 at 16:36
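The `*p = x; *q = y; *(p+1) = *p + 1;` sequence from the comments above can be written out as two functions to make the difference concrete. This is only a sketch: the function names are made up, and whether a given compiler actually performs the optimization described is not guaranteed.

```c
#include <assert.h>

/* Without restrict: q may point at p[0], so after *q = y the compiler
   has to reload p[0] before computing p[1]. */
int store_plain(int *p, int *q, int x, int y) {
    p[0] = x;
    *q = y;
    p[1] = p[0] + 1;
    return p[1];
}

/* With restrict: the caller promises that p and q do not refer to the
   same object, so the compiler may compute x + 1 from the register
   that still holds x instead of reloading p[0]. */
int store_restrict(int *restrict p, int *restrict q, int x, int y) {
    assert(p != q); /* cheap sanity check of the caller's promise */
    p[0] = x;
    *q = y;
    p[1] = p[0] + 1;
    return p[1];
}
```

Calling `store_plain(a, a, 5, 9)` with the same array for both pointers is well defined and stores 10 in `a[1]`, because `*q = y` changes `p[0]`. Passing aliased pointers to `store_restrict` would break the promise that restrict expresses, which is the same reason overlapping arguments to memcpy are undefined.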

user3125280 · Answer 3 · 2014-01-18T13:55:52.157

0

It is safe. There is no reason to use memmove.

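memcpy doesn't specify a direction, so memcpy(&buf[20], &buf[10], 20); would be unsafe: with the destination above the source, the copy has to run from the high end downward so that source bytes are read before they are overwritten. memmove (and std::copy_backward in C++) handles this overlap correctly, so it could be safely used in such a case.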

memcpy(&buf[10], &buf[20], 10); doesn't overlap because &buf[10] + 9 == &buf[19] is the max address copied to and is less than &buf[20].

edited Jan 18 '14 at 13:55

answered Jan 18 '14 at 13:48

user3125280


You said that it's not secure then you said that the data doesn't overlap. your answer is confusing! – rullof Jan 18 '14 at 13:52
@EricPostpischil um ok - i was answering the one with a question mark. Easy fixed. – user3125280 Jan 18 '14 at 13:53
