Can someone help me understand how memmove is implemented in C? I have only one special condition, right?

if((src<dst)&&((src+sz) > dst))

copy from the back

Also, does it depend on the way the stack grows?

brett
  • If `src`, `dst`, and `sz` are all positive values, the condition is unsatisfiable. If `src > dst`, adding positive `sz` to it isn't going to make it any less. – Edmund Aug 26 '10 at 05:52
  • I always had a doubt. Can addresses be compared like you did? I had heard somewhere that they cannot be unless they belong to same array or structure. Someone please clarify! – aniztar Dec 25 '18 at 14:11

3 Answers

Mathematically, you don't have to worry about whether they overlap at all. If src is less than dst, just copy from the end. If src is greater than dst, just copy from the beginning.

If src and dst are equal, just exit straight away.

That's because your cases are one of:

1) <-----s----->                start at end of s
                 <-----d----->

2) <-----s----->                start at end of s
            <-----d----->

3) <-----s----->                no action
   <-----d----->

4)          <-----s----->       start at beginning of s
   <-----d----->

5)               <-----s----->  start at beginning of s
   <-----d----->

Even if there's no overlap, that will still work fine, and simplify your conditions.

If you have a more efficient way to copy forwards than backwards then, yes, you should check for overlap to ensure you're using the more efficient method if possible. In other words, change option 1 above to copy from the beginning.
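
The direction rule above can be sketched as a byte-at-a-time routine. This is only an illustration, not any library's actual implementation: `sketch_memmove` is a made-up name, and the comparison through `uintptr_t` assumes a flat address space, which also sidesteps the question of comparing pointers that do not belong to the same array.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative byte-wise memmove; sketch_memmove is a hypothetical name. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Case 3: same address (or nothing to copy), so no action. */
    if (d == s || n == 0)
        return dst;

    /* Assumption: a flat address space, so integer comparison of the
       converted pointers reflects their order in memory. */
    if ((uintptr_t)s < (uintptr_t)d) {
        /* Cases 1 and 2: src is below dst, copy from the end. */
        while (n--)
            d[n] = s[n];
    } else {
        /* Cases 4 and 5: src is above dst, copy from the beginning. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

For example, with `char a[] = "abcdefgh"`, calling `sketch_memmove(a + 2, a, 4)` leaves `a` holding "ababcdgh", because the backward copy reads each source byte before it is overwritten.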

paxdiablo
  • Note that there is another hidden assumption here, which is that you're only copying a byte at a time. – caf Aug 27 '10 at 03:18
  • Well, whether you copy a byte at a time, or a quadword or some massive SSE9 1024-bit hyperword value, the theory remains the same. You have to make sure you don't copy _into_ an overlap area that you haven't copied _out of_ yet. All the N-is-wider-than-char options introduce is a somewhat more complex detection of overlap (and final transfer) in the case where it's not a direct multiple of the value of N. – paxdiablo Aug 27 '10 at 05:22
  • @caf: If `src` and `dest` have the same alignment with respect to a larger type you could copy though, you never have to worry about clobbering an area you haven't yet copied, since the positions will always differ by at least that size. If they don't share the same alignment, you're stuck copying as bytes anyway...unless you want to make use of some nasty x86 unaligned io... – R.. GitHub STOP HELPING ICE Sep 04 '10 at 01:11
  • @R..: On many systems, at least if one is using a sane dialect of C that allows one to work around aliasing restrictions, copy operations that change alignment can still be done by reading and writing multiple words at a time if e.g. one grabs a word and extracts parts, and then on each pass through the loop reads (e.g.) four 32-bit registers, shifts those, combining with the partial word, and grabbing the bits that fell out, then stores four 32-bit registers, etc. On many ARM processors, performance even in the unaligned case can still be twice as fast as using bytes. – supercat May 18 '16 at 05:15
  • @R..: On ARM7-TDMI, for example, not counting loop overhead, copying each group of four words, with alignment-correction code, would take 20 cycles. Without alignment correction it would take 12. Copying bytes or words individually would take 5 cycles each. No need for unaligned accesses (which ARM7-TDMI doesn't support) in any case. – supercat May 18 '16 at 05:20
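
The word-at-a-time idea from these comments can be sketched for the forward direction (destination below source). This is a rough illustration under stated assumptions: `copy_forward_words` is a hypothetical name, `size_t` stands in for the machine word, and the word loads and stores go through `memcpy` to stay clear of aliasing problems. It does not attempt the shift-and-combine technique described for mismatched alignments; it simply falls back to bytes there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: forward copy, intended for the case d < s.
   Uses word-sized chunks when d and s share the same alignment. */
void copy_forward_words(unsigned char *d, const unsigned char *s, size_t n)
{
    const size_t W = sizeof(size_t); /* assumed stand-in for the word size */

    if ((uintptr_t)d % W == (uintptr_t)s % W) {
        /* Copy leading bytes until both pointers are word-aligned. */
        while (n > 0 && (uintptr_t)d % W != 0) {
            *d++ = *s++;
            n--;
        }
        /* Each word is fully read into w before it is stored, so the
           store cannot clobber source bytes that are still unread. */
        while (n >= W) {
            size_t w;
            memcpy(&w, s, W);
            memcpy(d, &w, W);
            d += W;
            s += W;
            n -= W;
        }
    }
    /* Trailing bytes, or the whole range if alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

A backward variant for the case where the source is below the destination would mirror this, starting from the end of both regions.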
memmove can be turned into a memcpy if the two memory regions don't overlap. Obviously memcpy is extremely optimised on most systems (one of the ones I use makes use of almost every trick in the book from unrolled loops to SSE operations where supported for maximum throughput).

If the two memory regions do overlap, for all intents and purposes the region to be copied is moved into a temporary buffer and the temporary buffer is copied (all with memcpy, most likely) back on top of the original buffer. You can't work from the start or work from the back with an overlapping region, because you'll always end up with at least some data being corrupted in the process.
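
The temporary-buffer behaviour described here can be sketched as follows. This is a model of the semantics rather than what a real libc is known to do: `regions_overlap` and `memmove_via_temp` are hypothetical names, the overlap test assumes a flat address space, and returning NULL on allocation failure is a choice made for this sketch (the standard memmove has no failure mode).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: nonzero if [a, a+n) and [b, b+n) share a byte.
   Assumes a flat address space. */
static int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t x = (uintptr_t)a, y = (uintptr_t)b;
    return (x < y) ? (y - x < n) : (x - y < n);
}

/* Hypothetical name. Returns dst, or NULL if the temporary buffer
   cannot be allocated (a failure mode the real memmove does not have). */
void *memmove_via_temp(void *dst, const void *src, size_t n)
{
    /* Disjoint regions: a plain memcpy is enough. */
    if (!regions_overlap(dst, src, n))
        return memcpy(dst, src, n);

    /* Overlapping regions: stage the source in a scratch buffer. */
    void *tmp = malloc(n);
    if (tmp == NULL)
        return NULL;
    memcpy(tmp, src, n);
    memcpy(dst, tmp, n);
    free(tmp);
    return dst;
}
```

The scratch buffer costs an allocation and a second pass over the data, which is one reason to prefer choosing a copy direction instead.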

That being said, it's been a long time since I've looked at libc code, so there may be an optimisation for memmove and overlapping regions that I haven't thought of yet.

memmove doesn't depend on the way the stack grows at all - it merely copies one region of memory to another location - exactly like memcpy, except that it handles overlapping regions and memcpy doesn't.

EDIT: Actually, thinking about it some more... Working from the back can work if you go from the right "source" (so to speak), depending on the move itself (e.g., is source < dest or not?). You can read newlib's implementation here, and it's fairly well-commented too.

Matthew Iselin
Depends on the compiler. Good compilers will use good optimizations dependent on the target processor instruction set and bus width.

jacknad