C++ memcpy function, why use *s++ instead of s[i]

Question

I am writing memcpy from scratch and I have been looking up other peoples implementations...My implementation is:

void* memcpy (void *destination, const void *source, size_t num)
{
    char *D = (char*)destination;
    char *S = (char*)source;
    for(int i = 0; i < num; i++)
            D[i] = S[i];
    return D;
}

various other sources and references that I have researched have

void* memcpy (void *destination, const void *source, size_t num)
{
    char *D = (char*)destination;
    char *S = (char*)source;
    for(int i = 0; i < num; i++) 
    {
            *D = *S;
            D++;
            S++;
    }
    return D;
}

I am having trouble understanding the difference and whether they would produce different outputs. The portion that confuses me specifically is the D++; and S++;

Random note#2: the second version returns a different pointer value. — wildplasser, Feb 15 '12 at 23:21
Random note#3: you will never achieve better performance with a custom `memcpy`. Well, at least not in 99% of cases, maybe more. — Irfy, Feb 15 '12 at 23:25
Possible duplicate of [this](http://stackoverflow.com/q/1663094/123376) question — Rado, Feb 15 '12 at 23:26
Random note#4: in the second snippet you don't need the i. Just (while num--) *D++ = *S++; return destination; — wildplasser, Feb 15 '12 at 23:27
Reason for implementing memcpy outside of the lib is for an assignment in an intro to programming c++ course. — Scott Curtis, Feb 16 '12 at 07:33

score 2 · Answer 1 · answered Feb 15 '12 at 23:22

2

Modern compilers will optimize these to the same code. It is called strength reduction. (Except for the different return values.)

answered Feb 15 '12 at 23:22

rasmus

3,136
17
22

Have you actually tested this? I've made a piece of code a lot more efficient by simply using pointers instead of accesing them using indexes. – mfontanini Feb 15 '12 at 23:25
I have not tested it on gcc if that is what you mean. I have however implemented this exact optimization in a compiler. It is very well known. – rasmus Feb 15 '12 at 23:27
If both functions were modified not to return anything, most compilers would produce identical (or effectively identical) assembly code for the two functions. (This code is trivial for the compiler to "see through".) – David Schwartz Feb 15 '12 at 23:27

Krizz · Answer 2 · 2012-02-15T23:28:16.710

1

D++ and S++ is incrementing a pointer.

Keep in mind that D[i] is equivalent to *(D + i).

Thus one is incrementing the pointer, other is keeping base and adding offset.

Modern compilers will probably compile to the same code.

NB: I assume return D; in second example is a copy-paste error, as it should be return destination; because D is increment and point to the memory "after" destination bytes.

edited Feb 15 '12 at 23:28

answered Feb 15 '12 at 23:22

Krizz

11,362
1
30
43

1

No, it's not, it would be D+i*sizeof(char). If the compiler does not optimize the multiplication would be much slower than the addition in pointer increment – Rado Feb 15 '12 at 23:24
2

@Rado Since `sizeof(char)` is always 1, there is no difference. The `sizeof` operator returns sizes in characters. – David Schwartz Feb 15 '12 at 23:26
1

@Rado actually it would be `D + i` for two reasons: 1, C++ does pointer arithmetic, and 2, `sizeof(char) == 1` always, so it would be `D + i` even if C++ didn't do pointer arithmetic. But he did forget the `*` – Seth Carnegie Feb 15 '12 at 23:26
2

@Rado: pointer arithmetic in C++ already includes the `sizeof(T)` multiplication for the type of element `T` in the array. – André Caron Feb 15 '12 at 23:28
@Rado, it is not `D+i*sizeof(char)`. Indeed, I forgot `*` which I've just added. Thanks for pointing out. – Krizz Feb 15 '12 at 23:29
@Krizz actually it is `*(D + i * sizeof(char))` and it is also `*(D + i)`, because you can multiply something by `1` and it will be itself (though that formula may be wrong for types other than `char` (specifically, types where `sizeof(T) != 1`)) – Seth Carnegie Feb 15 '12 at 23:30
@SethCarnegie, Sorry, I meant that the multiplication would be going on behind the scenes, not that you have to multiply it in code. And yes, although sizeof char is one, if the type is later changed to wchar or something else, you wouldn't want to be changing the algorithm as well so might as well be consistent to begin with – Rado Feb 15 '12 at 23:34
@SethCarnegie, yep, in this case of `char` (and other one-byte types), it is true. – Krizz Feb 15 '12 at 23:34
@Rado, but it will be a case in `D++` too. Behind the scenes it is `D += sizeof(char)`. – Krizz Feb 15 '12 at 23:36
@Krizz, in the other case there would be multiplication happening internally, whereas the increment is simple addition. As others have pointed out though, it doesn't matter nowadays since the compiler would most likely optimize it regardless – Rado Feb 15 '12 at 23:40
@Krizz actually `D += sizeof(char)` would be `++D` not `D++`; `D++` would be `(D += 1, D - 1)` – Seth Carnegie Feb 15 '12 at 23:43

score 0 · Answer 3 · answered Feb 15 '12 at 23:22

0

At the time this code was written, it was probably faster to avoid the addition of the index to the pointers.

In my own testing on the x86 architecture, which has an indexing mode built into the low-level instructions, the indexed method is slightly faster.

answered Feb 15 '12 at 23:22

Mark Ransom

299,747
42
398
622

Indeed, AFAIK x86 has special instructions for indexing arrays. It might be the case that the processor pipeline does more efficient caching if it has sees these instructions. Then again, it really depends on what machine instructions the compiler generates. – André Caron Feb 15 '12 at 23:25

score 0 · Answer 4 · answered Feb 15 '12 at 23:23

The algorithms are equivalent. The second version uses pointer arithmetic to advance the pointers to the next position instead of indexing the arrays using the a[i] syntax.

This works because the a[i] is actually a shorthand for *(a+i) (read: advance i positions past a and read the value at that location). Instead of performing the total offset (+i) at each iteration, it performs a partial offset (++a) at each iteration and accumulates the result.

Jason · Answer 5 · 2012-02-15T23:30:59.527

While both semantically mean the same thing, the *s++ version will avoid having to offset the initial pointer value as your increment through the bytes of the array you're copying. In other words the "underlying" representation of s[i] is actually *(s + i*sizeof(type)), and multiplications, especially of large values of i, are much slower than simple increments by small values, at least depending on the machine architecture.

In the end though, the libc implementation of memcpy will be much faster than anything you could hand-write in C due to the use of machine-dependent optimized memory-copying assembly instructions that you can't deliberately access through normal C-code.

score 0 · Answer 6 · answered Feb 15 '12 at 23:24

0

What seems to be confusing you is the pointer arithmetic: D++, S++. The pointers are being incremented to refer to the next char (as they are char*)

answered Feb 15 '12 at 23:24

David Rodríguez - dribeas

204,818
23
294
489

score 0 · Answer 7 · answered Feb 15 '12 at 23:35

Here are the inner loops, as compiled by GCC. I just added restrict keywords and removed the return value and compiled 32 bit for Core 2:

First one, array version:

.L3:
  movzbl  (%edi,%edx), %ecx
  addl    $1, %eax  
  cmpl    %ebx, %eax
  movb    %cl, (%esi,%edx)
  movl    %eax, %edx
  jne .L3

Second one, increment version:

.L9:
  movzbl  (%edx), %ebx
  addl    $1, %ecx   
  addl    $1, %edx   
  movb    %bl, (%eax)
  addl    $1, %eax  
  cmpl    %ecx, %esi
  ja  .L9

The compiler, as you can see, saw right through both constructs.

score 0 · Answer 8 · answered Feb 16 '12 at 01:07

Though it makes little difference either way, I'd be a bit surprised to see either of those. I'd expect something closer to:

void* memcpy (void *destination, const void *source, size_t num) {
    char *S = (char *)source;
    char *D = (char *)destination;
    while (--num)
        *D++ = *S++;
    return destination;
}

Most decent compilers will produce about the same code regardless. I haven't checked recently, but at one time most compilers targeting x86 would turn most of the loop into a single rep movsd instruction. They might not any more though -- that's no longer optimal.

C++ memcpy function, why use *s++ instead of s[i]

8 Answers8