6

For this example I'm working with Objective-C, but answers from the broader C/C++ community are welcome.

@interface BSWidget : NSObject {
    float tre[3];
}
@property(assign) float* tre;


- (void)assignToTre:(float*)triplet {
    tre[0] = triplet[0];
    tre[1] = triplet[1];
    tre[2] = triplet[2];
}


- (void)copyToTre:(float*)triplet {
    memcpy(tre, triplet, sizeof(tre) );
}

So between these two approaches, and considering that these setter functions will generally only handle dimensions of 2, 3, or 4...

What would be the most efficient approach for this situation?

Will gcc generally reduce these to the same basic operations?

Thanks.

bitcruncher

3 Answers

7

A quick test suggests that, when optimising, the compiler replaces the memcpy call with the instructions that perform the assignment directly.

Disassembling the following code, compiled both unoptimised and with -O2, shows that in the optimised case the testMemcpy function contains no call to memcpy.

#include <stdlib.h>
#include <string.h>

/* Definition elided in the original; an int/char layout is assumed here,
   consistent with the 8-byte copies in the disassembly below. */
struct test {
  int  a;
  char b;
};

struct test src = { .a=1, .b='x' };

void testMemcpy(void)
{
  struct test *dest = malloc(sizeof(struct test));
  memcpy(dest, &src, sizeof(struct test));
}

void testAssign(void)
{
  struct test *dest = malloc(sizeof(struct test));
  *dest = src;
}

Unoptimised testMemcpy, with a memcpy call as expected

(gdb) disassemble testMemcpy 
Dump of assembler code for function testMemcpy:
   0x08048414 <+0>: push   %ebp
   0x08048415 <+1>: mov    %esp,%ebp
   0x08048417 <+3>: sub    $0x28,%esp
   0x0804841a <+6>: movl   $0x8,(%esp)
   0x08048421 <+13>:    call   0x8048350 <malloc@plt>
   0x08048426 <+18>:    mov    %eax,-0xc(%ebp)
   0x08048429 <+21>:    movl   $0x8,0x8(%esp)
   0x08048431 <+29>:    movl   $0x804a018,0x4(%esp)
   0x08048439 <+37>:    mov    -0xc(%ebp),%eax
   0x0804843c <+40>:    mov    %eax,(%esp)
   0x0804843f <+43>:    call   0x8048340 <memcpy@plt>
   0x08048444 <+48>:    leave  
   0x08048445 <+49>:    ret 

Optimised testAssign

(gdb) disassemble testAssign 
Dump of assembler code for function testAssign:
   0x080483f0 <+0>: push   %ebp
   0x080483f1 <+1>: mov    %esp,%ebp
   0x080483f3 <+3>: sub    $0x18,%esp
   0x080483f6 <+6>: movl   $0x8,(%esp)
   0x080483fd <+13>:    call   0x804831c <malloc@plt>
   0x08048402 <+18>:    mov    0x804a014,%edx
   0x08048408 <+24>:    mov    0x804a018,%ecx
   0x0804840e <+30>:    mov    %edx,(%eax)
   0x08048410 <+32>:    mov    %ecx,0x4(%eax)
   0x08048413 <+35>:    leave  
   0x08048414 <+36>:    ret   

Optimised testMemcpy does not contain a memcpy call

(gdb) disassemble testMemcpy 
Dump of assembler code for function testMemcpy:
   0x08048420 <+0>: push   %ebp
   0x08048421 <+1>: mov    %esp,%ebp
   0x08048423 <+3>: sub    $0x18,%esp
   0x08048426 <+6>: movl   $0x8,(%esp)
   0x0804842d <+13>:    call   0x804831c <malloc@plt>
   0x08048432 <+18>:    mov    0x804a014,%edx
   0x08048438 <+24>:    mov    0x804a018,%ecx
   0x0804843e <+30>:    mov    %edx,(%eax)
   0x08048440 <+32>:    mov    %ecx,0x4(%eax)
   0x08048443 <+35>:    leave  
   0x08048444 <+36>:    ret    
  • aha! so gcc may convert the memcpy statements to assignments if optimizations are enabled and it sees fit to do so. most interesting. – bitcruncher Aug 18 '11 at 20:47
  • @bitcruncher- Note that this sample code is `memcpy`-ing a single instance of a struct and not an array. I recommend that you follow the same test procedure with your array-based code since the compiler may not behave exactly the same way as the `struct`-based test case. – bta Aug 19 '11 at 17:48
  • It's worth noting that if the copy were large enough, many compilers would also perform the opposite conversion: replacing the individual element copies with a single call to `memcpy`. – Stephen Canon Aug 19 '11 at 17:55
2

Speaking from a C background, I recommend direct assignment. That version makes your intent obvious, and it is less error-prone if your array later grows to include extra indices that your function doesn't need to copy.

The two are not strictly equivalent. memcpy is typically implemented as a loop that copies the data in fixed-size chunks (which may be smaller than a float), so the compiler probably won't generate the same code for the memcpy case. The only way to know for sure is to build it both ways and look at the emitted assembly in a debugger.

Even if the memcpy call is inlined, it will probably result in more code and slower execution. The direct-assignment case should be more efficient (unless your target platform requires special code to handle floating-point types). This is only an educated guess, however; the only way to know for sure is to try it both ways and profile the code.

bta
    Actually, `memcpy` is usually implemented to move word-sized chunks at a time, not bytes at a time, with fixups for the ends if the size isn't a multiple of the word size (and special cases for non-word-aligned buffers). And for calls to `memcpy` where the copy size is known at compile time, GCC will often optimize that and replace it with faster inline assembly if it can, unless the `-fno-builtin-memcpy` compiler option is used. – Adam Rosenfield Aug 18 '11 at 19:56
    I think direct assignment should be the first choice. It clearly expresses the programmer's intent and optimizers can generate surprisingly good code. If profiling later shows this is a hotspot then look for alternatives. – Blastfurnace Aug 18 '11 at 20:04
  • @Adam Rosenfield- Good point, some implementations indeed use word-sized chunks. I'll edit the answer to make it clearer. – bta Aug 19 '11 at 17:42
  • Actually, on both OS X and iOS (which the questioner seems to be targeting), `memcpy` uses larger-than-word-size chunks for buffers large enough to allow it. Also, compilers for these systems recognize the idioms of array and structure copies, whether they are performed via memcpy, a for loop, or an unrolled element copy, and will codegen any of them to whatever is fastest for the buffer size in question. – Stephen Canon Aug 19 '11 at 17:50
-1

memcpy:

  1. Do function prolog.
  2. Initialize counter and pointers.
  3. Check whether there are bytes left to copy.
  4. Copy memory.
  5. Increment the source pointer.
  6. Increment the destination pointer.
  7. Increment the counter.
  8. Repeat steps 3-7 three more times (or eleven, if copying byte-by-byte).
  9. Do function epilog.

Direct assignment:

  1. Copy memory.
  2. Copy memory.
  3. Copy memory.

As you can see, direct assignment is much faster.

Triang3l