3

When passing a struct in C/C++ by value the struct contents must be copied. How do compilers achieve that? I.e., which assembly instructions are usually emitted for this copy?

How fast are these, when, e.g. compared to a call to memcpy?

Now consider this code:

struct X { int i, j, k; };

void foo(X x);

void foo( int i, int j, int k);

Is there any difference between calling foo(X) and foo(int,int,int) or might the generated assembly code be the same (considering the passing of parameters)?

avp
gexicide
  • 1
    Why don't you profile it and find out? – Luchian Grigore Feb 15 '13 at 11:26
  • 1
    All compilers support options to generate assembly files instead of object files. Use that and look at the assembly yourself. Also, if the compiler actually does a copy or not can depend on optimizations. – Some programmer dude Feb 15 '13 at 11:27
  • @Luchian Grigore: Because my profiling results might only be true for my machine or my compiler. I guess somebody with a deeper knowledge is a better source here than my profiling. – gexicide Feb 15 '13 at 11:27
  • @Joachim Pileborg: Then again, this is only true for my compiler and my code. It might be possible that with another compiler totally different code is generated. Therefore, I want to know this from somebody who understands the details behind this process, not from a test that just tries it without knowing WHY these instructions are generated. – gexicide Feb 15 '13 at 11:29
  • 4
    If you want a general answer that is valid for all platforms and all compilers, you will not get any. The only thing you can do is check how it works for you, for your program, on your platform using your compiler. – Some programmer dude Feb 15 '13 at 11:31
  • @Joachim Pileborg: Of course I don't want an answer like "Every platform uses MOV". I know that I cannot get an answer like that. However, there could be an answer like "The compiler is free to do whatever it likes to create the copy. However, most compilers will use XXX (if the platform supports it) because it is fastest, and this is equivalent to doing YYY", for example. – gexicide Feb 15 '13 at 11:33
  • Isn't that what you're interested in? Results for your platform/compiler? – Luchian Grigore Feb 15 '13 at 12:12
  • @Luchian Grigore: No, I am seeking more general knowledge :) – gexicide Feb 15 '13 at 12:35
  • @gexicide: Ok, so provide some compilable test cases that use clever measuring methods, and distribute them ... for example as part of your question on SO, together with your thoughts about the results you have already observed. I'm sure you could provoke a discussion that way ... ;-) – alk Feb 15 '13 at 16:26

4 Answers

6

In C++

How do compilers achieve that?

They call the copy constructor for that class/structure: the one you provide, or the implicitly generated one if you don't provide one.

How fast are these, when, e.g. compared to a call to memcpy?

Depends on the class and its members. Profiling should give you a clearer picture.
However, using memcpy to copy class instances should be avoided.

In C

How do compilers achieve that?

They perform a shallow copy of that structure. For all practical purposes you can consider it the same as memcpy.
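
The equivalence can be checked with a minimal sketch, written in C++ and reusing the question's struct X. The function names are made up for illustration, and the sketch only compares the results of the two copies; it says nothing about which instructions a given compiler emits.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct X { int i, j, k; };

// memcpy is only a valid way to copy trivially copyable types.
static_assert(std::is_trivially_copyable<X>::value,
              "X must be trivially copyable");

// Copy by plain initialization; the compiler picks the instructions.
X copy_by_assignment(const X& src) {
    X dst = src;
    return dst;
}

// Copy with an explicit memcpy call.
X copy_by_memcpy(const X& src) {
    X dst;
    std::memcpy(&dst, &src, sizeof dst);
    return dst;
}

// Returns true when both copies hold the same member values.
bool copies_match(const X& src) {
    const X a = copy_by_assignment(src);
    const X b = copy_by_memcpy(src);
    return a.i == b.i && a.j == b.j && a.k == b.k;
}

// Small self-check tying the two copies together.
void check_copies() {
    assert(copies_match(X{1, 2, 3}));
}
```

Compiling both functions with optimization enabled and comparing the assembly output is a quick way to see whether your compiler treats them identically.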

Alok Save
  • And in C? And in an automatically generated copy constructor in C++? – gexicide Feb 15 '13 at 11:29
  • I know that they perform a shallow copy. But how is that one implemented. – gexicide Feb 15 '13 at 11:36
  • 3
    @gexicide However the compiler sees fit in the particular case. Though, for e.g. a bunch of `int`s it would be quite stupid to use a general `memcpy`, and you can (usually) assume your compiler to not be that stupid. – Christian Rau Feb 15 '13 at 11:42
4

Obviously, if there is a copy constructor for the struct or class, then that copy constructor is called.

If there isn't a copy constructor, it is entirely up to the compiler, but for three integer-sized objects it will most likely be three individual mov instructions. For larger structures, it's either a call to memcpy or an inlined sequence similar to memcpy.

It is also quite likely that, if the structure is VERY large (several megabytes), a true memcpy is faster than the inlined version, and the compiler may not realize this and use the inlined version anyway. But most of us don't use structs that are megabytes in size, so I don't think that's generally something to worry too much about. Copying a struct onto the stack as an argument, if the struct is megabytes large, is probably not a great idea in the first place, given the restricted size of a typical stack.

Mats Petersson
3

There are two distinct cases.

  • If your struct is POD, the copy is optimized and will be as fast as memcpy (given a suitable optimization level).

  • If your struct is not POD, C++ has to call the copy constructor for your object. The copy constructor may call other functions, operator new, etc., so it can be slower than memcpy. But memcpy will not copy the struct correctly: using memcpy on a non-POD type results in undefined behaviour!
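
The non-POD case can be illustrated with a small sketch. The type Tracked and all function names are hypothetical and exist only for this example; the copy counter is there purely to make the copy-constructor call visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical non-POD type: it owns a std::string and counts its copies.
struct Tracked {
    static int copies;  // number of copy-constructor calls so far
    std::string name;

    explicit Tracked(std::string n) : name(std::move(n)) {}
    Tracked(const Tracked& other) : name(other.name) { ++copies; }
};
int Tracked::copies = 0;

// The type trait confirms that memcpy is not a valid way to copy this type.
static_assert(!std::is_trivially_copyable<Tracked>::value,
              "Tracked must not be copied with memcpy");

// Passing by value runs the copy constructor for an lvalue argument.
std::size_t by_value(Tracked t) { return t.name.size(); }

// Passing by const reference copies nothing.
std::size_t by_reference(const Tracked& t) { return t.name.size(); }

// Number of copy-constructor calls caused by one by-value call.
int copies_for_by_value_call() {
    Tracked t("abc");
    const int before = Tracked::copies;
    const std::size_t n = by_value(t);
    assert(n == 3);
    (void)n;  // keeps builds with NDEBUG free of unused-variable warnings
    return Tracked::copies - before;
}

// Number of copy-constructor calls caused by one by-reference call.
int copies_for_by_reference_call() {
    Tracked t("abc");
    const int before = Tracked::copies;
    const std::size_t n = by_reference(t);
    assert(n == 3);
    (void)n;
    return Tracked::copies - before;
}
```

The by-value call copies the string's heap buffer through the copy constructor, which is the work a raw memcpy would skip and thereby leave two objects owning the same buffer.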

Note that, e.g., g++ inlines the call to memcpy and optimizes it away. As the intent of a struct copy and of a memcpy call is exactly the same (copy X bytes from location Y to Z), I do not think that the generated assembly code will differ.

Anyway, to be sure, find it out by analyzing the assembly of your code.


Edit: I just read the end of the question about the function parameters. Please note that function parameters are usually (especially on x64) passed in registers, which is much faster than memcpy.

I've checked the assembly code, and the two calls do differ. The exact code depends on the calling convention your compiler uses. For me, the struct is not passed in registers; it is passed on the stack and an actual copy is made. The three ints are passed in %ecx, %edx and %r8d. I tried this with GCC on Windows, which seems to use the Windows x64 calling convention.

For more information on how the parameters are passed look at the specifications of your calling convention. All the details and corner cases are worked out. E.g. for x64 GCC look at System V AMD64 ABI Chapter 3.2.3 Parameter passing. For Visual Studio look here.
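
As a rough illustration of why the result depends on the convention, here is a sketch of the size rules as I understand them from the two specifications. The helper names are hypothetical, both rules are heavily simplified (the real classification also looks at member types and at whether the type is trivially copyable), and the sketch assumes a 4-byte int.

```cpp
#include <cstddef>

struct X { int i, j, k; };  // 12 bytes when int is 4 bytes

// Simplified Windows x64 rule: only aggregates of exactly 1, 2, 4 or 8 bytes
// are passed like an integer in a register; any other size is passed as a
// pointer to a copy made by the caller.
constexpr bool win64_size_fits_register(std::size_t n) {
    return n == 1 || n == 2 || n == 4 || n == 8;
}

// Simplified System V AMD64 rule: aggregates of up to 16 bytes are candidates
// for register passing; larger ones are passed in memory.
constexpr bool sysv_size_may_use_registers(std::size_t n) {
    return n <= 16;
}

// Under these simplified rules X gets a caller-made copy on Windows x64,
// while it remains a candidate for register passing under System V.
static_assert(!win64_size_fits_register(sizeof(X)) || sizeof(int) != 4,
              "assumes a 4-byte int");
```

This matches the observation above for GCC on Windows; the assembly output of your own compiler remains the authoritative check.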

ankostis
Csq
  • Very interesting. This means that the struct version should really be slower in the function parameter case. Why does the compiler miss this easy chance for optimization? – gexicide Feb 15 '13 at 13:53
  • The compiler cannot really optimize here. It is written in stone (in the calling convention specifications) how to call a function, so the compiler just adheres to the specification. This allows two functions in different translation units to be linked together. (I'm talking about non-inline functions of course.) – Csq Feb 15 '13 at 13:57
  • Is it also written in stone that ints have to be passed in registers? What if it doesn't have that many registers? – gexicide Feb 15 '13 at 17:44
  • @gexicide That is a big stone with many rules :) Google "System V AMD64 ABI" and look for chapter 3.2.3 Parameter passing. I think the official link is down, but there are many mirrors. – Csq Feb 15 '13 at 22:42
0

See another answer by Alok Save for C++. In C, it can be memcpy (or equivalent) or an inlined version of it (down to a single mov instruction for structures of a suitable size).

Community
Anton Kovalenko