11

For example, this puzzles me:

struct A {
//  some fields...
    char buf[SIZE];
};

A a;
a = a;

Judging by A's field buf, the default assignment operation probably calls something like memcpy to assign object X to object Y. So what happens when an object is assigned to itself and no explicit assignment operator is defined, as in a = a; above?

memcpy manual page:

DESCRIPTION

The  memcpy() function copies n bytes from memory area src to memory area dest.  The memory areas must not overlap.  Use memmove(3) if the memory areas do overlap.

If memcpy is used, undefined behavior may occur.

So, what is the behavior of the default assignment operation for a C++ object?

Hasturkun
superK
  • Why would memcpy cause undefined behaviour here? Two distinct array objects never overlap (well, there is *one* exception, but it isn't relevant here). – R. Martinho Fernandes Aug 06 '13 at 12:10
  • @R.MartinhoFernandes Because we call `a=a` here, and if the default assignment operation behaves like memcpy, then we violate `The memory areas must not overlap` – superK Aug 06 '13 at 12:12
  • 1
    The problem with overlap is if you are copying from a region that gets overwritten by another part of the region - that is, you are inserting a character into a string, and do `memcpy(&a[index+1], &a[index], len-index);`. But `memcpy(a, a, sizeof(a))` would be fine. – Mats Petersson Aug 06 '13 at 12:13
  • 1
    @R.MartinhoFernandes Would you be so kind and enlighten me what that exception is? – nikolas Aug 06 '13 at 12:13
  • @KaiWen Oh. How could I miss that. – R. Martinho Fernandes Aug 06 '13 at 12:13
  • 3
    @nijansen It's string literals. An example of a pair of literals that could be stored overlapping is `"bar"` and `"foobar"`. (It isn't a problem for this scenario because those arrays are `const`.) – R. Martinho Fernandes Aug 06 '13 at 12:17
  • 3
    @MatsPetersson `memcpy(a, a, sizeof(a))` is not fine: src and dst (wholly) overlap. – nos Aug 06 '13 at 12:17
  • 3
    Ok, let me rephrase that: It may be undefined, but the danger with overlapping regions isn't when the overlap is COMPLETE, that is, you are copying things from location X to location X, but when there is an offset between the source and destination, where the source, source+length and destination, destination+length overlap, because the writes of the destination will at some point overwrite the source. If you overwrite directly, it is much less likely to cause a problem. Of course, it's also a complete waste of time to copy 200 bytes over itself. – Mats Petersson Aug 06 '13 at 12:25
  • 1
    @KaiWen But since assignment isn't `memcpy`, constraints on `memcpy` are irrelevant. – James Kanze Aug 06 '13 at 12:26
  • @JamesKanze: Well, it is and it isn't. The code generated (by g++ at least) is exactly identical for the struct described above [minus the ... part that doesn't compile] in the case of `memcpy(&a, &b, sizeof(a));` and `a = b;` - both turn into a `rep movsq` and relevant setup to set size, source and destination. – Mats Petersson Aug 06 '13 at 12:29
  • Actually, since the compiler manufacturer is also the manufacturer of memcpy, the behavior of `memcpy(a,a,sizeof(a))` is not so undefined for him (but only for him, as he is in control of the implementation). As long as he can ensure that calling it satisfies the required behavior for the default assignment operator, he can emit this code. Still, if you are using his library and you are not the manufacturer, you should not write this code, since from your point of view it is undefined and the behavior may change at any time without notice. – Tobias Langner Aug 06 '13 at 12:35
  • 1
    @MatsPetersson That different source code may result in the same machine instructions isn't too surprising. On my system, the assignment actually does call `memcpy`. But that's all rather irrelevant: assignment isn't `memcpy`, ever, and if the compiler calls `memcpy`, it's an optimization under the as-if rule, and because it knows what the actual undefined behavior will be in this case. (In other words, R. Martinho Fernandes' answer got everything right, and said everything which could possibly be relevant.) – James Kanze Aug 06 '13 at 12:37
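
The offset-overlap case described in the comments above (inserting a character into a string) can be sketched with memmove, which is the function the man page points to for overlapping areas. This is only an illustration: `insert_char` is a hypothetical helper, not something from the question.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: opens a one-character gap at `index` by shifting the
// tail of the string one position to the right, then writes `c` into the gap.
std::string insert_char(std::string s, std::size_t index, char c) {
    assert(index <= s.size());  // inserting at the end is allowed
    s.push_back('\0');          // grow by one so the shifted tail has room
    // Source and destination overlap with an offset of one byte, which is
    // exactly the situation memcpy forbids and memmove permits.
    std::memmove(&s[index + 1], &s[index], s.size() - 1 - index);
    s[index] = c;
    return s;
}
```

Calling `insert_char("abcd", 1, 'X')` yields `"aXbcd"`.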

4 Answers

14

The assignment operator is not defined in terms of memcpy (§12.8/28).

The implicitly-defined copy/move assignment operator for a non-union class X performs memberwise copy/move assignment of its subobjects. The direct base classes of X are assigned first, in the order of their declaration in the base-specifier-list, and then the immediate non-static data members of X are assigned, in the order in which they were declared in the class definition. Let x be either the parameter of the function or, for the move operator, an xvalue referring to the parameter. Each subobject is assigned in the manner appropriate to its type:

[...]

— if the subobject is an array, each element is assigned, in the manner appropriate to the element type;

[...]

As you see, each char element will be assigned individually. That is always safe.

However, under the as-if rule, a compiler may replace this with a memmove because it has identical behaviour for a char array. It could also replace it with a memcpy if it can guarantee that memcpy will result in this same behaviour, even if theoretically such a thing is undefined. Compilers can rely on theoretically undefined behaviour; one of the reasons undefined behaviour exists is so that compilers can define it to whatever is more appropriate for their operation.

Actually, in this case a compiler could take the as-if rule even further and not do anything with the array at all, since that also results in the same behaviour.
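
A minimal sketch of the memberwise rule quoted above, under some assumptions: `Counted` is a hypothetical element type that counts its own copy assignments (with plain char the compiler is free to optimise the copies away), and `SIZE` is set to 4 because the question leaves it unspecified.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type that records how often its copy assignment runs.
struct Counted {
    static int assignments;
    char value = 0;
    Counted& operator=(const Counted& other) {
        ++assignments;
        value = other.value;
        return *this;
    }
};
int Counted::assignments = 0;

constexpr std::size_t SIZE = 4;  // assumed value; the question does not give one

struct A {
    Counted buf[SIZE];
    // No user-declared operator=, so the implicitly-defined one is used.
};

// Self-assigns an A and returns how many element assignments that caused.
int count_element_assignments_on_self_assign() {
    A a;
    for (std::size_t i = 0; i < SIZE; ++i) a.buf[i].value = char('a' + i);

    Counted::assignments = 0;
    A& alias = a;
    a = alias;  // self-assignment, written through a reference

    // Each element was assigned to itself, so the contents are unchanged.
    for (std::size_t i = 0; i < SIZE; ++i) assert(a.buf[i].value == char('a' + i));
    return Counted::assignments;
}
```

The function returns `SIZE`, one assignment per array element, and the contents survive the self-assignment.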

R. Martinho Fernandes
4

Default assignment (and copy) does not memcpy the whole class, which would break things. Each member is copied using its own copy constructor or assignment operator (depending on the operation). This is applied recursively to members and their members. When a basic data type is reached, it is simply copied directly, similar to memcpy. So an array of basic data types may be copied in a way similar to memcpy, but the whole class is not. If you add a std::string to your class, its operator= is called alongside the copy of the array. If you use an array of std::string, each string in the array has its operator= called. They are not memcpy'd.
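
As a rough illustration of the point about std::string members, here is a sketch with a hypothetical `Record` struct (the name and fields are made up for the example). After default assignment the two objects own independent strings, which a raw byte copy of the whole object would not give you.

```cpp
#include <cassert>
#include <string>

// Hypothetical struct mixing a class-type member with an array of a basic type.
struct Record {
    std::string name;  // assigned through std::string::operator=
    int scores[3];     // each int element is copied
};

// Returns true if the copy made by default assignment is independent
// of the original.
bool copies_are_independent() {
    Record a{"a string long enough to need heap storage", {1, 2, 3}};
    Record b{};
    b = a;  // implicitly-defined copy assignment, member by member

    b.name[0] = 'X';   // modify only the copy
    b.scores[0] = 99;

    return a.name[0] == 'a' && a.scores[0] == 1   // original untouched
        && b.name[0] == 'X' && b.scores[1] == 2   // copy holds its own data
        && a.name.data() != b.name.data();        // separate character buffers
}
```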

Neil Kirk
  • 2
    That answer is partially correct, but does not cover the specific case the OP asks about. – Arne Mertz Aug 06 '13 at 12:16
  • As memcpy is not used, the violation of memcpy rules is not relevant any more. But I should have discussed self-assignment. – Neil Kirk Aug 06 '13 at 12:17
  • Actually I am not sure how default operator handles self-assignment. My hunch is that it doesn't explicitly, and depends upon members that must prevent such a thing, to do their own check in their custom assignment operator. – Neil Kirk Aug 06 '13 at 12:20
  • 1
    @NeilKirk Why should the compiler do anything special with regards to self assignment? (For that matter, why should a user defined assignment operator worry about self assignment?) – James Kanze Aug 06 '13 at 12:28
  • @JamesKanze You have a high rep, so I wonder if this is a serious question. You can search the web/SO for self-assignment issues and when it is necessary to "worry". std::vector in my compiler checks for self-assignment in its assignment operator, for example. – Neil Kirk Aug 06 '13 at 13:31
  • @NeilKirk it's not needed except as an optimisation (and a dubious one at that, since that's not really a common case) – R. Martinho Fernandes Aug 06 '13 at 14:22
  • 1
    @NeilKirk You can find a lot of wrong information if you search the web. In general, if you need a check for self assignment, your assignment operator is broken. (And a correctly written implementation of `std::vector<>::operator=` doesn't need to check for self assignment. Although it might for reasons of optimization; vector assignment is *very* expensive.) – James Kanze Aug 06 '13 at 14:22
  • @R.MartinhoFernandes In the case of `std::vector`, it could be justified, because the test is cheap, and the actual assignment is very, very expensive. – James Kanze Aug 06 '13 at 14:22
  • @JamesKanze It's hard to decide what information is "wrong" or not, especially if you don't provide any reason for your comments. Could you explain instead of just saying I'm wrong? – Neil Kirk Aug 06 '13 at 14:55
  • @NeilKirk Comments don't provide enough space to really explain, but why would you need a test for self assignment except for optimization? The sort-of standard idiom is the swap idiom: `MyClass& MyClass::operator=( MyClass const& other ) { MyClass tmp( other ); swap( tmp ); return *this; }` More generally, although you don't have to construct a complete object, you do have to do any operations which may fail _before_ modifying the target object. And once you've done everything which can fail, you can destruct anything which needs it, and overwrite, without worrying. – James Kanze Aug 06 '13 at 15:07
  • @NeilKirk You might also want to search for exception safety assignment operator. Although when I tried it, the first hits were just saying use the swap idiom, without any real explanation of why, or what the alternatives are. – James Kanze Aug 06 '13 at 15:33
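
The swap idiom mentioned in the comments above might look roughly like this. `Buffer` is a hypothetical class invented for the sketch; the point is that everything that can fail (the copy) happens before the target is modified, so no self-assignment test is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap-allocated char buffer.
class Buffer {
public:
    explicit Buffer(std::size_t n, char fill = 0)
        : size_(n), data_(new char[n]) {
        std::fill(data_, data_ + n, fill);
    }
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new char[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    ~Buffer() { delete[] data_; }

    Buffer& operator=(const Buffer& other) {
        Buffer tmp(other);  // may throw; *this is still untouched here
        swap(tmp);          // cannot throw
        return *this;       // tmp's destructor releases the old buffer
    }

    void swap(Buffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size() const { return size_; }
    char at(std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    std::size_t size_;
    char* data_;
};

// Self-assigns a Buffer and reports whether its contents survived.
bool self_assignment_keeps_contents() {
    Buffer b(3, 'x');
    Buffer& alias = b;
    b = alias;  // no explicit self-assignment check in operator=
    return b.size() == 3 && b.at(0) == 'x' && b.at(2) == 'x';
}
```

The cost of this design is one extra allocation and copy even for self-assignment, which is why a check may still be added purely as an optimisation.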
0

Some limited experimentation tells me that g++ completely removes any attempt to copy a = a; [assuming it is obvious - I'm sure with sufficient messing about with pointers, it will eventually be possible to copy the same object over itself, and get undefined behaviour].

Mats Petersson
  • 2
    It won't get UB, it is defined in terms of element-wise copy (§12.8,28), and element-wise self-assignment is well-defined. – Arne Mertz Aug 06 '13 at 12:24
-1

If memcpy is used, undefined behavior may occur.

How a given class is copied is an implementation detail. Both the memcpy() function and the copy constructor will be converted into some machine code. However, your objects in memory should not overlap, because default assignment does not guarantee a proper result if they do.

So, what is the behavior of the default assignment operation for a C++ object?

As other responses say, the default behaviour is to call assignment on all class/struct members recursively. Technically, however, as in your case, it may just copy the whole block of memory, especially if your structure is POD (plain old data).
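
One way to check whether a type is a candidate for that kind of block copy is the standard `<type_traits>` header. The sketch below uses two hypothetical structs, `PlainData` and `WithString`, which are not from the question; the traits describe what the language permits, not what a particular compiler actually emits.

```cpp
#include <string>
#include <type_traits>

// Hypothetical struct with only basic types, similar to A in the question.
struct PlainData {
    int id;
    char buf[16];
};

// Hypothetical struct with a class-type member that manages its own memory.
struct WithString {
    int id;
    std::string name;
};

// Copy assignment of PlainData is trivial, so copying its bytes is a
// legitimate implementation strategy for the compiler.
static_assert(std::is_trivially_copy_assignable<PlainData>::value,
              "PlainData can be assigned by copying its bytes");

// WithString must go through std::string::operator=.
static_assert(!std::is_trivially_copy_assignable<WithString>::value,
              "WithString needs memberwise assignment");
```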

Karadur
  • This is wrong. The implicit copy constructors work fine with self-assignment. (And that's irrelevant, and the example is about assignment not initialisation) – R. Martinho Fernandes Aug 06 '13 at 12:29
  • What is wrong exactly? I was referring to overlapping objects, not self-assignment. With overlapping objects, the behaviour is undefined. – Karadur Aug 06 '13 at 12:33
  • Of the copy constructor? No, it isn't. (How do you call the copy constructor with overlapping objects?) – R. Martinho Fernandes Aug 06 '13 at 12:37
  • Right, I incorrectly mentioned the copy constructor; that should have been 'default assignment'. Will correct that, thanks. Although the copy constructor may also behave incorrectly when, say, you use placement new consecutively on overlapping blocks of memory. – Karadur Aug 06 '13 at 13:09