2

I had a hard time debugging a crash on production. Just wanted to confirm with folks here about the semantics. We have a class like ...

class Test {
public:
  Test()
  {
    // members initialized ...
    m_str = m_str;
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

Someone changed the initialization to use ctor initialization-lists which is reasonably correct within our code semantics. The order of initialization and their initial value is correct among other things. So the class looks like ...

class Test {
public:
  Test() 
    : /*other inits ,,, */ m_str(m_str)
  {
  }
  ~Test() {}
private:
  // other members ...
  std::string m_str;
};

But the code suddenly started crashing! I isolated the long list of inits to this piece of code m_str(m_str). I confirmed this via link text.

Does it have to crash? What does the standard say about this? (Is it undefined behavior?)

Abhay
  • 7,092
  • 3
  • 36
  • 50
  • 4
    You are initializing m_str with itself? Why are you doing this? – Starkey Oct 08 '10 at 15:43
  • 1
    Undefined behaviour means that it could format your c disc and install another operating system. Nobody knows what particular "undefined behaviour" means on particular environment. – Sergey Teplyakov Oct 08 '10 at 15:44
  • someone else's code, i will change it, but need confirmation from the standard that it is indeed undefined behavior, becoz it does not crash for POD types like `int i; i(i)` in the initializer list – Abhay Oct 08 '10 at 15:46
  • Again, why would you do this? You need a param passed in to initialize that - or some other source. That is not good. – Tim Oct 08 '10 at 15:48
  • @Tim: Agreed. But the code's already there from ages... a change to initializer-list has brought out the bug. But to convince ppl higher up, it would be great if someone can quote the standard. – Abhay Oct 08 '10 at 15:50
  • 1
    @Abhay: The Standard describes what you can and can't do with uninitialised objects in 3.8/6. Specifically, "if an lvalue-to-rvalue conversion is applied to such an lvalue, the program has undefined behaviour". Such a conversion occurs when it is used as the initialiser argument. – Mike Seymour Oct 08 '10 at 16:01
  • 2
    @Abhay: That the people higher up need convincing that `: m_str(m_str) {}` is bad code is pretty frightening. – John Dibling Oct 08 '10 at 20:21
  • @John: Yes it is, most of them who control bouncing tasks on prod at short notice have not been programmers for long... BTW, this is the case with a very large financial firm and is a very common bureaucracy in large companies ... – Abhay Oct 08 '10 at 21:01

5 Answers5

15

The first constructor is equivalent to

  Test()
  : m_str()
  {
    // members initialized ...
    m_str = m_str;
  }

that is, by the time you get to the assignment within the constructor, m_str has already been implicitly initialized to an empty string. So the assignment to self, although completely meaningless and superfluous, causes no problems (since std::string::operator=(), as any well written assignment operator should, checks for self assignment and does nothing in this case).

However, in the second constructor, you are trying to initialize m_str with itself in the initializer list - at which point it is not yet initialized. So the result is undefined behaviour.

Update: For primitive types, this is still undefined behaviour (resulting in a field with garbage value), but it does not crash (usually - see the comments below for exceptions) because primitive types by definition have no constructors, destructors and contain no pointers to other objects.

Same is true for any type that does not contain pointer members with ownership semantics. std::string is hereby demonstrated to be not one of these :-)

Péter Török
  • 114,404
  • 31
  • 268
  • 329
  • 2
    Depends what you mean by "primitive types". Even assignment of indeterminate value to `int` can in principle cause a crash, because the standard allows padding and trap representation for `int`. You have to get down to the `char` types to have portable code with well-defined initialization-from-self. – Cheers and hth. - Alf Oct 08 '10 at 16:20
  • @Alf even for `char` types, such is undefined behavior. Only if you dereference a `char` types pointer it's defined for any and all possible patterns (because it then reads from memory and because `char` types have no traps). See the last note at http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#240 , *"The CWG did not consider that access to an unsigned char might still trap if it is allocated in a register and needs to reevaluate the proposed resolution in that light."*. – Johannes Schaub - litb Oct 08 '10 at 19:58
  • @Johannes: thanks, I didn't think of that! (But neither did the committee...) However, I'm sure that it's about the last gasp of "ENIAC-compatibility", and that there's no in-practice issue -- at least, I sincerely hope so! :-) – Cheers and hth. - Alf Oct 08 '10 at 20:28
  • 1
    @Alf there is a good article about the ia64's NaT the committee talks about: http://blogs.msdn.com/b/oldnewthing/archive/2004/01/19/60162.aspx – Johannes Schaub - litb Oct 08 '10 at 20:58
  • @Alf, @Johannes, thanks for the clarification, I learned something new again :-) – Péter Török Oct 10 '10 at 16:14
2

m_str is constructed in the initialization list. Therefore, at the time you are assigning it to itself, it is not fully constructed. Hence, undefined behavior.

(What is that self-assignment supposed to do anyway?)

Nate
  • 18,752
  • 8
  • 48
  • 54
  • I donno, someone else's code. I need confirmation if it's indeed undefined behavior. Becoz it definitely does not crash for POD types. – Abhay Oct 08 '10 at 15:44
  • @Tim: Agreed. But its live code, being hit a million times per day. I need approval from ppl higher up who say 'its been working for ages ...' So i need a convincing reference. – Abhay Oct 08 '10 at 15:53
  • I don’t have page and chapter, but you can assume there’s a pointer somewhere inside the uninitialized `string` that is pointing to a random location in memory. And you’re copying that! Bad bad bad. – Nate Oct 08 '10 at 16:01
2

The original "initialization" by assignment is completely superfluous.

It didn't do any harm, other than wasting processor cycles, because at the time of the assignment the m_str member had already been initialized, by default.

In the second code snippet the default initialization is overridden to use the as-yet-uninitialized member to initialize itself. That's Undefined Behavior. And it's completely unnecessary: just remove that (and don't re-introduce the original time-waster, just, remove).

By turning up the warning level of your compiler you may be able to get warnings about this and similar trivially ungood code.

Unfortunately the problem you're having is not this technical one, it's much more fundamental. It's like a worker in a car factory poses a question about the square wheels they're putting on the new car brand. Then the problem isn't that the square wheels don't work, it's that a whole lot of engineers and managers have been involved in the decision to use the fancy looking square wheels and none of them objected -- some of them undoubtedly didn't understand that square wheels don't work, but most of them, I suspect, were simply afraid to say what that they were 100% sure of. So it's most probably a management problem. I'm sorry, but I don't know a fix for that...

Cheers and hth. - Alf
  • 142,714
  • 15
  • 209
  • 331
  • True to some extent in my case as well. I thought I can convince them using the C++ Standard as reference. So awaiting the quote.. I do not have a copy of that / Internet for n3092.pdf either – Abhay Oct 08 '10 at 15:56
  • You can download the latest draft of the standard from the C++ committee pages, at [http://www.open-std.org/jtc1/sc22/wg21/]. – Cheers and hth. - Alf Oct 08 '10 at 16:22
1

Undefined behavior doesn't have to lead to a crash -- it can do just about anything, from continuing to work as if there was no problem at all, to crashing immediately, to doing something really strange that causes seemingly unrelated problems later. The canonical claim is that it makes "demons fly out of your nose" (aka, "causes nasal demons"). At one time the inventor of the phase had a (pretty cool) web site telling about the nuclear war that started from somebody causing undefined behavior in the "DeathStation 9000".

Edit: The exact wording from the standard is (§:1.3.12):

1.3.12 undefined behavior [defns.undefined]

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
0

This is the same difference as between

std::string str;
str = str;

and

std::string str(str);

The former works (although it's nonsense), the latter doesn't, since it tries to copy-construct an object from a not-yet-constructed object.

Of course, the way to go would be

Test() : m_str() {}
sbi
  • 219,715
  • 46
  • 258
  • 445