What comes first - stack unwinding or copying of return values

Question

Is the mutex used in method GetValues() released before or after copy constructing the dummy instance?

class Protect
{};

class Test
{
public:
    Protect GetValues() const;

private:
    Protect m_protVal;
    Mutex m_mutex;
};

Protect Test::GetValues() const
{
    CLockGuard lock(m_mutex);

    return m_protVal;
}

int main(int argc, char** argv)
{
    Test myTestInstance;

    Protect dummy = myTestInstance.GetValues();
}

Let's assume CLockGuard and Mutex are standard classes provided with boost lib.

http://stackoverflow.com/questions/3856729/how-to-use-lock-guard-when-returning-protected-data — NPE, Jan 19 '12 at 08:31
You could run your test program step by step in a debugger (like `gdb` on Linux) to find out. — Basile Starynkevitch, Jan 19 '12 at 08:31
@BasileStarynkevitch Which won't tell him anything about what is guaranteed by the language. — James Kanze, Jan 19 '12 at 08:36
@JamesKanze: It would be very surprising that any recent mainstream compiler gets this wrong. — PlasmaHH, Jan 19 '12 at 09:30
@PlasmaHH It's not a question of getting it wrong or not. It's a question of what the standard guarantees. In this case, the standard does guarantee a very specific order. But the fact that one particular compiler implements this order doesn't tell you whether the standard guarantees it, or whether it is implementation defined, or even unspecified. — James Kanze, Jan 19 '12 at 09:48

score 12 · Accepted Answer · edited Jan 19 '12 at 10:40

12

Yes:-). Formally, there are two “copies” when returning a value: one to some special place used to actually return the value, and the second after the return, to wherever the value must be finally placed. Either or both can be optimized out, however. The destruction of local variables occurs after the first, but before the second. (NRVO and RVO may lead to the first being optimized out, but they don't affect your code, since you're not returning a local variable.)

edited Jan 19 '12 at 10:40

Matthieu M.

287,565
48
449
722

answered Jan 19 '12 at 08:36

James Kanze

150,581
18
184
329

If I got your answer right the sequence goes like that: `copy m_protVal to temp`, `unlock`, `copy temp to dummy`. Where `temp` is located in your "special place". Correct? – nabulke Jan 19 '12 at 08:46
Yes. With the added proviso that this `temp` may be "merged" with a local variable or temporary, with `dummy`, or with both, and the corresponding copy eliminated. Since you're not returning a local variable or a temporary, this first optimization doesn't apply to your code. (It could be an issue if you construct a local variable which is returned before acquiring the lock.) – James Kanze Jan 19 '12 at 09:18
I suppose this means the copies can't be elided in cases where the destructor might throw, or might modify the final destination via an alias. In the case of the exception, the implementation must not modify the final destination before calling the destructor. In the case of the modification via alias, the implementation must not call the destructor after writing the return value, for fear it gets overwritten. Or am I wrong, and the copy elision rules permit those code transformations so that the copy to the final destination doesn't "logically" occur after destructors when NRVO is used? – Steve Jessop Jan 19 '12 at 10:34
http://stackoverflow.com/questions/8924034/can-copy-elision-happen-across-synchronize-with-statements seems to address the same concern as my comment above. – Steve Jessop Jan 19 '12 at 10:40
@SteveJessop The copies may be elided in all cases. This is a special rule, which allows the compiler to ignore changes in semantics (observable behavior) when optimizing. With regards to aliasing: and access to an alias to the final object is undefined behavior (because it is assumed not yet constructed). I suspect that this means that there is no threading risk, either, but I've not had the time to analyze all of the consequences (which, when threading is involved, are often far from trivial); my comment is just that there may be issues, and that such analysis is necessary. – James Kanze Jan 19 '12 at 11:10
@SteveJessop My worry concerns a slightly different case: `TserType nrvoValue; unique_lock lock( globalMutex ); return nrvoValue;`. With NRVO, the final target is being constructed before the lock has been acquired. I suspect that this is safe, since other threads shouldn't be accessing it unless there is a higher level lock, but I'd like to do (or see) a more detailed analysis of the issues to be sure. – James Kanze Jan 19 '12 at 11:14
@James: OK. As long as the standard says that when copies are elided, then the construction of the final target happens at the time that the construction of the elided temporary/local would happen. Then everything is defined, and as you say it's the programmer's fault if it's surprising. Btw, your comment above seems to say that it's UB to access the representation of an unconstructed object via, for example, an `unsigned char*` alias. Is that definitely the case? – Steve Jessop Jan 19 '12 at 11:31
@SteveJessop It's undefined behavior to access an object if another thread is modifying it. But you've got a point: if the other thread holds a lock on `globalMutex`, without NRVO, it can safely access the object via `unsigned char*`---with NRVO, it can't. (Why anyone would want to do this is another question---but it is an interesting situation where NRVO could introduce undefined behavior.) – James Kanze Jan 19 '12 at 16:11

score 2 · Answer 2 · answered Jan 19 '12 at 08:45

As of Visual C++, using MSVC 9.0, the following code

int add(int x, int y)
{
    int z;
    z = x + y;
    return z;
}

int _tmain(int argc, _TCHAR* argv[])
{
    add(10, 20);
    return 0;
}

results in assembly

int add(int x, int y)
{
013613A0  push        ebp  //save the frame pointer
013613A1  mov         ebp,esp 
013613A3  sub         esp,0CCh 
013613A9  push        ebx  
013613AA  push        esi  
013613AB  push        edi  
013613AC  lea         edi,[ebp-0CCh] 
013613B2  mov         ecx,33h 
013613B7  mov         eax,0CCCCCCCCh 
013613BC  rep stos    dword ptr es:[edi] 
    int z;
    z = x + y;
013613BE  mov         eax,dword ptr [x] //load x
013613C1  add         eax,dword ptr [y] //add y to x
013613C4  mov         dword ptr [z],eax //store the result to z
    return z;
013613C7  mov         eax,dword ptr [z] //store the return value in eax
}
013613CA  pop         edi  //unwind the stack
013613CB  pop         esi  
013613CC  pop         ebx  
013613CD  mov         esp,ebp 
013613CF  pop         ebp  
013613D0  ret

int _tmain(int argc, _TCHAR* argv[])
{    
013613E0  push        ebp  
013613E1  mov         ebp,esp 
013613E3  sub         esp,0C0h 
013613E9  push        ebx  
013613EA  push        esi  
013613EB  push        edi  
013613EC  lea         edi,[ebp-0C0h] 
013613F2  mov         ecx,30h 
013613F7  mov         eax,0CCCCCCCCh 
013613FC  rep stos    dword ptr es:[edi] 
    add(10, 20);
013613FE  push        14h  
01361400  push        0Ah  
01361402  call        add (136109Bh) 
01361407  add         esp,8 
    return 0;
0136140A  xor         eax,eax //store 0 to eax, the return value holder
}
0136140C  pop         edi  //unwind the stack
0136140D  pop         esi  
0136140E  pop         ebx

This makes me say that return value is stored first and then the stack unwinding happens!

score 2 · Answer 3 · answered Jan 19 '12 at 09:32

Here is a small complete test programm (in addition to James Kanzes good explanation), which will show if the unlock is done before or after stack unwinding:

#include <iostream>

class PseudoLockGuard
{
public:
    enum LockState { IsStillLocked, IsUnlocked};

    PseudoLockGuard(LockState& value) : m_value(value) {};
    ~PseudoLockGuard() { m_value = IsUnlocked; };

private:
    LockState& m_value;
};

PseudoLockGuard::LockState Test()
{
    PseudoLockGuard::LockState indicator = PseudoLockGuard::IsStillLocked;

    PseudoLockGuard lock(indicator);

    return indicator;// Will return IsStillLocked or IsUnlocked?
}

int main(int , char** )
{
   PseudoLockGuard::LockState result = Test();

   std::cout << (result == PseudoLockGuard::IsStillLocked 
                 ? "Return Value before Unlock" 
                 : "Return Value after Unlock"); 

   // Outputs "Return Value before Unlock"

   return 0;
}

score 2 · Answer 4 · answered Jan 19 '12 at 09:34

The standard is not particularly explicit on the matter as far as I can tell, but here's what I've managed to get together:

Automatic storage duration objects are destroyed as per 6.7 when the block they are declared in exits. - 3.7.2

On exit from a scope, destructors (12.4) are called for all automatic storage duration (3.7.2) (named objects and temporaries) that are declared in that scope, in reverse order of their declaration. - 6.6

A return statement with an expression of non-void type can be used only in functions returning a value; the value of the expression is returned to the caller of the function. The expression is implicitly converted to the return type of the function in which it appears. A return statement can involve the construction and copy of a temporary object (12.2). - 6.6.3

Even when the creation of the temporary object is avoided (12.6), all the semantic restrictions must be respected as if the temporary object was created. - 12.2

This seems to generally confirm what James said: on return m_protVal;, a temporary is created, and then destructors of all objects that must be destroyed are called in reverse order of their declaration (in this case, only the destructor of lock is called). However, I am not entirely sure how to interpret the following:

Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. - 12.2

A full-expression is defined as "as expression that is not a subexpression of another expression". I don't know what part of return m_protVal is the full expression: geordi says that this is a

decl'ion-seq → decl'ion → func-def → func-body → compound-stmt → stmt-seq → stmt → jump-stmt

while i by itself is a

decl'ion-seq → decl'ion → func-def → func-body → compound-stmt → stmt-seq → stmt → jump-stmt → … → id-expr → unqual-id → ident

This makes it unclear whether the temporary may be copied and destroyed prior to the rest of the destructors being called: I would say it may not, as return m_protVal; causes the end of the block to be reached, but I can't find anything in the standard that would confirm this.

(On the other hand, I don't see any case where such behaviour would cause any breakage; nobody should have a pointer to the temporary so it being destroyed first is not an issue, and if the temporary had a pointer to a local variable, the problems come in when the temporary is destroyed later -- not that writing code like that is a good idea in any case.)

What comes first - stack unwinding or copying of return values

4 Answers4

Linked

Related