But how is the first assembly code retrieving and storing the struct returned from the subroutine?
First of all, it doesn't return a struct; it returns a pointer to a struct in EAX. The function's return type is struct _A*. You don't show what it's pointing to; perhaps some static buffer in a non-thread-safe function?
It looks like you left out a rep movsd in the first example after setting up esi, edi, and ecx (your esx is obviously a typo). This will memcpy 4*8 = 32 bytes from the pointer returned in EAX to the static storage for A. (Note the mov edi, offset A to get the actual address of A into EDI.)
With a smaller struct, it copies it with a few mov instructions instead of setting up for a rep movsd (which has considerable startup overhead and is a bad choice for a 32-byte copy if SSE is available). In other words, it fully unrolls the copy loop.
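The two copy cases above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the names Big, Small, get_big, and get_small are hypothetical (the question's actual struct and function aren't shown), and whether a compiler emits rep movsd, a few mov instructions, or SSE moves depends on the compiler and its options.

```cpp
#include <cstdint>

// Hypothetical 32-byte struct: eight 4-byte members, matching 4*8 = 32 bytes.
struct Big { std::int32_t v[8]; };
// Hypothetical smaller struct: two 4-byte members.
struct Small { std::int32_t v[2]; };

static_assert(sizeof(Big) == 32, "Big is expected to be 32 bytes");
static_assert(sizeof(Small) == 8, "Small is expected to be 8 bytes");

// Global destinations, like the static storage for A in the question.
Big A;
Small S;

// Assumed shape of the callee: returns a pointer (in EAX on 32-bit x86)
// to a static buffer, so it is not thread-safe.
Big* get_big() {
    static Big buffer = {{1, 2, 3, 4, 5, 6, 7, 8}};
    return &buffer;
}

Small* get_small() {
    static Small buffer = {{42, 43}};
    return &buffer;
}

void copy_big() {
    // A 32-byte struct assignment: a compiler might set up ESI/EDI/ECX
    // and use rep movsd, or unroll it, or use SSE loads/stores.
    A = *get_big();
}

void copy_small() {
    // An 8-byte struct assignment: typically just a couple of mov instructions.
    S = *get_small();
}
```

Compiling something like this for 32-bit x86 and reading the asm output for copy_big and copy_small is one way to see which copy strategy a given compiler picks.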
(In the first version of this answer, I didn't look closely enough at the code, and based on the wording I thought you were actually returning a struct by value when you talked about returning a struct. It seems a shame to delete what I wrote about that related case. Instead of a hidden pointer, you have an explicit pointer to an object that exists in the C++ source, not just in the asm implementation of what the C++ abstract machine does.)
Large struct by-value returns are returned by hidden pointer (the caller passes in a pointer as the first arg, and the function returns it in EAX for the convenience of the caller). This is typical for most calling conventions; see links to calling convention docs in the x86 tag wiki.
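As a rough sketch of what the hidden pointer means, the by-value function below is paired with a hand-written approximation of how a compiler might lower it. The names Big, make_big, and make_big_lowered are hypothetical, and the lowered form is only an illustration of the idea, not what any particular compiler or calling convention is guaranteed to produce.

```cpp
#include <cstdint>

// Hypothetical 32-byte struct, too large to return in registers.
struct Big { std::int32_t v[8]; };

// What the C++ source says: return a Big by value.
Big make_big() {
    Big r{};
    for (int i = 0; i < 8; ++i) {
        r.v[i] = i * 10;
    }
    return r;
}

// Hand-written approximation of the lowered form: the caller passes a
// pointer to the return-value buffer as an extra first argument, and the
// function returns that same pointer (in EAX) for the caller's convenience.
Big* make_big_lowered(Big* out) {
    for (int i = 0; i < 8; ++i) {
        out->v[i] = i * 10;
    }
    return out;
}
```

In the by-value version the caller never sees the pointer in the source; it only shows up in the asm.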
The value of A itself is 32 bytes, and doesn't fit in a register. Often in asm you need a pointer to an object. push OFFSET A is probably part of calling a function that takes A by reference (probably explicitly in the C++ source; I don't think any of the standard x86 calling conventions implement pass-by-value as pass-by-const-reference, only by non-const reference e.g. for Windows x64, and maybe others).
Your compiler probably couldn't optimize A = foo(); (returning a large struct by value) by passing the address of A directly as the output pointer.
A is global, and the callee is allowed to assume that its return-value buffer doesn't alias the global A. The caller can't assume that the function doesn't access A directly, but according to the C++ abstract machine the value of A doesn't change until after the function returns.
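A small sketch of why that aliasing matters, using hypothetical names (Big, foo, foo_lowered). Here foo_lowered is a hand-written stand-in for the hidden-pointer version of foo, not actual compiler output. If the caller passed the address of A as the output pointer, the callee's early writes to its return object would be visible through its later read of A, which the abstract machine doesn't allow.

```cpp
#include <cstdint>

struct Big { std::int32_t v[8]; };

// Global object, as in the question.
Big A = {{1, 1, 1, 1, 1, 1, 1, 1}};

// By-value version: writes its return object, then reads the global A.
// Per the abstract machine, A is unchanged until after foo() returns,
// so the read of A.v[0] must see the old value.
Big foo() {
    Big r{};
    r.v[0] = 100;
    r.v[7] = A.v[0];
    return r;
}

// Hand-written stand-in for the hidden-pointer lowering of foo().
// It gives the same result as foo() only if out does not point at A.
Big* foo_lowered(Big* out) {
    *out = Big{};
    out->v[0] = 100;
    out->v[7] = A.v[0];  // sees 100 instead of the old value if out == &A
    return out;
}
```

With A = foo(); the last element ends up holding the old A.v[0], while calling foo_lowered(&A) directly leaves 100 there, so the caller has to use a temporary buffer and copy afterwards unless it can prove the callee never touches A.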