
JavaScript doesn't compile directly to assembly, but it illustrates the general question: what would a high-level function call look like in assembly when the function's inputs are large? Take this case, for example:

myfunc(1, 2, 3)

The arguments there are small integers, so each could be placed in its own register. But say you have:

var a = 'some markdown readme...'
myfunc('my really long string', a, 'etc.')

I'm wondering how that would be done in assembly (at a high level).

It doesn't seem that the call stack would be used to store these values, because they are large. Maybe it stores the memory address and an offset (but what if the value is dynamic?). I'm interested to know how this works.

Lance
  • Strings require a variable amount of storage, so they are always stored in a heap. The processor uses the address of the allocated string buffer. A pointer is, like int, one of the primary data types directly supported by a processor. – Hans Passant May 02 '18 at 19:00
  • Nice, yeah I understand that part but I'm wondering how it works at a slightly deeper level. – Lance May 02 '18 at 19:13
  • strings are a pointer no matter how long, that's how it works... – old_timer May 02 '18 at 19:23
  • @HansPassant Strings can be stored in stack and often are (for example a local char[] variable in C, or small string optimization in C++ std::string implementations). Also they can be in static data. – hyde May 02 '18 at 19:59
  • No, not in a jitted or interpreted language. – Hans Passant May 02 '18 at 20:00
  • @old_timer Not always a pointer, for example in C a short char[] string in a small enough struct may be passed in registers, no pointer needed. – hyde May 02 '18 at 20:04
  • @HansPassant Why would a JIT compiler put a short local string used in one place on the heap? Maybe all current ones do, but I don't see why they have to. – hyde May 02 '18 at 20:07
  • Simply because there is no programmer around that can guarantee the function is only ever called with a small enough string. – Hans Passant May 02 '18 at 20:12
  • You are welcome to come up with as many exceptions as you like to my specific language, I apologize for trying to be generic. See the answer below to determine just how trivial it is to answer this question on your own for your favorite target and string length assuming you have a toolchain in which you can either examine the assembly output of the compiler or can disassemble later. In general a string variable is an address and in general that address is passed on to the callee, some optimizations might shorten that so long as the callee also knows what is going on. – old_timer May 02 '18 at 20:13
  • inside a struct an address is an address, and your comment while probably true simply reinforces you should never ever use structs across compile domains, which sadly has become a fad, a dangerous fad... – old_timer May 02 '18 at 20:14

2 Answers


Arrays (including strings) are passed by reference in most high-level languages. int foo(char*) just gets a pointer value as an arg, and a pointer is typically one machine word (i.e. it fits in a register). In good modern calling conventions, the first few integer/pointer args are typically passed in registers.
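
As a minimal sketch of that point (the function names count_chars and passed_size are hypothetical, not from the question), what gets handed to the callee has the same size no matter how long the string is, because only the address is passed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callee: it receives only the address of the first character. */
size_t count_chars(const char *s)
{
    assert(s != NULL);
    return strlen(s);   /* the bytes are read through the pointer, not copied */
}

/* Size of what is actually passed for a string argument: one pointer. */
size_t passed_size(const char *s)
{
    return sizeof s;    /* sizeof(const char *), independent of string length */
}
```

Calling passed_size with a two-character literal or with a whole README gives the same result, sizeof(char *), which is why the argument fits in a single register.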

In C/C++, you can't pass a bare array by value. Given int arr[16]; func(arr);, the function func only gets a pointer (to the first element).
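
A small sketch of that decay, using made-up helper names: inside the callee the parameter is just a pointer, so sizeof reports the pointer size, and a store through it lands in the caller's array.

```c
#include <assert.h>
#include <stddef.h>

/* The parameter is a pointer, even if it were spelled `int arr[16]`. */
size_t size_in_callee(int *arr)
{
    assert(arr != NULL);
    arr[0] = 42;          /* writes through to the caller's array */
    return sizeof arr;    /* sizeof(int *), not 16 * sizeof(int) */
}

/* In the scope where the array is declared, sizeof sees the whole array. */
size_t size_in_caller(void)
{
    int arr[16];
    return sizeof arr;    /* 16 * sizeof(int) */
}
```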

In some other higher-level languages, arrays might be more like C++ std::vector, so the callee might be able to grow/shrink the array and find out its length without a separate arg. That would typically mean there's a "control block": a small header that holds the data pointer and the length, and that is what gets passed around.
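
To make the "control block" idea concrete, here is a hedged sketch in C. The byte_vec type and its field names are invented for illustration and are not how any particular language runtime lays things out. The caller passes the address of the small header; the possibly huge payload stays on the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical control block; the field names are illustrative only. */
typedef struct {
    char  *data;   /* heap buffer holding the bytes */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
} byte_vec;

/* Append n bytes, growing the heap buffer if needed. Returns 0 on success. */
int byte_vec_append(byte_vec *v, const char *src, size_t n)
{
    assert(v != NULL && src != NULL);
    if (v->len + n > v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        while (new_cap < v->len + n)
            new_cap *= 2;
        char *p = realloc(v->data, new_cap);
        if (p == NULL)
            return -1;          /* the old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    memcpy(v->data + v->len, src, n);
    v->len += n;
    return 0;
}
```

Starting from a zero-initialized byte_vec, the callee can grow the buffer and the caller sees the new length through the same header, without a separate length argument.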

In C and C++ you can pass structs by value, and then it's up to the calling-convention rules to specify how to pass them.

x86-64 System V, for example, passes structs of 16 bytes or less packed into up to 2 integer registers. Larger structs are copied onto the stack, regardless of how large an array member they contain (see "What kind of C11 data type is an array according to the AMD64 ABI"). (So don't pass giant objects by value to non-inline functions!)

The Windows x64 calling convention passes large structs by hidden reference.
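
"Hidden reference" can be pictured in plain C. This is only a sketch of the idea with hypothetical function names, not what any compiler literally emits: the by-value call is lowered into the caller making a temporary copy and passing its address, and the callee is free to modify that copy.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int arr[4];
    long long a, b;
} bigobj;

/* What the programmer writes: pass by value. */
int sum_by_value(bigobj o)
{
    o.arr[3]++;                    /* modifies the callee's own copy */
    return o.arr[0] + o.arr[3];
}

/* Roughly what a hidden-reference convention turns that into:
   the callee gets a pointer to a copy it is allowed to modify. */
int sum_lowered(bigobj *o)
{
    assert(o != NULL);
    o->arr[3]++;
    return o->arr[0] + o->arr[3];
}

/* The caller's side of the lowered call: copy first, then pass the address. */
int call_lowered(const bigobj *orig)
{
    assert(orig != NULL);
    bigobj tmp = *orig;            /* caller-made temporary */
    return sum_lowered(&tmp);      /* orig itself is never touched */
}
```

Both versions return the same value, and in both the caller's original object is unchanged afterwards, which is all the C language requires of pass-by-value.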

Example:

typedef struct {
    // too big makes the asm output cluttered with loops or memcpy
    // int Big_McLargeHuge[1024*1024];
    int arr[4];
    long long a,b; //,c,d;
} bigobj;
// total 32 bytes with int=4, long long=8 bytes

int func(bigobj a);
int foo(bigobj a) {
    a.arr[3]++;
    return func(a);
}

source + asm output on the Godbolt compiler explorer.

You can try other architectures on Godbolt with their standard calling conventions, like ARM or AArch64. I picked x86-64 because I happened to know of an interesting difference in the two major calling conventions on that one platform for struct-passing.

x86-64 System V (gcc7.3 -O3): foo has a real by-value copy of its arg (done by its caller) that it can modify, so it does so and uses it as the arg for the tail-call. (If it can't tailcall, it would have to make yet another full copy. This example artificially makes System V look really good).

foo(bigobj):
    add     DWORD PTR [rsp+20], 1   # increment the struct member in the arg on the stack
    jmp     func(bigobj)            # tailcall func(a)

x86-64 Windows (MSVC CL19 /Ox): note that we address a.arr[3] via RCX, the first integer/pointer arg. So there is a hidden reference, but it's not a const-reference. This function was called by value, but it's modifying the data it got by reference. So the caller has to make a copy, or at least assume that a callee destroyed the arg it got a pointer to. (No copy required if the object is dead after that, but that's only possible for local struct objects, not for passing a pointer to a global or something).

$T1 = 32    ; offset of the tmp copy in this function's stack frame
foo PROC
    sub      rsp, 72              ; 00000048H     ; 32B of shadow space + 32B bigobj + 8 to align
    inc      DWORD PTR [rcx+12]
    movups   xmm0, XMMWORD PTR [rcx]              ; load modified `a`
    movups   xmm1, XMMWORD PTR [rcx+16]           ; apparently alignment wasn't required
    lea      rcx, QWORD PTR $T1[rsp]
    movaps   XMMWORD PTR $T1[rsp], xmm0
    movaps   XMMWORD PTR $T1[rsp+16], xmm1         ; store a copy
    call     int __cdecl func(struct bigobj)
    add      rsp, 72              ; 00000048H
    ret      0
foo ENDP

Making another copy of the object appears to be a missed optimization. I think this would be a valid implementation of foo for the same calling convention:

foo:
    add      DWORD PTR [rcx+12], 1       ; more efficient than INC because of the memory dst, on Intel CPUs
    jmp      func                        ; tailcall with pointer still in RCX

x86-64 clang for the SysV ABI also misses the optimization that gcc7.3 found, and does copy like MSVC.

So the ABI difference is less interesting than I thought; in both cases the callee "owns" the arg, even though for Windows it's not guaranteed to be on the stack. I guess this enables dynamic allocation for passing very large objects by value without a stack overflow, but that's kind of pointless. Just don't do it in the first place.


Small objects:

x86-64 System V passes small objects packed into registers. Clang finds a neat optimization if you comment out the long long members so you just have

typedef struct {
    int arr[4];
    //    long long a,b; //,c,d;
} bigobj;

# clang6.0 -O3
foo(bigobj):                          # @foo(bigobj)
    movabs  rax, 4294967296    # 0x100000000 = 1ULL << 32
    add     rsi, rax
    jmp     func(bigobj)          # TAILCALL

(arr[0..1] is packed into RDI, and arr[2..3] is packed into RSI, the first 2 integer/pointer arg-passing registers in the x86-64 SysV ABI).

gcc unpacks arr[3] into a register by itself where it can increment it.

But clang, instead of unpacking and repacking, increments the high 32 bits of RSI by adding 1ULL<<32.
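
The same trick can be written out in C as a sketch. The helper names are hypothetical, and it assumes a little-endian target, so that arr[2] sits in the low half of the 64-bit register and arr[3] in the high half:

```c
#include <assert.h>
#include <stdint.h>

uint32_t low_half(uint64_t packed)  { return (uint32_t)packed; }
uint32_t high_half(uint64_t packed) { return (uint32_t)(packed >> 32); }

/* Pack two 32-bit values the way arr[2] and arr[3] share one register. */
uint64_t pack_pair(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Increment the high element without unpacking: add 1 << 32. */
uint64_t inc_high(uint64_t packed)
{
    /* Incrementing INT32_MAX would be signed overflow (undefined) in the
       original C source, so that case is excluded here as well. */
    assert(high_half(packed) != (uint32_t)INT32_MAX);
    return packed + (UINT64_C(1) << 32);
}
```

The low half is never disturbed because the addend has zeros in its low 32 bits, so no carry can come out of that half.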

MSVC still passes by hidden reference, and still copies the whole object.

Peter Cordes

Why not just try it?

    const char str[]="some string, doesnt matter how long";
    void more_fun ( const char *, const char *, int);

    void fun ( void )
    {
        more_fun(str,"hello world",5);
    }

Add a dummy function (in assembly) to make the linker happy:

.globl more_fun
more_fun:
    bx lr

Architecture is not really relevant to this question: compilers solve this specific problem the same way on general-purpose instruction sets that have a basic set of addressing modes. If there is an exception to that, I am not talking about those platforms, but x86, ARM, MIPS, PowerPC, etc. all fall into this category.

Link and disassemble and you see what was already known, since a string variable is by definition a pointer to the beginning of something (just an address, nothing more exciting):

Disassembly of section .text:

00001000 <fun>:
    1000:   e92d4010    push    {r4, lr}
    1004:   e3a02005    mov r2, #5
    1008:   e59f100c    ldr r1, [pc, #12]   ; 101c <fun+0x1c>
    100c:   e59f000c    ldr r0, [pc, #12]   ; 1020 <fun+0x20>
    1010:   eb000003    bl  1024 <more_fun>
    1014:   e8bd4010    pop {r4, lr}
    1018:   e12fff1e    bx  lr
    101c:   0000104c    andeq   r1, r0, r12, asr #32
    1020:   00001028    andeq   r1, r0, r8, lsr #32

00001024 <more_fun>:
    1024:   e12fff1e    bx  lr

Disassembly of section .rodata:

00001028 <str>:
    1028:   656d6f73    strbvs  r6, [sp, #-3955]!   ; 0xfffff08d
    102c:   72747320    rsbsvc  r7, r4, #32, 6  ; 0x80000000
    1030:   2c676e69    stclcs  14, cr6, [r7], #-420    ; 0xfffffe5c
    1034:   656f6420    strbvs  r6, [pc, #-1056]!   ; c1c <fun-0x3e4>
    1038:   20746e73    rsbscs  r6, r4, r3, ror lr
    103c:   7474616d    ldrbtvc r6, [r4], #-365 ; 0xfffffe93
    1040:   68207265    stmdavs r0!, {r0, r2, r5, r6, r9, r12, sp, lr}
    1044:   6c20776f    stcvs   7, cr7, [r0], #-444 ; 0xfffffe44
    1048:   00676e6f    rsbeq   r6, r7, pc, ror #28
    104c:   6c6c6568    cfstr64vs   mvdx6, [r12], #-416 ; 0xfffffe60
    1050:   6f77206f    svcvs   0x0077206f
    1054:   00646c72    rsbeq   r6, r4, r2, ror r12

Because this came from objdump, it simply tried to disassemble the string as if it were instructions, so ignore the disassembly for the string data in .rodata.
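
To read the string back out of that dump, take each 32-bit word and emit its bytes lowest byte first, since this is a little-endian build. A small sketch (words_to_text is a made-up helper, not part of any toolchain):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn 32-bit words from a little-endian dump back into bytes.
   `out` must have room for 4 * n + 1 bytes. */
void words_to_text(const uint32_t *words, size_t n, char *out)
{
    assert(words != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (int b = 0; b < 4; b++)
            out[4 * i + b] = (char)((words[i] >> (8 * b)) & 0xFF);
    out[4 * n] = '\0';
}
```

Feeding it the first three words at address 1028 (0x656d6f73, 0x72747320, 0x2c676e69) gives "some string,", the start of str.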

old_timer
  • the storage of the immediate string becomes read only/text. if you had runtime initialized string that was a local variable the string itself would of course be on the stack (if not static) but still a pointer is a pointer and a string variable is just a pointer so that would be passed to the callee using the pointer not the entire string. structures, etc, all work this way. sometimes if the language defines there will be a copy made on the stack before calling but the call uses the pointer, an address. – old_timer May 02 '18 at 19:33
  • My apologies if i am being too general. The actual point here is how trivial it is to answer the question yourself by simply trying it...for situation X what do the tools produce. – old_timer May 02 '18 at 20:17
  • Also for x86, arm, etc there is a calling standard/definition that the compiler conforms to, and that document will describe these situations, and what the rule is for how to pass what. – old_timer May 02 '18 at 20:59