Why can the compiler assume that the address of a global variable will fit in 32 bits?

Question

When looking at the assembly output (see on godbolt.org) of this simple function

extern int global;
void doit(int *);
void call_doit(){
    doit(&global);
}

a 32bit value is used to hold the address of global:

call_doit:
        movl    $global, %edi
        jmp     doit

I understand that using 32-bit registers (i.e. %edi) here is superior to using 64-bit registers (i.e. %rdi), because 2 bytes can be saved (movl $global, %edi needs 5 bytes, while movq $global, %rdi needs 7 bytes, plus 4 additional bytes if one did not assume that $global fits in 32 bits).

(editor's note: compilers actually use 7-byte lea global(%rip), %rdi to create a 64-bit address from RIP + 32-bit relative displacement, which compilers can assume is in range for related reasons. And movabs $global, %rdi would be 10 bytes, not 11, for 64-bit absolute addresses.)

But why is the compiler allowed to assume that the address of the global variable will fit these 32 bits? What guarantees does the compiler have?

For a local variable, the compiler uses a 64-bit register to hold a stack address, e.g.:

void doit(int *);
void call_doit(){
    int local=0;
    doit(&local);
}

results in (see on godbolt.org):

call_doit:
        subq    $24, %rsp
        leaq    12(%rsp), %rdi
        movl    $0, 12(%rsp)
        call    doit
        addq    $24, %rsp
        ret

Interesting. C requires the compiler to be able to handle at least 4095 distinct identifiers in a translation unit, but says nothing about linking the entire program. Is there a limit to the number of global identifiers that the GNU linker imposes, and relies upon? — Toby Speight, Sep 10 '18 at 08:22
@mhc this is because tail call optimization is only possible in the first case (in the second case you have to adjust %rsp) — ead, Sep 10 '18 at 08:29
BTW, if you do ever create a program with more than 4GB of global variables, there's a non-zero probability that you're Doing It Wrong. ;-) — Toby Speight, Sep 10 '18 at 08:32
Even Clang and ICC generate a 32-bit move for the global case. If you switch to C++ you'll have more compilers to test, and among those Zapcc and ellcc also emit `movl $global, %edi` — phuclv, Sep 10 '18 at 09:50


