
I'm trying to write an assembly program in GAS syntax that accesses its variables in the .data section in a position-independent way on x86-64, while forcing the 32-bit architecture and instruction set (%eip instead of %rip).

No matter which registers I tried, the best result I got was "Segmentation fault: 11", and even that came from accessing EIP, which I shouldn't be able to do at all, hence the segfault. I call it the best result because at least it told me something other than "meh, it won't do".

I'm compiling the file with gcc on macOS 10.13.6 on a mid-2010 Intel Core 2 Duo (Xcode's gcc is actually Apple clang, as the version output shows):

$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

and passing some options to the linker with this:

gcc -m32 -Wl,-fatal_warnings,-arch_errors_fatal,-warn_commons,-pie test.s

ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not allowed in code signed PIE, but used in _main from /whatever.../test-a07cf9.o. To fix this warning, don't compile with -mdynamic-no-pic or link with -Wl,-no_pie
ld: fatal warning(s) induced error (-fatal_warnings)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
1


test.s

.text
.global _main

_main:
    xor %eax, %eax
    xor %ebx, %ebx

    # lea var1(%esi/edi/ebp/esp), %ebx  # can't compile, not PIE
    # lea var1(%eip), %ebx  # segfault, obvs

    # lea (%esp), %ebx      # EBX = 17
    # lea (%non-esp), %ebx  # segfault

    # lea 0(%esi), %ebx     # segfault 
    # lea 0(%edi), %ebx     # segfault
    # lea 0(%ebp), %ebx     # EBX = 0
    # lea 0(%esp), %ebx     # EBX = 17
    # lea 0(%eip), %ebx     # segfault, obvs

    movl (%ebx), %eax
    ret

.data
    var1: .long 6

.end

I'm running it with ./a.out; echo $? to check the EAX value from ret at the end.

I looked at various sources, but most of them use Intel syntax or are one of these questions: 1, 2, 3. I also compiled the simplest C example I could come up with (a global variable plus a return from main()) to assembly with gcc -S test.c -fPIE -pie -fpie -m32:

int var1 = 6;
int main() { return var1; }

which basically resulted in:

    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 13
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushl   %ebp
Lcfi0:
    .cfi_def_cfa_offset 8
Lcfi1:
    .cfi_offset %ebp, -8
    movl    %esp, %ebp
Lcfi2:
    .cfi_def_cfa_register %ebp
    pushl   %eax
    calll   L0$pb
L0$pb:
    popl    %eax
    movl    $0, -4(%ebp)
    movl    _var1-L0$pb(%eax), %eax
    addl    $4, %esp
    popl    %ebp
    retl
    .cfi_endproc
                                        ## -- End function
    .section    __DATA,__data
    .globl  _var1                   ## @var1
    .p2align    2
_var1:
    .long   6                       ## 0x6


.subsections_via_symbols

This uses MOV where I used LEA, and the instruction is almost the same as mine except for the -L0$pb part, which I assume works out to roughly (address of _var1) - (address of L0$pb) in order to reach the .data section.

And yet when I try the same approach with the var1 and _main labels, nothing works:

.text
.global _main

_main:
    xor %eax, %eax
    xor %ebx, %ebx

    #movl var1-_main(%ebp), %eax  # EAX = 191
    #movl var1-_main(%esp), %eax  # EAX = 204
    #movl var1-_main(%eax), %eax  # segfault
    ret

.data
    var1: .long 6

.end

Any ideas what I am doing wrong?

Edit:

I managed to cut everything unnecessary out of the compiler-generated assembly for the C example and ended up with this:

.text
.global _main

_main:
    pushl %ebp
    pushl %eax
    calll test

test:
    popl %eax

    /* var1, var2, ... */
    movl var1-test(%eax), %eax

    addl $4, %esp
    popl %ebp
    retl

/**
 * how var1(label) - test(label) skips this label
 * if it's about address subtracting?
 */
blobbbb:
    xor %edx, %edx

.data
var1: .long 6
var2: .long 135

This doesn't quite make sense to me, because according to this guide the caller should 1) push the parameters onto the stack (none here) and 2) call the label, and the callee should then work with ESP, EBP and the other registers. Also, why do I need an intermediate label at all? Put differently, is there any way to do this without it?

Peter Badida
  • Have you tried inspecting the output from a C compiler and doing that? – fuz Sep 15 '18 at 13:33
  • @fuz yup, I can reproduce it with a rather small example with push ebp, push eax, call intermediate label, pop eax, use mov or lea in that label, pop ebp, ret. but I think there has to be a better way. I'll edit the question when I get to PC. – Peter Badida Sep 15 '18 at 13:36
  • @fuz I edited the question with a simple gas asm example that does the PIE variable access, but it seems to me rather weird and non-compliant with the caller/callee guides I found on the Internet. – Peter Badida Sep 15 '18 at 14:30
  • The function call in clang's output is a fake function call to get the content of the instruction pointer on the stack. This is immediately popped in the next instruction to get the actual address of `test` into `eax`. Would you be satisfied with an answer that explained what actually goes on there? And can you please show me the actual code generated by the compiler, not an edited version (as I think there is something missing in it). – fuz Sep 15 '18 at 14:57
  • @fuz Edited again. Yeah, pretty much that would do. Also if you have time, please explain the comment about the left-out label in the last sample or paste a link and I'll manage from that. :) I'm still not sure whether there just isn't another way for PIE and variables than the fake call, hmm.. – Peter Badida Sep 15 '18 at 15:07

1 Answer


In 32-bit mode there is no EIP-relative addressing mode like the RIP-relative addressing of 64-bit mode. Thus, code like

mov var(%eip), %eax

is not actually legal and doesn't assemble in 32-bit mode. (In 64-bit mode it does assemble, but the computed address is truncated to 32 bits.) In traditional non-PIE 32-bit binaries, you would just do

mov var, %eax

which moves the value at the absolute address of var into eax. That is not possible in PIE binaries, because the absolute address of var is unknown at link time.

What the linker does know is the layout of the binary and what the distance between labels is. Thus, to access a global variable, you proceed like this:

  1. find out the absolute address of some label and load some register with it
  2. add to that the distance from that label to var
  3. access the variable

Steps 2 and 3 can be combined using an addressing mode with a displacement. Step 1 is the tricky part. In 32-bit mode there is only one useful instruction that reveals the address of the code being executed, and that's call: it pushes the address of the next instruction onto the stack and then jumps to the indicated address. If we tell call to jump to the very next instruction, we reduce its functionality to what is essentially push %eip:

        call Label                  # like push %eip
Label:  ...

Note that most CPUs special-case this pattern in their return-address prediction so that it doesn't count as a function call. Since this is not a real function call, we don't establish a stack frame or anything similar, and there is no ret for this call. It's just a mechanism to get the value of the instruction pointer.

So from this, we know the address of Label. Next we can pop it off the stack and use it to find the address of var:

        call Label
Label:  pop %eax                    # eax = Label
        add $var-Label, %eax        # eax = Label + var - Label = var

and then we can dereference this to get the content of var:

        call Label
Label:  pop %eax
        add $var-Label, %eax
        mov (%eax), %eax            # eax = *var

In real code, you would merge the addition and the memory operand to save an instruction:

        call Label
Label:  pop %eax
        mov var-Label(%eax), %eax   # eax = *var

If you want to refer to multiple static variables in one function, you only need to use this trick once. Just use suitable differences:

        call Label
Label:  pop %eax
        mov foo-Label(%eax), %ebx   # ebx = *foo
        mov bar-Label(%eax), %ecx   # ecx = *bar

Note that gcc favours a variant of this idiom to get the content of the instruction pointer. It creates a bunch of functions like this:

___x86.get_pc_thunk.bx:
        mov (%esp), %ebx
        ret

Each of these moves its return address into the indicated register. They are special functions that do not follow the normal calling convention; there is one for each of eax, ebx, ecx, edx, esi, and edi, depending on which register gcc wants to use. The code looks like this:

        call ___x86.get_pc_thunk.bx # ebx = Label
Label:  mov foo-Label(%ebx), %eax   # eax = *foo
        mov bar-Label(%ebx), %ecx   # ecx = *bar

gcc uses this code for better performance on CPUs whose return prediction does not account for this fake-call idiom. I don't know which CPUs are actually affected though.

Note lastly that no label is skipped over. I don't quite understand what you mean with respect to blobbbb. Which control flow is supposed to reach this label?

Finally, your example should look like this:

        .text
        .global _main

_main:  call Label                  # push %eip
Label:  pop %eax                    # eax = Label
        mov var1-Label(%eax), %eax  # eax = *(Label+var1-Label)
        ret


        .data
var1:   .long 6

Note that the .end directive is never needed. Labels that begin with a capital L are local labels that do not end up in the symbol table, which is why the C compiler likes to use them.

Peter Cordes
fuz
  • That would explain no direct references to 32bit PIE, only 64bit with RIP register. Great! Re the skipped label, I expected labels to be ordered one after another in the address space, therefore with that dummy label I was expecting `var1` - `test` to return the address of `blobbbb` i.e. fail somehow to retrieve the value of var1. – Peter Badida Sep 15 '18 at 15:45
  • @KeyWeeUsr Yes, addresses are linear. Why do you expect the subtraction operator to yield anything but the difference between `var1` and `test` if you write `var1-test`? – fuz Sep 15 '18 at 15:49
  • I thought they are ordered i.e. _main, test, blobbbb, var1, var2, therefore if I get the upper section (var1) and remove from it the lower section (test) while blobbbb is present, I'd expect blobbbb address yielded. Perhaps I'm still not getting something though. – Peter Badida Sep 15 '18 at 15:51
  • @KeyWeeUsr That's not how differences work. An address is just a number. If you have a number `foo` and add to `foo` the value `bar - foo` you get `foo + bar - foo` which is just `bar`. That's all. No tricks, just simple adding and subtracting. And yes, the labels are ordered in memory. But I don't quite get your logic. – fuz Sep 15 '18 at 15:54
  • Re: thunks: it was widely believed that `call +0` / `pop` would unbalance the return-address predictor on most CPUs. IIRC, Agner Fog's guides even made that claim. But http://blog.stuffedcow.net/2018/04/ras-microbenchmarks/#call0 actually measured, and it turns out only Via Nano doesn't special-case call/pop. Probably enough existing code used the call/pop idiom that it was worth special-casing for CPU vendors. Calling the thunk is 1 byte shorter than inlining call/pop, so in a large library it possibly amounts to some total code-size savings. – Peter Cordes Sep 15 '18 at 17:55