I'm trying to add a function `S_0x804853E` to an assembly file generated by GCC, and then assemble that file into an executable. The complete assembly file follows.

    .file   "simple.c"
    .intel_syntax noprefix
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    push    ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    mov ebp, esp
    .cfi_def_cfa_register 5
    sub esp, 16
    call    __x86.get_pc_thunk.ax
    add eax, OFFSET FLAT:_GLOBAL_OFFSET_TABLE_
    mov DWORD PTR -4[ebp], 3
    mov eax, 0
    leave
    call S_0x804853E # note that this line is manually added.
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .section    .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
    .globl  __x86.get_pc_thunk.ax
    .hidden __x86.get_pc_thunk.ax
    .type   __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
.LFB1:
    .cfi_startproc
    mov eax, DWORD PTR [esp]
    ret
    .cfi_endproc
.LFE1:
    .ident  "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
    .section    .note.GNU-stack,"",@progbits
# note that codes below are manually added.
.type   S_0x804853E, @function
S_0x804853E:
    push ebp
    mov esp,ebp
    push ebx
    sub $0x4,esp
    call S_0x80485BB
    add $_GLOBAL_OFFSET_TABLE_,eax
    sub $0xC,esp
    lea S_0x80486B8,edx
    push edx
    mov eax,ebx
    call puts
    add $0x10,esp
    nop
    mov -0x4(ebp),ebx
    leave
    ret
.type   S_0x80485BB, @function
S_0x80485BB:
    mov (esp),eax
    ret

.section .rodata

S_0x80486B8:
    .byte 0x36
    .byte 0x00

I assemble it with the command below and get the following errors:

$ gcc -m32 -no-pie -nostartfiles simple.s -o simple
simple.s: Assembler messages:
simple.s:49: Error: operand size mismatch for `lea'
simple.s:55: Error: junk `(ebp)' after expression

I'm not very familiar with assembly. Apologies if this can easily be solved with Google, but I failed to find any related explanation. Thanks for your help.

S1mple
  • Please [edit] your question to add comments on the lines where you get the errors. – Some programmer dude Mar 12 '22 at 10:53
  • Your own code is in unprefixed AT&T syntax whereas you have configured the assembler to use Intel syntax. Issue a `.att_syntax noprefix` directive to configure the syntax correctly. – fuz Mar 12 '22 at 11:03
  • Or much better, use normal AT&T syntax instead of a mutant hybrid that will look wrong to everyone else. clang's built-in assembler doesn't even support `.att_syntax noprefix`, only ATT-prefix and Intel-noprefix. – Peter Cordes Mar 12 '22 at 11:24
  • The other comments are basically saying *don't mix Intel and AT&T syntax*. Just stick to one syntax in the same assembly source file. – xiver77 Mar 12 '22 at 19:25
  • @fuz Thanks! But what is unprefixed AT&T syntax? I've never heard of it and can't find any description on Google. Is there an introduction somewhere? What's the difference between normal AT&T and the unprefixed variant? Or, more generally, how do I convert unprefixed AT&T to the normal form? – S1mple Mar 13 '22 at 08:57
  • @PeterCordes Sorry. The added assembly code was generated by a tool. I thought it was Intel syntax because the instructions (such as `mov`, `add`) don't have a suffix (such as `b`, `l`, `w`, `q`). I'm trying to convert the unprefixed AT&T code to normal AT&T. Is there a way to do that? Thanks! – S1mple Mar 13 '22 at 09:06
  • When a register implies the operand-size, it's normal to omit the suffix from the mnemonic. Just like in Intel syntax, you don't write `mov dword eax, dword ecx`, you just write `mov eax, ecx`. You only need an operand-size specifier for `movzx` from memory, or for instructions with no register operands, like `add byte ptr [rdi], 1`. – Peter Cordes Mar 13 '22 at 13:17
  • To convert att noprefix to normal AT&T, just put a `%` at the start of every register name. Or if you really have asm output like that from some other tool, then just use `.att_syntax noprefix` instead of messing around with it. – Peter Cordes Mar 13 '22 at 13:19
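
To make the difference between the three spellings concrete, here is a minimal sketch for GNU `as` (built with `gcc -m32 -c`). The function `demo` and the label `msg` are hypothetical names that are not in the original file. Each pair of instructions is meant to encode the same operation, only the source syntax changes. Read in Intel mode, the tool's `lea S_0x80486B8,edx` has its operands in destination-first order and `-0x4(ebp)` is not an Intel-style memory operand, which is presumably what the two error messages in the question are reporting.

```asm
    .text
    .globl  demo
    .type   demo, @function
demo:
    # 1) Intel syntax, no register prefix: destination first, [] for memory.
    .intel_syntax noprefix
    mov     ebx, DWORD PTR [ebp-4]
    lea     edx, [msg]

    # 2) AT&T syntax without register prefix (what the tool emitted):
    #    source first, disp(base) for memory. GNU as only.
    .att_syntax noprefix
    mov     -0x4(ebp), ebx
    lea     msg, edx

    # 3) Normal AT&T syntax: same as 2) with % on every register name.
    .att_syntax prefix
    mov     -0x4(%ebp), %ebx
    lea     msg, %edx
    ret

    .section .rodata
msg:
    .byte 0x36
    .byte 0x00
```

None of the `mov` or `lea` lines needs a size suffix, because the 32-bit register operand already fixes the operand size.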

1 Answer

The main problem was that I mixed Intel and AT&T syntax. The code generated by the tool is AT&T syntax without register prefixes (`%`) and without operand-size suffixes (`b`, `w`, `l`, `q`). Compiling the C code to AT&T syntax and adding the prefixes and suffixes to the tool's code works. The edited code follows.

    .file   "simple.c"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushl   %ebp
    .cfi_def_cfa_offset 8
    .cfi_offset 5, -8
    movl    %esp, %ebp
    .cfi_def_cfa_register 5
    subl    $16, %esp
    call    __x86.get_pc_thunk.ax
    addl    $_GLOBAL_OFFSET_TABLE_, %eax
    movl    $3, -4(%ebp)
    movl    $0, %eax
    leave
    call S_0x804853E # note that this line is manually added
    .cfi_restore 5
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .section    .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
    .globl  __x86.get_pc_thunk.ax
    .hidden __x86.get_pc_thunk.ax
    .type   __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
.LFB1:
    .cfi_startproc
    movl    (%esp), %eax
    ret
    .cfi_endproc
# note that the code below is manually added
    .text   # leave the thunk's COMDAT section so the functions below land in .text
.type   S_0x804853E, @function
S_0x804853E:
    pushl %ebp
    movl %esp,%ebp
    pushl %ebx
    subl $0x4,%esp
    call S_0x80485BB
    addl $_GLOBAL_OFFSET_TABLE_,%eax
    subl $0xC,%esp
    lea S_0x80486B8,%edx
    pushl %edx
    movl %eax,%ebx
    call puts
    addl $0x10,%esp
    nop
    movl -0x4(%ebp),%ebx
    leave
    ret
.type   S_0x80485BB, @function
S_0x80485BB:
    movl (%esp),%eax
    ret

.section .rodata

S_0x80486B8:
    .byte 0x36
    .byte 0x00

This code assembles with gcc without warnings or errors.

-------------------------split line for new edit----------------------

Thanks to @Peter Cordes for the help. It's unnecessary to explicitly give every instruction an operand-size suffix. A suffix is only needed when the operand size would otherwise be ambiguous, for example when neither operand is a register:

movl $4, -4(%ebp)
S1mple
  • `sub $0x4,%esp` is perfectly fine. ESP being a 32-bit register implies the `l` size, you don't need to make it explicit. Only an instruction like `movl $4, -4(%ebp)` needs an operand-size override / specifier to avoid ambiguity (because neither operand is a register.) – Peter Cordes Mar 14 '22 at 00:34
  • Also, if you're going to use instructions like `lea S_0x80486B8,%edx` with an absolute address in the machine code, you might as well compile with `-no-pie -fno-pie`. Although in 32-bit code, runtime fixups do allow absolute addresses even in a shared object, so if you want a PIE then maybe keep the `-fPIE` code-gen option so there are fewer runtime fixups. Or don't, so you get lots of runtime fixups but more efficient code once it's loaded and relocated. – Peter Cordes Mar 14 '22 at 00:38
  • @PeterCordes eh...I don't think it's an absolute address. `S_0x80486B8` is just a label (it can be replaced by any other label name). The label is generated by the tool I mentioned before. It disassembles binaries and generates complete assembly code. The address refers to the input binary's `.text` segment. Anyway, I've learned a lot and thanks for your help! – S1mple Mar 14 '22 at 01:28
  • The only way that `lea` can assemble is to *machine code* with an absolute address. Unlike in the compiler-generated code earlier in the same code block that does `call __x86.get_pc_thunk.ax` / `addl $_GLOBAL_OFFSET_TABLE_, %eax` to get the address of the GOT based on the current address it's running from. (Then from there it would use `mov foo@GOTOFF(%ebx), %eax` to load, like https://godbolt.org/z/Ks7e3h7rs. Or if you compile with `-fno-pie`, it can just use `mov foo, %eax`). I'm *not* saying you hard-coded an absolute address in the asm *source*, just talking about how it assembles. – Peter Cordes Mar 14 '22 at 01:44
  • (Related: [32-bit absolute addresses no longer allowed in x86-64 Linux?](https://stackoverflow.com/q/43367427) re: PIEs, but 32-bit code doesn't have RIP-relative, and 32-bit is "full width" so a load-time relocation can handle the case where the machine code uses absolute addresses.) – Peter Cordes Mar 14 '22 at 01:44
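
As a rough sketch of the position-independent alternative these comments describe, the string's address can be formed relative to the GOT instead of being encoded as an absolute address. The names `print_msg` and `msg` are hypothetical stand-ins for the tool-generated labels, and the `.bx` thunk is assumed here to follow the same pattern as the `.ax` thunk GCC emitted in the question. This is how 32-bit `-fPIE` output typically looks, not a drop-in replacement for the tool's code.

```asm
    .text
    .globl  print_msg
    .type   print_msg, @function
print_msg:
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx                          # EBX is call-preserved, so save it
    subl    $4, %esp
    call    __x86.get_pc_thunk.bx         # EBX = address of the next instruction
    addl    $_GLOBAL_OFFSET_TABLE_, %ebx  # EBX = address of the GOT
    subl    $12, %esp
    leal    msg@GOTOFF(%ebx), %edx        # address of msg, computed relative to the GOT
    pushl   %edx
    call    puts@PLT                      # PLT call; EBX must hold the GOT address
    addl    $16, %esp
    movl    -4(%ebp), %ebx                # restore EBX
    leave
    ret

    .section .text.__x86.get_pc_thunk.bx,"axG",@progbits,__x86.get_pc_thunk.bx,comdat
    .globl  __x86.get_pc_thunk.bx
    .hidden __x86.get_pc_thunk.bx
    .type   __x86.get_pc_thunk.bx, @function
__x86.get_pc_thunk.bx:
    movl    (%esp), %ebx                  # return address = address after the call
    ret

    .section .rodata
msg:
    .string "6"
```

With `lea S_0x80486B8,%edx` the assembler instead emits the symbol's absolute address, which is fine for a `-no-pie` link like the one in the question.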