9

I wrote the assembly code below, and it builds fine when I run as and ld directly.

as cpuid.s -o cpuid.o
ld cpuid.o -o cpuid

But when I use gcc to do the whole build, I get the error below.

$ gcc cpuid.s -o cpuid
/tmp/cctNMsIU.o: In function `_start':
(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here
/usr/bin/ld: /tmp/cctNMsIU.o: relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/usr/bin/ld: final link failed: Invalid operation
collect2: error: ld returned 1 exit status

Then I renamed _start to main and also added -fPIC to the gcc options, but that doesn't fix the ld error. The error message changes to the one below.

$ gcc cpuid.s -o cpuid
/usr/bin/ld: /tmp/ccYCG80T.o: relocation R_X86_64_32 against `.data' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

I don't understand what that means, because I'm not making a shared object. I just want to build an executable binary.

    .section .data
output:

    .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

    .section .text

    .global _start

_start:

    movl $0, %eax

    cpuid

    movl $output, %edi
    movl %ebx, 28(%edi)
    movl %edx, 32(%edi)
    movl %ecx, 36(%edi)

    movl $4, %eax
    movl $1, %ebx
    movl $output, %ecx
    movl $42, %edx
    int $0x80

    movl $1, %eax
    movl $0, %ebx
    int $0x80

If I change the code above to the version below, is it correct, or does it have side effects for 64-bit asm programming?

         .section .data
 output:
         .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

         .section .text

         .global main
 main:
         movq $0, %rax
         cpuid

         lea output(%rip), %rdi
         movl %ebx, 28(%rdi)
         movl %edx, 32(%rdi)
         movl %ecx, 36(%rdi)
         movq %rdi, %r10

         movq $1, %rax
         movq $1, %rdi
         movq %r10, %rsi
         movq $42, %rdx
         syscall
Miracle Huang
  • 8
    Your compiler is configured for PIE by default. Use `gcc -no-pie`. Also, you seem to be creating 64 bit output, but your code is 32 bit, you may run into problems with that later. Recommend you add `-m32` too. – Jester Mar 22 '18 at 17:07
  • 1
    Why gcc is making your executable a shared object by default: https://stackoverflow.com/questions/43367427/32-bit-absolute-addresses-no-longer-allowed-in-x86-64-linux. The "use `-fPIC`" suggestion only makes sense for compiler-generated code; in your case it means rewrite the asm by hand as position-independent code. (Or instead build with `gcc -no-pie -m32`). – Peter Cordes Mar 22 '18 at 17:13
  • @Jester could you tell me what the actual problems are? If I use rax, rbx, rcx, and rdi instead of the registers above, is that right? I'm a newcomer to IA-32. – Miracle Huang Apr 11 '18 at 05:37
  • `movl $output, %rcx` shouldn't even assemble, because a 64-bit register doesn't match the `l` operand-size suffix. If you want to make IA-32 code instead of x86-64, build with `gcc -m32`. You're using [the 32-bit `int $0x80` ABI, so that's another sign you didn't intend to write 64-bit code](https://stackoverflow.com/questions/46087730/what-happens-if-you-use-the-32-bit-int-0x80-linux-abi-in-64-bit-code). You don't need `movq $1, %rbx`; that wastes space vs. `movl $1, %ebx`, because writing a 32-bit register zero-extends into the 64-bit register. – Peter Cordes Apr 11 '18 at 06:33
  • Just make 32-bit position-dependent code so you can keep using `mov $output`. I disagree with @R.'s claim that PIC would be better for 32-bit code; it's inconvenient and makes your code significantly slower, as well as being more complicated (and thus bad for a beginner). Only x86-64 supports it naturally, with RIP-relative addressing like `lea output(%rip), %rsi` – Peter Cordes Apr 11 '18 at 06:37

2 Answers

7

As comments have noted, you could work around this by linking your program as non-PIE, but it would be better to fix your asm to be position-independent. If it's 32-bit x86 code that's a bit ugly. This instruction:

    movl $output, %edi

would become:

    call 1f
1:  pop %edi
    add $output-1b, %edi
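
A variant of the same idea uses a small helper that reads its own return address instead of `call`/`pop`. The following is a minimal sketch of the question's program in that style, not a definitive implementation. It assumes a toolchain with 32-bit support installed so that `gcc -m32` can link; the file name `cpuid32.s` and the helper name `get_pc_edi` are made up for illustration.

```asm
# cpuid32.s (hypothetical file name): 32-bit position-independent sketch
        .section .data
output:
        .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

        .section .text
        .globl main
main:
        pushl %ebx                  # ebx and edi are callee-saved in the i386 ABI
        pushl %edi

        xorl %eax, %eax             # CPUID leaf 0: vendor ID string
        cpuid

        call get_pc_edi             # edi = address of label 1 (the return address)
1:      addl $output-1b, %edi       # edi = run-time address of output

        movl %ebx, 28(%edi)         # the 12 vendor bytes start at offset 28
        movl %edx, 32(%edi)
        movl %ecx, 36(%edi)

        movl $4, %eax               # 32-bit ABI: sys_write
        movl $1, %ebx               # fd 1 = stdout
        movl %edi, %ecx             # buffer
        movl $42, %edx              # length
        int $0x80

        xorl %eax, %eax             # return 0 from main
        popl %edi
        popl %ebx
        ret

get_pc_edi:                         # hypothetical helper name
        movl (%esp), %edi           # copy our own return address into edi
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

Under those assumptions it should build with `gcc -m32 cpuid32.s -o cpuid32`, without needing `-no-pie`.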

For 64-bit it's much cleaner. Instead of:

    movq $output, %rdi

you'd write:

    lea output(%rip), %rdi
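
Applied to the whole program, a 64-bit position-independent version might look like the sketch below. It is an illustration rather than a definitive implementation, and the file name `cpuid64.s` is made up. Beyond RIP-relative addressing, it does two things the 64-bit attempt in the question does not: it saves `%rbx`, which `cpuid` overwrites and which is callee-saved in the x86-64 System V ABI, and it returns from `main` instead of running off the end after `syscall`.

```asm
# cpuid64.s (hypothetical file name): 64-bit position-independent sketch
        .section .data
output:
        .ascii "The processor Vendor ID is 'xxxxxxxxxxxx'\n"

        .section .text
        .globl main
main:
        pushq %rbx                  # cpuid clobbers rbx, which main must preserve

        xorl %eax, %eax             # CPUID leaf 0: vendor ID string
        cpuid

        leaq output(%rip), %rsi     # RIP-relative: no absolute address needed
        movl %ebx, 28(%rsi)         # the 12 vendor bytes start at offset 28
        movl %edx, 32(%rsi)
        movl %ecx, 36(%rsi)

        movl $1, %eax               # x86-64 ABI: sys_write is number 1
        movl $1, %edi               # fd 1 = stdout
        movl $42, %edx              # length; rsi already holds the buffer
        syscall                     # note: clobbers rcx and r11

        xorl %eax, %eax             # return 0 from main
        popq %rbx
        ret

        .section .note.GNU-stack,"",@progbits   # mark the stack non-executable
```

Because nothing in it needs a 32-bit absolute relocation, it should link with a plain `gcc cpuid64.s -o cpuid64`, even when gcc builds PIE executables by default.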
R.. GitHub STOP HELPING ICE
  • `call`/`pop` is not recommended, it breaks branch-prediction for up to 16 future `ret` instructions. (back up the call tree). The real answer here is that this is 32-bit code using the 32-bit `int $0x80` ABI, so you should definitely *not* make it PIC unless you really need to, i.e. to put it in a shared library. If it was 64-bit code I would have marked it a duplicate of https://stackoverflow.com/questions/43367427/32-bit-absolute-addresses-no-longer-allowed-in-x86-64-linux. – Peter Cordes Mar 22 '18 at 17:59
  • @PeterCordes: I disagree. This is obviously an asm learning exercise, and anyone writing asm in 2018 should be learning to write it in a position-independent manner, always. Whether the call/pop approach breaks branch prediction on modern microarchitectures is a question I don't know the answer to right off, but if you don't like it you can always replace it with a call/mov/ret approach. In any case, if you're making syscalls, the cost is dominated by the syscall anyway. – R.. GitHub STOP HELPING ICE Mar 22 '18 at 18:05
  • Personally, I would write the call/pop idiom anywhere that happens a small/bounded number of times per program iteration or per "big expensive operation", simply because it's spatially isolated and obvious, and only bother with something that could perform better if it's being used where performance may matter. – R.. GitHub STOP HELPING ICE Mar 22 '18 at 18:07
  • Huh? One of the major reasons for writing asm by hand is performance. Not taking advantage of position-dependent executables to gain performance doesn't make sense, because it is an option in many cases. (re: prediction: https://stackoverflow.com/questions/22442766/return-address-prediction-stack-buffer-vs-stack-stored-return-address, and the canonical Q&A recommends call/mov/ret: https://stackoverflow.com/questions/599968/reading-program-counter-directly.) But yeah ideally avoid doing it deep in the call tree, or frequently, if for some reason you do need 32-bit PIC code. – Peter Cordes Mar 22 '18 at 18:09
  • @PeterCordes: Writing standalone asm files (instead of inline asm that can be integrated into the AST) for performance is generally a backwards practice. If you're writing inline asm there is no reason to do PIC manually; pass the address operands in to the asm block. Yes, some people will still do things the backwards way anyway, and in that case they should research the most efficient form of PC-relative addressing for the relevant CPU models they'll be targeting. They should not write PIC-incompatible code. My automatic reaction to finding PIC-incompatible asm is `--disable-asm`. – R.. GitHub STOP HELPING ICE Mar 22 '18 at 18:15
  • FWIW most of the asm I write/use is for non-performance purposes, rather for things that are not expressible in C. Things like entry points and early state setup, task switching/scheduling, userspace context switching, ... – R.. GitHub STOP HELPING ICE Mar 22 '18 at 18:17
  • If you write a whole loop in pure asm, it's not bad to write the whole function in asm too. Using inline-asm instead of stand-alone doesn't make *that* big a difference, especially not when you need separate functions anyway for runtime dispatching based on supported instruction set. x264 (the open source h.264 video encoder) for example has [all of its asm](http://git.videolan.org/?p=x264.git;a=tree;f=common/x86;hb=HEAD) written stand-alone, and called through function pointers that are set based on CPUID results to pick the best version for the CPU. – Peter Cordes Mar 22 '18 at 18:26
2

With NASM I fixed this by putting the line "DEFAULT REL" in the source file (check nasmdoc.pdf p.76).

Michael