
Having this in gas:

    .text
    .globl main
main:
    xor %eax, %eax
    lea str(%rip), %rdi
    call printf
    call exit

str: .byte 0x7F, "ELF", 1,1,1,0

I thought the .byte directive could mix strings and numbers in one list, as db does in NASM:

db      0x7F, "ELF", 1, 1, 1, 0         ;   e_ident

Source: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Peter Cordes
autistic456
  • Note that if you want this to actually run, you have to align the stack before calling `printf`. Add any `push` instruction before the call. – Nate Eldredge Jun 08 '20 at 12:27
  • @NateEldredge the alignment is just a convention. It would be an issue if the program were complex, but since it only calls one function and then another, aligning the stack is not essential; it still works fine – autistic456 Jun 08 '20 at 12:33
  • Current glibc `printf` happens not to crash when you violate the ABI, but fun fact: `scanf` does fault, even with no FP args. [glibc scanf Segmentation faults when called from a function that doesn't align RSP](https://stackoverflow.com/q/51070716). Functions are allowed to depend on ABI guarantees for correctness, so it's a bad idea to violate them. You can for toy examples when it happens to work, but be aware of what you're doing. – Peter Cordes Jun 08 '20 at 12:35
  • It's not just convention. `printf` and other library functions can, and sometimes do, execute instructions that only work if the stack is aligned. It may happen to work right now by luck, with this particular set of arguments to `printf` and your particular libc built with your particular compiler, but it may break at any time. Stack alignment isn't optional when calling C functions. – Nate Eldredge Jun 08 '20 at 12:35

1 Answer


In GAS syntax, "ELF" is a symbol reference to the symbol name ELF, not a multi-char string. In the context of a .byte directive, GAS is only looking for a number, not a possible string.

And since you used it as one element of a list of .byte values, it's asking for the low byte of the absolute address, hence the .._8 relocation. The meaning is totally different from NASM's db.

In GAS when it's expecting a number, 'E' is allowed as an ASCII constant, but "E" isn't. e.g. mov $"E", %eax will give you a R_X86_64_32 E relocation.

Single quotes don't work either. A single-character literal does work as a number, e.g. as an immediate like mov $'a', %eax. But unlike NASM, GAS doesn't support multi-character character literals. So mov eax, 'Hey!' works in NASM, but mov $'Hey!', %eax doesn't work in GAS.
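One workaround is to compute the integer yourself and write it as a plain numeric immediate. A minimal Python sketch, assuming NASM's documented little-endian packing of character constants (the first character lands in the lowest byte); the helper name `nasm_char_constant` is made up for illustration:

```python
def nasm_char_constant(text: str) -> int:
    """Pack ASCII characters into an integer the way NASM packs 'Hey!'.

    The first character goes into the lowest byte, so storing the
    integer to memory on little-endian x86 gives the characters in order.
    """
    return int.from_bytes(text.encode("ascii"), "little")


value = nasm_char_constant("Hey!")
# Print a GAS-syntax instruction using the equivalent numeric immediate.
print(f"mov ${value:#x}, %eax")
```

For 'Hey!' this gives 0x21796548, so `mov $0x21796548, %eax` in GAS should load the same value as NASM's `mov eax, 'Hey!'`.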

AFAIK, GAS only lets you use a sequence of multiple ASCII characters as literal data for a .ascii / .asciz directive, or the related .string / .string16 / .string32 narrow or wide character directives. (GAS manual)


You have a few options:

str: .byte 0x7F
     .ascii "ELF"         # separate directives
     .byte 1,1,1,0

str: .byte 0x7F, 'E', 'L', 'F', 1,1,1,0   # separate character literals

str: .asciz "\x7F\ELF\x1\x1\x1"         # hex escapes in a string; .asciz appends the final 0

\E stops the whole 7FE from being seen as one hex number. Without the extra backslash, it assembles to fe 4c 46 01... (bad) instead of the desired 7f 45 4c 46 01... (good).
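To make the truncation concrete, here is a rough Python model of that escape handling. It is only a sketch of the behaviour described here, not GAS's actual parser: it assumes `\x` consumes every following hex digit and keeps the low 8 bits, and that an unrecognized escape such as `\E` yields the character itself. The function name `gas_string_bytes` is hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def gas_string_bytes(body: str) -> bytes:
    """Model hex escapes and unknown escapes in a GAS string body.

    Assumption: only backslash-x escapes and "backslash + other char"
    are handled; escapes like newline or octal are not modelled.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "xX":
                # Consume every trailing hex digit, then keep the low byte.
                j = i + 2
                while j < len(body) and body[j] in HEX_DIGITS:
                    j += 1
                out.append(int(body[i + 2:j] or "0", 16) & 0xFF)
                i = j
            else:
                # Unknown escape: emit the escaped character itself.
                out.append(ord(nxt))
                i += 2
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


print(gas_string_bytes(r"\x7FELF").hex(" "))            # fe 4c 46
print(gas_string_bytes(r"\x7F\ELF\x1\x1\x1").hex(" "))  # 7f 45 4c 46 01 01 01
```

In the first string, `7FE` is read as the single number 0x7FE and truncated to 0xFE, which is why the `E` disappears from the output.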

IDK if there's a better / cleaner way to do that; maybe 3-digit octal escape sequences?


That tutorial uses NASM's flat binary output mode to manually create ELF program headers (for a 32-bit executable). I guess you're trying to create a 64-bit program that prints that output, for some reason? It happens not to contain any 0 or % bytes, so yes you can output it with printf.

A more direct way to port the tutorial to GAS syntax would be to use ld to link the `as` output into a flat binary. See [How to generate plain binaries like nasm -f bin with the GNU GAS assembler?](https://stackoverflow.com/q/6828631)

Or use objcopy to copy the .text section of a .o or executable into a flat binary. Make sure everything is in the .text section if you use objcopy.

Peter Cordes
  • Peter Cordes, actually the only intent is to port it to GAS syntax, but not to x64 (I will keep the tutorial's default 32-bit target). But anyway, from the above, why does `"x7F\ELF"` translate to `fe 4c 46`? I understand the hex `46` (F) and `4c` (L), but why `fe`? Where is the `7`? I don't get how the escaped hex changes the final output. From what you wrote, it seems that \xn keeps its value and isn't translated. But it must be translated, since it is an ASCII string, so what character does the `fe` byte translate to, when it is not an entry in ASCII? (The max value in ASCII is `0x7F -> DEL`.) – autistic456 Jun 08 '20 at 11:58
  • @autistic456: I think `\x7FE` gets parsed as one hex number, 0x7FE, then truncated to `0xFE` to fit in a byte. That's consistent with the observed behaviour. The letter `L` isn't a hex digit so parsing stops there. Just like the backslash stops the number parsing even though the `\E` is still just a literal `E` afterwards. (BTW, updated that paragraph in my answer; I stated it backwards the first try.) GAS isn't as user-friendly as NASM; it was designed primarily to assemble compiler output. – Peter Cordes Jun 08 '20 at 12:07
  • Another question is about this part: `mov $'a', %eax`. I tried to implement it: `.text .globl main main: xor %eax, %eax mov $'a', %rdi call printf call exit`, but I get `command terminated` – autistic456 Jun 08 '20 at 12:22
  • @autistic456: IDK why you think that would work. In C, `printf('a')` wouldn't work either, and that's what you're implementing. You need to pass a pointer to a string, not an integer character value. Or `call putchar` instead. – Peter Cordes Jun 08 '20 at 12:23
  • OK, but if I try `lea $'a', %rdi`, then I get a mismatch for `lea` – autistic456 Jun 08 '20 at 12:24
  • @autistic456: Right, again, of course it did. [LEA](https://www.felixcloutier.com/x86/lea) can't take an immediate operand. You seem to be thinking about assembly language like a high-level compiled language, where you can write a string literal in various contexts and the compiler will implement that anonymous temporary object somehow. But in asm, the limit of what one instruction can do is defined by the machine code. Asm is just a syntax for expressing single instructions. – Peter Cordes Jun 08 '20 at 12:27
  • And lastly, you suggest using `objcopy` to produce a flat binary. But how is this connected with the assembly source syntax? I am thinking about writing it in NASM, assembling it with NASM, then using `objdump` and rewriting it in GAS. `objcopy` here is only for the binary format, not for the assembly source – autistic456 Jun 08 '20 at 12:27
  • But is it so hard to make an address for an immediate and then take that address? You say `lea` cannot take an immediate because it has no address, only a symbol has, but it is not that hard to store the immediate in, say, the `.data` section and make an address for it, on which the `lea` will work – autistic456 Jun 08 '20 at 12:29
  • @autistic456: The point of that tutorial / article is to control every byte in an output file. But GAS doesn't have a flat-binary output mode, so a port of the final NASM version would probably best be done by putting all those directives and instructions in the `.text` section. Then make the binary by assembling it to a `.o` (and maybe link to an executable), and then use `objcopy` to extract the bytes you're in control of (the .text section) into a flat binary. – Peter Cordes Jun 08 '20 at 12:30
  • @autistic456: Sure, you could write a macro that uses `.pushsection .rodata` ; `tmp_label .asciz \0` ; `.popsection` and then emits an `lea` instruction. (There are tricks to get unique label names if the same macro is used repeatedly). But GAS isn't going to do that for you!! Like I said, asm is just a syntax for expressing single machine instructions. – Peter Cordes Jun 08 '20 at 12:32
  • I do not understand why `as` does not have a flag to specify the output binary format. In the tutorial they used `$ nasm -f bin -o a.out tiny.asm $ chmod +x a.out $ ./a.out ; echo $? 42 `, which means NASM has `-f` to do that. I am saying this because I do not fully understand the process of getting GAS-syntax source. First I would have to understand both assemblers to convert between them. Once I understand them, I write all the instructions in `.text` in GAS, and then make the binary via `objcopy` as you suggest? Is that what you meant? – autistic456 Jun 08 '20 at 12:41
  • @autistic456: IDK why `as` doesn't have a flat-binary output mode, but yes that's why you need `objcopy`. Oh, `ld` does have a flat-binary output mode so you can just link into a flat binary without needing `objcopy`. I found a better Q&A about that: [How to generate plain binaries like nasm -f bin with the GNU GAS assembler?](https://stackoverflow.com/q/6828631). Presumably `as` doesn't do this because compilers don't need it, and/or it simplifies `as` to leave some stuff to the linker. e.g. detecting undefined symbol references. – Peter Cordes Jun 08 '20 at 12:47
  • It's really a question of design philosophy between NASM (intended for hand-written asm, including for bootloaders and other flat binaries), vs. GAS (intended for compiler output as part of GNU binutils, and part of the GNU toolchain in general) – Peter Cordes Jun 08 '20 at 12:47
  • And why doesn't `as` even have a mode for flat binaries? I would expect it to, given how old `as` is, for legacy reasons. But none of this is true. Why? – autistic456 Jun 08 '20 at 12:48
  • OK, if I set design philosophy aside and just consider the age of these two assemblers, then at the time GAS was written, were flat binaries more common (I would expect so)? And was there first an `a.out` format derived from the flat format (again, I don't know, just assuming)? Or why does MS have the .com format, which is so close to a flat format, while Unix has nothing like it? I am interested in the legacy and evolution of it, rather than the design philosophy – autistic456 Jun 08 '20 at 12:57
  • @autistic456: No, flat binaries have never been common on Unix. Always some metadata / magic number so the OS can distinguish a binary executable from a `#!/bin/whatever` script, not by filename like `.com` or `.exe`. You can make a flat binary with `as` + `ld` and/or `objcopy`, all part of the same GNU Binutils package. Unix tools are often designed with separate programs for specific tasks, so even less reason for `as` itself to do it. Side note: DOS `.com` format is a flat binary, so assemblers portable to Windows / DOS (including NASM) have more reason to natively support flat binaries. – Peter Cordes Jun 08 '20 at 13:11
  • You still did not explain the evolution of Unix binary formats. I mentioned the `a.out` format, which I assume is an ancient one, from the very beginning. At the time Unix was being created at AT&T labs, on what basis did Ritchie choose a binary format? What came before *that* format, and how is it connected with the AT&T assembler made for those formats? The DOS .com format was only mentioned for comparison, though it would also be interesting to mention which binary format Gates decided on. – autistic456 Jun 08 '20 at 13:34
  • @autistic456: I'm pretty sure Unix has never used flat binaries. Or if so, it had moved on to formats with magic numbers by the time the GNU project began. I didn't try to explain history of object / executable file formats because that's irrelevant other than not being flat binaries that just get loaded into one contiguous region of memory. An OS that can have more than one process running at a time is always going to want to distinguish text / rodata from data so text can be shared between processes running the same executable. And having a BSS is sensible with protected memory. – Peter Cordes Jun 08 '20 at 13:43