I'm using NASM version 2.14.02 and GNU ld 2.34 to assemble and link an assembly file (for example, hello world) on 64-bit Linux.

I would like (just for fun basically) to produce an executable file of the smallest possible size. However, in the executable file produced by the utilities, there are some strings that are definitely meaningless for the executable (like the name of the source file, section names, and some others). How do I get rid of them?

Here is what I do:

$ cat hello_world_32.s
SECTION .rodata
    msg:        db 'Hello world!',0xA
    msg_len:    equ $-msg

SECTION .text
    global _start

_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, msg
    mov edx, msg_len
    int 0x80

    mov eax, 1
    xor ebx, ebx
    int 0x80
$ nasm -f elf32 -o hello_world.o hello_world_32.s
$ ld --nmagic -m elf_i386 -o hello_world hello_world.o
$ ./hello_world              
Hello world!
$ grep hello_world_32.s hello_world
Binary file hello_world matches
$ grep .text hello_world
Binary file hello_world matches
$ grep .rodata hello_world
Binary file hello_world matches
$

Here is the output of xxd hello_world:

00000000: 7f45 4c46 0101 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 0300 0100 0000 6080 0408 3400 0000  ........`...4...
00000020: 9001 0000 0000 0000 3400 2000 0100 2800  ........4. ...(.
00000030: 0600 0500 0100 0000 6000 0000 6080 0408  ........`...`...
00000040: 6080 0408 2d00 0000 2d00 0000 0500 0000  `...-...-.......
00000050: 1000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: b804 0000 00bb 0200 0000 b980 8004 08ba  ................
00000070: 0d00 0000 cd80 b801 0000 0031 dbcd 8000  ...........1....
00000080: 4865 6c6c 6f20 776f 726c 6421 0a00 0000  Hello world!....
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 6080 0408 0000 0000 0300 0100  ....`...........
000000b0: 0000 0000 8080 0408 0000 0000 0300 0200  ................
000000c0: 0100 0000 0000 0000 0000 0000 0400 f1ff  ................
000000d0: 1200 0000 8080 0408 0000 0000 0000 0200  ................
000000e0: 1600 0000 0d00 0000 0000 0000 0000 f1ff  ................
000000f0: 2300 0000 6080 0408 0000 0000 1000 0100  #...`...........
00000100: 1e00 0000 8d90 0408 0000 0000 1000 0200  ................
00000110: 2a00 0000 8d90 0408 0000 0000 1000 0200  *...............
00000120: 3100 0000 9090 0408 0000 0000 1000 0200  1...............
00000130: 0068 656c 6c6f 5f77 6f72 6c64 5f33 322e  .hello_world_32.
00000140: 7300 6d73 6700 6d73 675f 6c65 6e00 5f5f  s.msg.msg_len.__
00000150: 6273 735f 7374 6172 7400 5f65 6461 7461  bss_start._edata
00000160: 005f 656e 6400 002e 7379 6d74 6162 002e  ._end...symtab..
00000170: 7374 7274 6162 002e 7368 7374 7274 6162  strtab..shstrtab
00000180: 002e 7465 7874 002e 726f 6461 7461 0000  ..text..rodata..
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 1b00 0000 0100 0000  ................
000001c0: 0600 0000 6080 0408 6000 0000 1f00 0000  ....`...`.......
000001d0: 0000 0000 0000 0000 1000 0000 0000 0000  ................
000001e0: 2100 0000 0100 0000 0200 0000 8080 0408  !...............
000001f0: 8000 0000 0d00 0000 0000 0000 0000 0000  ................
00000200: 0400 0000 0000 0000 0100 0000 0200 0000  ................
00000210: 0000 0000 0000 0000 9000 0000 a000 0000  ................
00000220: 0400 0000 0600 0000 0400 0000 1000 0000  ................
00000230: 0900 0000 0300 0000 0000 0000 0000 0000  ................
00000240: 3001 0000 3600 0000 0000 0000 0000 0000  0...6...........
00000250: 0100 0000 0000 0000 1100 0000 0300 0000  ................
00000260: 0000 0000 0000 0000 6601 0000 2900 0000  ........f...)...
00000270: 0000 0000 0000 0000 0100 0000 0000 0000  ................

How do I get rid of the unneeded strings in the executable? Is there perhaps some way to only compile instructions, ignoring all the rest?

Kolay.Ne
  • _"only compile instructions"_ yes, but then it won't run :D Anyway, you can get rid of section names and symbols. – Jester Sep 06 '21 at 19:26
  • Have you tried the `-s` switch for the linker? – PMF Sep 06 '21 at 19:28
  • `nasm -fbin` makes a "flat binary" output, with no ELF metadata wrapped around it. If you manually define ELF headers with `db` in your .asm file (and tuck some machine code inside some don't-care fields), you can make a very small static ELF binary. https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html – Peter Cordes Sep 06 '21 at 19:30
  • I assume you're including `db 'Hello world!',0xA` as a (pseudo)-instruction? Without that, your Hello World program won't have any data in memory it wants to pass to a write system call. – Peter Cordes Sep 06 '21 at 19:32
  • @PMF, it made things better (the filename and the "variable" names have disappeared), but the section names are still there as raw text – Kolay.Ne Sep 06 '21 at 19:39
  • I'm not an expert on the ELF format, but I would say they're required for the program to be loaded correctly. – PMF Sep 06 '21 at 19:40
  • @PMF, I believe sections are required, but I doubt the **raw text** denoting them is... – Kolay.Ne Sep 06 '21 at 19:43
  • Sections are not required, program segments are. But missing sections make stock tools unhappy. – Jester Sep 06 '21 at 19:44
  • @PeterCordes, thank you for the link, I'll read it through and follow up with the results. Although it seems a bit strange to me that there is no built-in option for this purpose in `nasm`/`ld`. – Kolay.Ne Sep 06 '21 at 19:45
  • @PeterCordes, I don't understand your question about Hello world. Yes, this string must be in the produced executable file, but there are other strings in the executable – Kolay.Ne Sep 06 '21 at 19:46
  • "string *constants* from assembly source" sounds like it's talking about the strings explicitly defined in your source file. Not metadata sections added by the assembler, which are strings but wouldn't normally be called string constants, and they're not from the asm source file. You also said "only compile instructions" which again seems to imply only assembling the machine code in .text, not data. We can tell from context that's not what you want, but that's what I was replying to. (Literally only the instructions would imply no metadata at all: a flat binary, which, like Jester said, wouldn't run.) – Peter Cordes Sep 06 '21 at 20:34
  • In programming, being precise about what exactly you want is very important, especially when you're telling the computer what to do. It's also a useful thing to do when communicating with other human programmers, who tend to notice exactly what you said. – Peter Cordes Sep 06 '21 at 20:35
  • Anyway, you might want to try FASM; it can output a full ELF executable (not just a .o), without any sections, *just* the ELF program headers. So it runs but it's a pain to use GDB on it. But it is small. – Peter Cordes Sep 06 '21 at 20:36
  • @PeterCordes, thank you so much for the comments and the suggestion of FASM. I have checked it; it does produce a smaller executable with no raw-text metadata. – Kolay.Ne Sep 07 '21 at 15:50
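
The hand-written-header idea from the comments above (a flat binary that carries its own ELF header and one program header, with no section headers at all) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's `struct` instead of NASM `db` lines, the function name `build_image` and the load address `0x08048000` are my own choices, and the machine code bytes are a hand encoding of the question's 32-bit source. It does not overlap code with header fields the way the linked article does.

```python
import os
import struct
import sys

BASE = 0x08048000            # assumed load address (not taken from the question)
EHDR_SIZE, PHDR_SIZE = 52, 32
MSG = b"Hello world!\n"


def build_image() -> bytes:
    """Return a static ELF32 image: ELF header, one PT_LOAD header, code, message."""
    code_off = EHDR_SIZE + PHDR_SIZE      # code starts right after the headers
    code_len = 31                         # length of the instruction bytes below
    msg_addr = BASE + code_off + code_len
    code = (
        b"\xb8" + struct.pack("<I", 4)            # mov eax, 4   (sys_write)
        + b"\xbb" + struct.pack("<I", 1)          # mov ebx, 1   (stdout)
        + b"\xb9" + struct.pack("<I", msg_addr)   # mov ecx, msg
        + b"\xba" + struct.pack("<I", len(MSG))   # mov edx, msg_len
        + b"\xcd\x80"                             # int 0x80
        + b"\xb8" + struct.pack("<I", 1)          # mov eax, 1   (sys_exit)
        + b"\x31\xdb"                             # xor ebx, ebx
        + b"\xcd\x80"                             # int 0x80
    )
    assert len(code) == code_len
    total = code_off + len(code) + len(MSG)

    # e_ident: magic, ELFCLASS32, little-endian, version 1, then padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8)
    ehdr = e_ident + struct.pack(
        "<HHIIIIIHHHHHH",
        2, 3, 1,              # e_type = ET_EXEC, e_machine = EM_386, e_version
        BASE + code_off,      # e_entry
        EHDR_SIZE,            # e_phoff: program header follows the ELF header
        0, 0,                 # e_shoff = 0 (no section headers), e_flags
        EHDR_SIZE, PHDR_SIZE, 1,   # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,              # e_shentsize, e_shnum, e_shstrndx
    )
    # One PT_LOAD segment mapping the whole file, readable and executable.
    phdr = struct.pack("<8I", 1, 0, BASE, BASE, total, total, 5, 0x1000)
    return ehdr + phdr + code + MSG


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "wb") as f:
        f.write(build_image())
    os.chmod(sys.argv[1], 0o755)
```

The image built this way is 128 bytes, and the only readable text left in it is the `ELF` magic and the message itself. Whether it actually runs depends on the kernel accepting 32-bit executables, so treat it as a starting point for experiments, not a tested recipe.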

1 Answer

> there are some strings that are definitely meaningless for the executable (like the name of the source file, section names, and some others).

Indeed, those strings are meaningless for the executable itself, but they are essential for the loader of your program. The section and symbol names visible in the dump are not strings defined in the source text of your program; they are metadata required by the ELF specification. The names of the program sections .rodata and .text are stored in another special section called .shstrtab. This special section is discarded at load time.
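
To make it concrete where that metadata sits in the file, here is a small sketch that decodes the first 0x54 bytes of the `xxd` dump from the question (the ELF32 header plus its single program header). The helper name `describe` is hypothetical, and the code assumes a little-endian ELF32 file like the one shown.

```python
import struct

# First 0x54 bytes of the `xxd hello_world` dump from the question:
# the 52-byte ELF32 header followed by its single 32-byte program header.
HEADER_HEX = (
    "7f454c46010101000000000000000000"
    "02000300010000006080040834000000"
    "90010000000000003400200001002800"
    "06000500010000006000000060800408"
    "608004082d0000002d00000005000000"
    "10000000"
)


def describe(data: bytes) -> dict:
    """Decode the ELF32 little-endian header and the first program header."""
    # Fields that follow the 16-byte e_ident array.
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIIIIIHHHHHH", data, 16)
    # Program header: p_type, p_offset, p_vaddr, p_paddr,
    # p_filesz, p_memsz, p_flags, p_align.
    (p_type, p_offset, p_vaddr, p_paddr,
     p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(
        "<8I", data, e_phoff)
    return {
        "entry": e_entry,
        "segment": (p_offset, p_offset + p_filesz),
        "section_headers": (e_shoff, e_shoff + e_shnum * e_shentsize),
        "shstrndx": e_shstrndx,
    }


info = describe(bytes.fromhex(HEADER_HEX))
print("entry point:          0x%08x" % info["entry"])
print("segment file range:   0x%x-0x%x" % info["segment"])
print("section header table: 0x%x-0x%x" % info["section_headers"])
print("index of .shstrtab:   %d" % info["shstrndx"])
```

For the dump in the question, the program header describes file offsets 0x60 to 0x8d (the code and the message), while the section header table occupies 0x190 to 0x280 and the string tables sit in between, outside the range that the program header describes.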

You could get rid of all the metadata by assembling with nasm -f bin instead of nasm -f elf32, but then you wouldn't be able to execute your program with ./hello_world anymore. You would have to write another program (your own loader) that allocates memory, copies the binary file into it, and jumps to its entry point.

vitsoft
  • Text section names are *not* relevant at all for the program to load, only for things like `objdump` and `gdb` to find the `.text` section. ELF section headers are separate from ELF **program** headers which define the program *segments* that the kernel maps into memory on `execve`. That's why the smallest executables at the end of https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html don't contain any `db` strings in their manually-assembled ELF header, and in fact don't have an ELF section header at all, like FASM's static-binary output mode. – Peter Cordes Sep 08 '21 at 07:24
  • [What's the difference of section and segment in ELF file format](https://stackoverflow.com/q/14361248). IDK if there's a way to strip section headers from an existing binary; perhaps with `strip` or `objcopy`? – Peter Cordes Sep 08 '21 at 07:27
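
One way to experiment with stripping section headers from an existing binary, without depending on any particular `strip` or `objcopy` option, is to do it by hand. The sketch below is an assumption-laden illustration, not a replacement for a real tool: it only handles little-endian ELF32 files, the function name `strip_section_headers` is my own, and it simply cuts the file after the last byte that any program header refers to and zeroes the section-header fields in the ELF header.

```python
import os
import struct
import sys

EHDR_FMT = "<HHIIIIIHHHHHH"  # ELF32 header fields after the 16-byte e_ident
PHDR_FMT = "<8I"             # ELF32 program header (eight 32-bit words)


def strip_section_headers(image: bytes) -> bytes:
    """Drop everything not covered by the ELF header or a program segment."""
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("expected a little-endian ELF32 image")
    fields = list(struct.unpack_from(EHDR_FMT, image, 16))
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]

    # Always keep the ELF header and the whole program header table.
    keep = e_phoff + e_phnum * e_phentsize
    for i in range(e_phnum):
        phdr = struct.unpack_from(PHDR_FMT, image, e_phoff + i * e_phentsize)
        p_offset, p_filesz = phdr[1], phdr[4]
        if p_filesz:  # segments without file content need no bytes kept
            keep = max(keep, p_offset + p_filesz)

    fields[5] = 0    # e_shoff: no section header table
    fields[11] = 0   # e_shnum
    fields[12] = 0   # e_shstrndx
    out = bytearray(image[:keep])
    struct.pack_into(EHDR_FMT, out, 16, *fields)
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, "rb") as f:
        stripped = strip_section_headers(f.read())
    with open(dst, "wb") as f:
        f.write(stripped)
    os.chmod(dst, 0o755)
```

Saved under a hypothetical name such as `strip_sketch.py`, it could be invoked as `python3 strip_sketch.py hello_world hello_world.small`. Applied to the 640-byte file from the question, the arithmetic gives 0x8d (141) bytes, with the symbol table, both string tables, and the section headers gone. Tools that rely on sections, such as `objdump` and `gdb`, will have little to work with afterwards.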