Is it possible for a program to read itself?

Question

Theoretical question, but let's say I have written an assembly program with a label "labelx:". I want the program to read from that memory address, only a certain number of bytes, and print them to stdout.

Would it be something like

jmp labelx

And then would I use the write syscall, making sure to read from the next instruction after labelx:

mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall

to then output to stdout.

However, how would I obtain the size to read? Especially if there is a label or more code after the code I want to read. Would I have to manually count the lines?

mov rdx,rip+(bytes*lines)

And then syscall with the registers populated, so that the syscall writes from the buffer in rsi to the file descriptor in rdi, which is stdout.

Is this even possible? Would I have to use the read syscall first, since the write system call requires rsi to point to an allocated memory buffer? However, I assumed .text is already allocated memory and is read-only. Would I have to allocate space on the stack, the heap, or a static buffer first before the write, if it's even possible in the first place?

I'm using NASM syntax btw, and I'm pretty new to assembly. This is just a question.

Could you clarify what you mean by "the size"? And what do you want to print? The binary machine code or the corresponding assembler instructions? — chtz, Feb 11 '22 at 14:55
The `.text` section is just bytes in memory, no different from `section .rodata` where you might normally put `msg: db "hello", 10`. You can get the assembler to calculate the size by putting a label at the start and end, and using `mov edx, prog_end - prog_start` inside the code at the point where you want that size in RDX. See [How does $ work in NASM, exactly?](https://stackoverflow.com/q/47494744) for more about subtracting two labels to get a size. — Peter Cordes, Feb 11 '22 at 15:01
@chtz The bytes of the instructions themselves. And essentially a small section of code, which has x amount of lines, kinda like a function. — Sam, Feb 11 '22 at 15:13

score 2 · Accepted Answer · answered Feb 11 '22 at 19:08

Yes, the `.text` section is just bytes in memory, no different from `section .rodata` where you might normally put `msg: db "hello", 10`. x86 is a von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use `objdump -drwC -Mintel` on a linked executable to see the machine-code bytes, or GDB's `x` command in a running process, to see bytes anywhere.

You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using `mov edx, prog_end - prog_start` in the code at the point where you want that size in RDX.

See [How does $ work in NASM, exactly?](https://stackoverflow.com/q/47494744) for more about subtracting two labels (in the same section) to get a size. (`$` is an implicit label at the start of the current line, although `$` isn't likely what you want here.)
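For comparison, the usual place `$` shows up is right after a data definition. The following is a minimal sketch, assuming x86-64 Linux and a `nasm` + `ld` toolchain; the names `msg` and `msg_len` are hypothetical, not from the question.

```nasm
; Assumed build: nasm -felf64 hello.asm && ld -o hello hello.o
default rel
global _start

section .rodata
msg:     db "hello", 10
msg_len  equ $ - msg        ; $ is the address of this line, i.e. just past the data

section .text
_start:
    lea rsi, [msg]          ; buffer = address of the string
    mov edx, msg_len        ; assemble-time constant length
    mov edi, 1              ; stdout
    mov eax, 1              ; __NR_write
    syscall

    xor edi, edi            ; exit status 0
    mov eax, 60             ; __NR_exit
    syscall
```

The `equ` line emits no bytes, so `$ - msg` is exactly the length of the `db` data. For a block of code, explicit start/end labels do the same job without depending on where the size calculation sits.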

To get the current address into a register, you need a RIP-relative `lea`, not `mov`, because RIP isn't a general-purpose register and there's no special form of `mov` that reads it.

here:
    lea rsi, [rel here]     ; with DEFAULT REL you could just use [here]
    mov edi, 1              ; stdout fileno
    mov edx, .end - here    ; assemble-time constant size calculation
    mov eax, 1              ; __NR_write
    syscall
.end:

This is fully position-independent, unlike if you used `mov esi, here`. (See: How to load address of function or label into register)

The LEA could use `lea rsi, [rel $]` to assemble to the same machine-code bytes, but you want a label there so you can subtract them.

I optimized your `mov` instructions to use 32-bit operand-size, which implicitly zero-extends into the full 64-bit RDX and RAX. (And RDI, but `write(int fd, void *buf, size_t len)` only looks at EDI anyway for the file descriptor.)

Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, you can put the start/end labels anywhere. (E.g. `foo:` and `.end:`, with `mov edx, foo.end - foo`, taking advantage of how NASM local labels work: a local label's name is appended to the previous non-local label, so you can reference it from somewhere else. Or just give them both non-dot names.)
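As a rough sketch of that idea, here is a complete program that dumps the machine code of a separate block rather than of the writing code itself. It assumes x86-64 Linux and a `nasm` + `ld` toolchain; the block `foo` and its three instructions are hypothetical placeholders.

```nasm
; Assumed build and run:
;   nasm -felf64 dump.asm && ld -o dump dump.o
;   ./dump | xxd            ; raw machine-code bytes, so view them as hex
default rel
global _start

section .text
foo:                        ; hypothetical block whose bytes get printed
    mov eax, 1
    add eax, 2
    ret
.end:                       ; full name is foo.end

_start:
    lea rsi, [foo]          ; RIP-relative address of the block
    mov edx, foo.end - foo  ; assemble-time size of the block
    mov edi, 1              ; stdout
    mov eax, 1              ; __NR_write
    syscall

    xor edi, edi            ; exit status 0
    mov eax, 60             ; __NR_exit
    syscall
```

Nothing here ever calls `foo`; it is only read as data. Comparing the `xxd` output with `objdump -drwC -Mintel dump` should show the same bytes listed next to `foo`.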

For label arithmetic like your `.end - here` example I like to use a NASM-style local label that's just a dot to indicate the start of a structure. For example, [here's a `.:` label](https://hg.pushbx.org/ecm/ldebug/file/c2d96de77120/source/msg.asm#l1484) and a few lines below [there's the user of it](https://hg.pushbx.org/ecm/ldebug/file/c2d96de77120/source/msg.asm#l1497). — ecm, Feb 11 '22 at 20:25
@ecm: Not sure if I like `.` specifically (e.g. GAS uses `.` the way NASM uses `$` so it could be confusing to some readers). But yeah, probably good to use a generically-named local label at the start, so it's `.end - .start` instead of `.end - nonlocal`, which will break if you transplant it to somewhere else or change the nonlocal name. — Peter Cordes, Feb 11 '22 at 20:32
