
The tutorial I am following is for x86 and was written in 32-bit assembly; I'm trying to follow along while learning x64 assembly in the process. This has been going very well up until this lesson, where I have the following simple program that tries to modify a single character in a string. It assembles and links fine, but segfaults when run.

section .text

global _start ; Declare global entry point for ld
_start:

    jmp short message ; Jump to where our message is, so we can do a call that pushes its address onto the stack

    code:   
    xor rax, rax    ; Clean up the registers
    xor rbx, rbx
    xor rcx, rcx
    xor rdx, rdx

    ; Try to change the N to a space
    pop rsi ; Get address from stack
    mov al, 0x20 ; Load 0x20 (a space) into AL, the low byte of RAX
    mov [rsi], al; Why segfault?
    xor rax, rax; Clear again

    ; write(rdi, rsi, rdx) = write(file_descriptor, buffer, length)
    mov al, 0x01    ; Write the syscall number for 64-bit write (0x01) into the lower 8 bits of RAX
    mov rdi, rax    ; First parameter, RDI = 0x01 which is STDOUT; we move RAX to ensure the upper 56 bits of RDI are zero
    ;pop rsi        ; Second Parameter, RSI = Popped address of message from stack
    mov dl, 25  ; Third Parameter, RDX = Length of message
    syscall     ; Call Write

    ; exit(rdi) = exit(return value)    
    xor rax, rax    ; write returns # of bytes written in rax, need to clean it up again
    add rax, 0x3C   ; 64bit syscall exit is 0x3C
    xor rdi, rdi    ; Return value is in rdi (First parameter), zero it to return 0
    syscall     ; Call Exit

    message:
    call code   ; Pushes the address of the string onto the stack
    db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A

The culprit is this line:

mov [rsi], al; Why segfault?

If I comment it out, the program runs fine and outputs the message 'AAAABBBNAAAAAAAABBBBBBBB'. Why can't I modify the string?

The author's code is the following:

global _start


_start:
        jmp short ender

        starter:

        pop ebx                 ;get the address of the string
        xor eax, eax

        mov al, 0x20
        mov [ebx+7], al        ;put a space (0x20) where the N is in the string

        mov al, 4       ;syscall write
        mov bl, 1       ;stdout is 1
        pop ecx         ;get the address of the string from the stack
        mov dl, 25       ;length of the string
        int 0x80

        xor eax, eax
        mov al, 1       ;exit the shellcode
        xor ebx,ebx
        int 0x80

        ender:
        call starter
        db 'AAAABBBNAAAAAAAABBBBBBBB',0x0A

And I've compiled that using:

nasm -f elf <infile> -o <outfile>
ld -m elf_i386 <infile> -o <outfile>

But even that causes a segfault. Images on the page show it working properly and changing the N into a space, but I seem to be stuck in segfault land :( Google isn't really helping in this case, so I turn to you, Stack Overflow. Any pointers (no pun intended!) would be appreciated.

Some programmer dude
Mykel Stone

1 Answer


I would assume it's because you're trying to write to data that is in the .text section. Usually you're not allowed to write to the code segment, for security. Modifiable data should be in the .data section (or .bss if zero-initialized).
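A minimal sketch of that fix for the 64-bit program, assuming 64-bit Linux and NASM (the file name fixed.asm is only an example): the string moves into .data and its address comes from a RIP-relative lea instead of jmp/call/pop. It also writes at offset 7 so that it actually replaces the N; the question's `mov [rsi], al` targets offset 0, the first A.

```nasm
; Sketch: the string lives in a writable .data section instead of .text.
; Build (64-bit Linux): nasm -f elf64 fixed.asm -o fixed.o && ld fixed.o -o fixed
section .data
msg:    db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A   ; 24 characters + newline
msglen  equ $ - msg                           ; 25 bytes

section .text
global _start
_start:
    lea rsi, [rel msg]      ; RIP-relative address of the writable string
    mov byte [rsi+7], 0x20  ; replace the 'N' (offset 7) with a space

    ; write(1, msg, msglen)
    mov eax, 1              ; sys_write (writing EAX zero-extends into RAX)
    mov edi, 1              ; fd 1 = stdout
    mov edx, msglen         ; number of bytes
    syscall

    ; exit(0)
    mov eax, 60             ; sys_exit
    xor edi, edi            ; status 0
    syscall
```

Running it should print the string with the N replaced by a space, with no segfault.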

For actual shellcode, where you don't want to use a separate section, see Segfault when writing to string allocated by db [assembly] for alternate workarounds.
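One workaround that stays position-independent and needs no extra section (an illustrative sketch, not necessarily the approach from the linked answer) is to keep jmp/call/pop but copy the string onto the stack, which is writable, and modify the copy. The 32-byte buffer is an arbitrary size that fits the 25-byte string. This version still contains 00 bytes (for example in `mov ecx, 25`), so it is not usable as injected shellcode as-is.

```nasm
; Sketch: read the string from .text, modify a copy of it on the writable stack.
; Build (64-bit Linux): nasm -f elf64 stackcopy.asm -o stackcopy.o && ld stackcopy.o -o stackcopy
section .text
global _start
_start:
    jmp short message       ; forward jump so the call below goes backwards

code:
    pop rsi                 ; RSI = address of the string (in the read-only text segment)
    sub rsp, 32             ; reserve 32 bytes of stack, enough for 25
    mov rdi, rsp            ; RDI = destination buffer on the stack
    mov ecx, 25             ; number of bytes to copy
    cld                     ; copy forwards
    rep movsb               ; copy [RSI] -> [RDI], advancing both
    mov byte [rsp+7], 0x20  ; modify the writable copy: 'N' -> space

    ; write(1, rsp, 25)
    mov eax, 1              ; sys_write
    mov edi, 1              ; stdout
    mov rsi, rsp            ; buffer = the stack copy
    mov edx, 25             ; length
    syscall

    ; exit(0)
    mov eax, 60             ; sys_exit
    xor edi, edi
    syscall

message:
    call code               ; pushes the address of the following string
    db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A
```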


Also, I would never suggest relying on the side effect of call pushing the address after it onto the stack to get a pointer to the data that follows, except in shellcode.

This is a common trick in shellcode (which must be position-independent); 32-bit mode needs a call to get EIP somehow. The call must have a backwards displacement to avoid 00 bytes in the machine code, so putting the call somewhere that creates a "return" address you specifically want saves an add or lea.
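To see why the direction of the call matters, here is a small fragment assembled as a flat binary only to inspect the encodings (the file name calldisp.asm is just an example). `call rel32` stores a 32-bit offset measured from the end of the call instruction, so a short forward call is padded with 00 bytes while a short backward call is padded with FF bytes. This is why the question's code jumps forward to `message` and then calls backwards to `code`.

```nasm
; Flat binary, just to look at the bytes: nasm -f bin calldisp.asm -o calldisp.bin
bits 64
back:
    nop                 ; 90
    call back           ; E8 FA FF FF FF  (rel32 = -6: backwards, no 00 bytes)
    call fwd            ; E8 01 00 00 00  (rel32 = +1: forwards, three 00 bytes)
    nop                 ; 90
fwd:
    nop                 ; 90
```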

Even in 64-bit code where RIP-relative addressing is possible, jmp / call / pop is about as compact as jumping over the string for a RIP-relative LEA with a negative displacement.
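A sketch of that 64-bit alternative, assuming NASM on Linux (file name riprel.asm is only an example): jump over the string, then take its address with a RIP-relative lea. Because the string is behind the lea, the displacement is negative and encodes with FF bytes rather than 00 bytes. Note that this only reads the string; it still sits in read-only .text, so writing to it would fault exactly like the question's code. The other instructions here are not null-free.

```nasm
; Sketch: jmp over the data, then a RIP-relative LEA with a negative displacement.
; Build (64-bit Linux): nasm -f elf64 riprel.asm -o riprel.o && ld riprel.o -o riprel
section .text
global _start
_start:
    jmp short code              ; skip over the data

msg:
    db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A

code:
    lea rsi, [rel msg]          ; msg is behind RIP, so the rel32 is negative (no 00 bytes)

    ; write(1, msg, 25): only reads the string, which is still in read-only .text
    mov eax, 1                  ; sys_write
    mov edi, 1                  ; stdout
    mov edx, 25                 ; length
    syscall

    ; exit(0)
    mov eax, 60                 ; sys_exit
    xor edi, edi
    syscall
```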

Outside of the shellcode / constrained-machine-code use case, it's a terrible idea and you should just lea reg, [rel buf] like a normal person, with the data in .data and the code in .text (or read-only data in .rodata). This way you're not trying to execute code next to data, or put data next to code.

(Code-injection vulnerabilities that allow shellcode already imply the existence of a page with write and exec permission, but normal processes from modern toolchains don't have any W+X pages unless you do something to make that happen. W^X is a good security feature for this reason, so normal toolchain security features / defaults must be defeated to test shellcode.)
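As one illustration of defeating that default for an experiment (a sketch under assumptions, not the only way; linking with GNU ld's `-N` option, which makes text writable, is another), the program can ask the kernel to make its own code page writable with mprotect before modifying the string in .text. It assumes 4 KiB pages and 64-bit Linux syscall numbers; hardened kernels or security modules may refuse a W+X mapping, in which case this sketch exits with status 1.

```nasm
; Sketch: deliberately create a W+X mapping with mprotect, then modify the string in .text.
; For experimenting only. Build: nasm -f elf64 wx.asm -o wx.o && ld wx.o -o wx
section .text
global _start
_start:
    ; mprotect(page_start, len, PROT_READ | PROT_WRITE | PROT_EXEC)
    lea rdi, [rel _start]
    and rdi, -4096          ; round down to a page boundary (assumes 4 KiB pages)
    lea rsi, [rel msg_end]
    sub rsi, rdi            ; bytes up to the end of the string; the kernel rounds up to whole pages
    mov edx, 7              ; PROT_READ (1) | PROT_WRITE (2) | PROT_EXEC (4)
    mov eax, 10             ; sys_mprotect
    syscall
    test rax, rax
    jnz fail                ; a non-zero return value is -errno

    lea rsi, [rel msg]
    mov byte [rsi+7], 0x20  ; no longer faults: the page is now writable

    ; write(1, msg, 25)
    mov eax, 1              ; sys_write
    mov edi, 1              ; stdout
    mov edx, 25             ; length (RSI still points at msg)
    syscall

    ; exit(0)
    mov eax, 60             ; sys_exit
    xor edi, edi
    syscall

fail:
    mov eax, 60             ; exit(1) if mprotect was refused
    mov edi, 1
    syscall

msg:
    db 'AAAABBBNAAAAAAAABBBBBBBB', 0x0A
msg_end:
```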

Peter Cordes
Sami Kuhmonen
  • He is not trying to write to a .text section memory address; he is trying to write to a memory address that doesn't exist. He wanted the stack address in rsi, but he did a pop rsi, not a mov rsi, rsp – sinkmanu Jul 10 '17 at 07:21
  • @sinkmanu Yes, they are. Read the code: call puts the address following itself to stack, then they pop it into rsi. They didn't want rsp in rsi. – Sami Kuhmonen Jul 10 '17 at 07:24
  • Yes, I didn't see that he didn't save the message in the .data section. That technique is called "jump call pop" – sinkmanu Jul 10 '17 at 07:30
  • I'm marking your reply as the answer since I've read plenty of statements to this effect. Additionally, if I do a mov rsi, 0x4000b2 it still segfaults, so I don't think it's an issue with rsi not pointing to the data. That raises the question of how the tutorial was compiled, though... since it obviously worked at some point in the past :/ – Mykel Stone Jul 10 '17 at 07:32
  • @MykelStone There are differences between platforms and versions so it may very well be that at the time of writing the 32bit Linux didn't by default block write access to the text segment whereas nowadays it's very common practise for security. Most likely the tutorial just is outdated. – Sami Kuhmonen Jul 10 '17 at 07:40
  • This appears to be the case, within the FAQ section he does mention disabling exec-shield and randomize_va_space within /proc/sys/kernel if you get segfaults. I unfortunately do not have those objects, nor would I really want to require that for my code to function. Thank you for taking the time to help out. – Mykel Stone Jul 10 '17 at 07:50
  • @MykelStone: i386 Linux historically had R+X text and R+X+W data, because until the NX bit (and AMD64), there was no way to make a page readable but not executable. There was always write-protection, though, and Linux used it for the text segment of executables. I'm pretty sure that building the author's code into a normal 32-bit executable and running it the normal way would always have segfaulted. – Peter Cordes Jul 10 '17 at 11:17
  • `call` to push a pointer to following data is something I've seen before in shellcode examples, but in shellcode exploits you don't have the luxury of putting your data in a different section. And the code + data usually only works if it's in an RWX page, typically on the stack. **TL:DR: some shellcode hacks don't work in normal executables that have read-only text segments**. Disabling `randomize_va_space` would only matter if you were injecting that into something. `call` to push the following address is PIC, and so is the rest of the code, @MykelStone. – Peter Cordes Jul 10 '17 at 11:19
  • Thanks! That is some great information, I found out the tutorials are from ~2004 or so. You are correct this was in a shellcode example, I'm studying to become a penetration tester, and have a strong programming background; however my knowledge in things like this is lacking obviously! I've never had to pay attention since I've always coded within the lines so to speak. – Mykel Stone Jul 11 '17 at 16:59
  • For shellcode, [Segfault when writing to string allocated by db \[assembly\]](https://stackoverflow.com/q/25124591) is a good answer. – Peter Cordes Jan 28 '21 at 21:33
  • @Sami: I edited to explain why this jmp/call/pop sequence is a thing, replacing the question your answer posed about it with an explanation. Feel free to tweak / simplify / remove some / all of that, of course. – Peter Cordes Jan 29 '21 at 01:39