2

I want to write an assembly program that uses execve (syscall number 0x3B, i.e. 59, on x86-64) to run the program /bin/ls with the switches -al.

The man page (man 2 execve) states that the call requires three values:

    int execve(const char *filename, char *const argv[], char *const envp[]);

I don't quite understand how to build the three arguments. As far as I know, the first argument goes into RDI, the second into RSI, and the third into RDX. I believe that to set up the first one, it suffices to do

    push 0x736c2f2f         ;sl//
    push 0x6e69622f         ;nib/
    mov rdi, rsp

For the third one, the thing is quite easy:

    xor r11, r11
    mov rdx, r11

My problem is that I don't know how to build the second argument, which should be an array containing ['/bin//ls', '-aal']

I need to write it for x86-64, so please no int 0x80 suggestions.

Peter Cordes
neme666
  • Do you know C? You need to put your strings into memory somewhere and then put their addresses into the array, finally pass the address of the array to the syscall. – Jester Jan 20 '20 at 14:25
  • @Jester, although it's somewhat unconventional, it's neither uncommon nor wrong to store strings on the stack. – h0r53 Jan 20 '20 at 14:30
  • I did not say anything about the stack. Stack is memory and will work fine. Anyway, another issue is that the `push imm32` still pushes 8 bytes so the string won't be correct, it will be `/bin<0><0><0><0>//ls` – Jester Jan 20 '20 at 14:32
  • 3
    @h0r53 There is no `push` instruction with an 8 byte immediate. You have to move into a register and then push. – fuz Jan 20 '20 at 14:44
  • @neme666. Are you trying to write a shellcode exploit for Linux on the `AMD64` architecture? – fpmurphy Jan 21 '20 at 11:05
  • @fpmurphy - Not really. I have no problem writing the old hs//nib/ shellcode, and one could also object that having a shell would be more than enough for running an ls -al command, indeed! But I want to learn more. That's it. – neme666 Jan 21 '20 at 11:15
  • 1
    Wait, so you *don't* need your code to be a contiguous block that doesn't contain any `0` bytes (i.e. shellcode)? Then it's trivial and you should just put the strings in memory with their terminating `0` bytes, and use RIP-relative LEA to get pointers to them. Does it even have to be position-independent? If not, the arrays of pointers can be static as well, instead of writing instructions to get addresses and store them to the stack, i.e. you can basically just use compiler output. Why would you waste your time pushing strings if you aren't aiming for code-injection (shellcode)? – Peter Cordes Jan 21 '20 at 21:42
  • e.g. if you didn't care about making this shellcode-compatible, you could do stuff like ``mov rcx, `/bin/ls\0` `` to avoid needing another push to zero-terminate, even if you did still want to make it self-contained and constructed from immediates. – Peter Cordes Jan 22 '20 at 17:15
  • @PeterCordes - I want to write the program first, then transform it into shellcode as an exercise. The thing is that I am used to writing shellcode and sometimes I am a bit 'dirty'. I wouldn't be too formal and pedantic about this right now. It'd be better to have a good explanation of why the code below works, because I still have some problems. – neme666 Jan 22 '20 at 17:22
  • Ok, so you do have shellcode in mind as an eventual target, and want to write it in ways that *will* work in shellcode. Therefore it should be tagged with `[shellcode]`. – Peter Cordes Jan 22 '20 at 17:28

3 Answers

5

You can put the argv array onto the stack and load the address of it into rsi. The first member of argv is a pointer to the program name, so we can use the same address that we load into rdi.

    xor edx, edx        ; Load NULL to be used both as the third
                        ; parameter to execve as well as
                        ; to push 0 onto the stack later.
    push "-aal"         ; Put second argument string onto the stack.
    mov rax, rsp        ; Load the address of the second argument.
    mov rcx, "/bin//ls" ; Load the file name string
    push rdx            ; and place a null character
    push rcx            ; and the string onto the stack.
    mov rdi, rsp        ; Load the address of "/bin//ls". This is
                        ; used as both the first member of argv
                        ; and as the first parameter to execve.

    ; Now create argv.
    push rdx            ; argv must be terminated by a NULL pointer.
    push rax            ; Second arg is a pointer to "-aal".
    push rdi            ; First arg is a pointer to "/bin//ls"
    mov rsi, rsp        ; Load the address of argv into the second
                        ; parameter to execve.

    ; Finally, make the call. 59 (0x3b) is execve on x86-64 Linux.
    push 59             ; push/pop avoids null bytes in the encoding,
    pop rax             ; unlike mov eax, 59.
    syscall
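To see why that sequence of pushes produces what execve expects, it can help to replay it on a toy model. The following is only a sketch in C, not part of the original answer: `struct machine`, `push64`, `load64`, `pack`, `build_args` and `check_layout` are all made-up names, and "addresses" are just offsets into a small byte array standing in for the stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SIZE = 128 };

/* Toy machine (hypothetical): a byte-addressed stack plus the registers
   the snippet uses. "Addresses" are simply offsets into mem[]. */
struct machine {
    unsigned char mem[STACK_SIZE];
    uint64_t rsp, rax, rcx, rdx, rdi, rsi;
};

/* 64-bit push: move RSP down 8 bytes, then store little-endian. */
static void push64(struct machine *m, uint64_t v) {
    m->rsp -= 8;
    for (int i = 0; i < 8; i++)
        m->mem[m->rsp + i] = (unsigned char)(v >> (8 * i));
}

/* Read a little-endian qword back from the toy stack. */
static uint64_t load64(const struct machine *m, uint64_t addr) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)m->mem[addr + i] << (8 * i);
    return v;
}

/* Integer value of a string constant of up to 8 characters:
   first character in the lowest byte. */
static uint64_t pack(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Replays the argument-building pushes, one statement per instruction. */
static struct machine build_args(void) {
    struct machine m;
    memset(&m, 0xAA, sizeof m);   /* junk fill: every zero must be written */
    m.rsp = STACK_SIZE;
    m.rdx = 0;                    /* xor edx, edx */
    push64(&m, pack("-aal"));     /* push "-aal" (upper 4 bytes are zero) */
    m.rax = m.rsp;                /* mov rax, rsp */
    m.rcx = pack("/bin//ls");     /* mov rcx, "/bin//ls" */
    push64(&m, m.rdx);            /* push rdx */
    push64(&m, m.rcx);            /* push rcx */
    m.rdi = m.rsp;                /* mov rdi, rsp */
    push64(&m, m.rdx);            /* push rdx */
    push64(&m, m.rax);            /* push rax */
    push64(&m, m.rdi);            /* push rdi */
    m.rsi = m.rsp;                /* mov rsi, rsp */
    return m;
}

/* Checks the layout: filename string, then argv = { filename, "-aal", NULL }. */
static void check_layout(void) {
    struct machine m = build_args();
    assert(strcmp((const char *)m.mem + m.rdi, "/bin//ls") == 0);
    assert(load64(&m, m.rsi) == m.rdi);                 /* argv[0] */
    assert(strcmp((const char *)m.mem + load64(&m, m.rsi + 8), "-aal") == 0);
    assert(load64(&m, m.rsi + 16) == 0);                /* argv[2] == NULL */
    assert(m.rdx == 0);                                 /* envp == NULL */
}
```

Calling `check_layout()` from a `main` should pass silently. The junk fill is deliberate: it shows that each terminator comes from an explicit push rather than from memory that happened to be zero.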

This also corrects a couple of other problems with the code in the question. It uses an 8-byte push for the file name, since x86-64 doesn't support 4-byte push, and it makes sure that the file name has a null terminator.

This code does use a 64-bit push with a 4-byte immediate to push "-aal" since the string fits in 4 bytes. This also makes it null terminated without needing a null byte in the code.

I used strings with doubled characters as they are in the question to avoid null bytes in the code, but my preference would be this:

    mov ecx, "X-al"     ; Load second argument string,
    shr ecx, 8          ; shift out the dummy character,
    push rcx            ; and write the string to the stack.
    mov rax, rsp        ; Load the address of the second argument.
    mov rcx, "X/bin/ls" ; Load file name string,
    shr rcx, 8          ; shift out the dummy character,
    push rcx            ; and write the string onto the stack.

Note that the file name string gets a null terminator via the shift, avoiding the extra push. This pattern works with strings where a doubled character wouldn't work, and it can be used with shorter strings, too.
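The shift trick can be checked with a little arithmetic. This is a rough C sketch rather than anything from the answer itself; `pack`, `store64`, `has_zero_byte`, `shift_out_dummy` and `check_shift_trick` are hypothetical helper names. It models the immediate as a little-endian integer, shifts it right by 8 bits, and looks at the bytes a 64-bit push would then write.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer value of a string constant of up to 8 characters:
   first character in the lowest byte. */
static uint64_t pack(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Write v to dst in little-endian order, as a 64-bit push would. */
static void store64(unsigned char *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)(v >> (8 * i));
}

/* True if any of the low `nbytes` bytes of v is zero. */
static int has_zero_byte(uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++)
        if (((v >> (8 * i)) & 0xFF) == 0)
            return 1;
    return 0;
}

/* Models: mov reg, "X<string>" ; shr reg, 8 */
static uint64_t shift_out_dummy(const char *padded) {
    return pack(padded) >> 8;
}

static void check_shift_trick(void) {
    unsigned char slot[8];

    /* The 8-byte immediate "X/bin/ls" has no zero byte in its encoding... */
    assert(!has_zero_byte(pack("X/bin/ls"), 8));
    /* ...yet after the shift the pushed qword is "/bin/ls" plus a terminator. */
    store64(slot, shift_out_dummy("X/bin/ls"));
    assert(memcmp(slot, "/bin/ls\0", 8) == 0);

    /* The 4-byte immediate "X-al": mov ecx zero-extends into rcx, so the
       push writes "-al" followed by five zero bytes. */
    assert(!has_zero_byte(pack("X-al"), 4));
    store64(slot, shift_out_dummy("X-al"));
    assert(memcmp(slot, "-al\0\0\0\0\0", 8) == 0);
}
```

The dummy character sits in the lowest byte, so the right shift drops it and brings a zero in at the top, which lands just past the last character in memory.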

the Tin Man
prl
  • 3
    Please provide some context or commentary to this code to make it more useful. – miken32 Jan 20 '20 at 20:05
  • See "[Explaining entirely code-based answers](https://meta.stackoverflow.com/q/392712/128421)". While this might be technically correct it doesn't explain why it solves the problem or should be the selected answer. We should educate in addition to help solve the problem. – the Tin Man Jan 20 '20 at 21:56
  • Yup, that's much better. I'd have used `push "-aal"` to make that instruction self-documenting because this looks like NASM syntax. – Peter Cordes Jan 20 '20 at 23:24
  • Yes, it's better, though don't add "edited" or "updated" or "addendum". The goal of SO is to create something like an online reference book, not a message board or forum. Readability is much more important than tagging changes as we can see what's changed if we need to. "[Should “Edit:” in edits be discouraged?](https://meta.stackoverflow.com/q/255644/128421)" – the Tin Man Jan 21 '20 at 00:38
  • @theTinMan, I put "addendum" because it was additional information not directly related to answering the question, not because it was an edit. Do you think that should not be indicated in some way? – prl Jan 21 '20 at 01:03
  • It's not necessary to do that. "Note that the file..." works nicely. – the Tin Man Jan 21 '20 at 02:23
  • @prl: I understood completely the mechanism you used. It works fine. – neme666 Jan 21 '20 at 11:41
1

You can write push '/bin' in NASM to get the bytes into memory in that order. (Padded with 4 bytes of zeros, for a total width of qword; dword pushes are impossible in 64-bit mode.) No need to mess around with manually encoding ASCII characters; unlike some assemblers NASM doesn't suck at multi-character literals and can make your life easier.

You could use mov dword [rsp+4], '//ls' to store the high half. (Or make it a qword store to write another 4 bytes of zeroes past that, with a mov r/m64, sign_extended_imm32.) Or just zero-terminate it with an earlier push before doing mov rsi, '/bin//ls' / push rsi if you want to store exactly 8 bytes.

Or mov eax, '//ls' ; shr eax, 8 to get EAX="/ls\0" in a register ready to store to make an 8-byte 0-terminated string.

Or use the same trick of shifting out a byte after mov r64, imm64 (like in @prl's answer) instead of separate push / mov. Or NOT your literal data so you do mov rax, imm64 / not rax / push rax, producing zeros in your register without zeros in the machine code. For example:

 mov  rsi, ~`/bin/ls\0`   ; mov rsi, 0xff8c93d091969dd0
 not  rsi
 push rsi                 ; RSP points to  "/bin/ls", 0

If you want to leave the trailing byte implicit, instead of an explicit \0, you can write mov rsi, ~'/bin/ls' which assembles to the same mov rsi, 0xff8c93d091969dd0. Backticks in NASM syntax process C-style escape sequences, unlike single or double quotes. I'd recommend using \0 to remind yourself why you're going to the trouble of using this NOT, and the ~ bitwise-negation assemble-time operator. (In NASM, multi-character literals work as integer constants.)
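The constant quoted above can be double-checked outside the assembler. Below is a small C sketch with hypothetical helper names (`pack`, `store64`, `has_zero_byte`, `check_not_trick`); it models the assemble-time `~` and the run-time `not` as the same bitwise complement on a 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer value of a string constant of up to 8 characters:
   first character in the lowest byte, missing bytes are zero. */
static uint64_t pack(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Write v to dst in little-endian order, as a 64-bit push would. */
static void store64(unsigned char *dst, uint64_t v) {
    for (int i = 0; i < 8; i++)
        dst[i] = (unsigned char)(v >> (8 * i));
}

/* True if any of the 8 bytes of v is zero. */
static int has_zero_byte(uint64_t v) {
    for (int i = 0; i < 8; i++)
        if (((v >> (8 * i)) & 0xFF) == 0)
            return 1;
    return 0;
}

static void check_not_trick(void) {
    uint64_t wanted = pack("/bin/ls");  /* 7 chars: the top byte is the 0 */
    uint64_t imm = ~wanted;             /* what ~ computes at assemble time */
    unsigned char slot[8];

    assert(has_zero_byte(wanted));      /* the plain constant has a 00 byte */
    assert(!has_zero_byte(imm));        /* the inverted one does not */
    assert(imm == 0xff8c93d091969dd0ULL);

    store64(slot, ~imm);                /* not rsi ; push rsi */
    assert(strcmp((const char *)slot, "/bin/ls") == 0);
}
```

One caveat the sketch makes visible: the inverted immediate is free of zero bytes only as long as the original data contains no 0xFF byte, which holds for plain ASCII paths.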



I believe that to set up the first one, it suffices to do

  push 0x736c2f2f         ;sl//
  push 0x6e69622f         ;nib/
  mov rdi, rsp

No, push 0x736c2f2f is an 8-byte push, of that value sign-extended to 64-bit. So you've pushed '/bin\0\0\0\0//ls\0\0\0\0'.

Probably you copied that from 32-bit code where push 0x736c2f2f is a 4-byte push, but 64-bit code is different.
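The effect of those two pushes can be modelled in a few lines of C. This is only an illustrative sketch: `struct toy_stack`, `sign_extend32`, `push64`, `push_imm32`, `question_version` and `check_question_version` are invented names, and the stack is a plain byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy stack: a byte array and an offset playing the role of RSP. */
struct toy_stack {
    unsigned char mem[64];
    size_t rsp;
};

/* Sign-extend a 32-bit immediate to 64 bits, as push imm32 does in 64-bit mode. */
static uint64_t sign_extend32(uint32_t imm) {
    return (imm & 0x80000000u) ? (0xFFFFFFFF00000000ULL | imm) : imm;
}

/* 64-bit push: move RSP down 8 bytes, then store little-endian. */
static void push64(struct toy_stack *s, uint64_t v) {
    s->rsp -= 8;
    for (int i = 0; i < 8; i++)
        s->mem[s->rsp + i] = (unsigned char)(v >> (8 * i));
}

/* push imm32 always writes a full qword. */
static void push_imm32(struct toy_stack *s, uint32_t imm) {
    push64(s, sign_extend32(imm));
}

/* The two pushes from the question. */
static struct toy_stack question_version(void) {
    struct toy_stack s;
    memset(s.mem, 0xAA, sizeof s.mem);
    s.rsp = sizeof s.mem;
    push_imm32(&s, 0x736c2f2f);   /* push 0x736c2f2f ; "//ls" */
    push_imm32(&s, 0x6e69622f);   /* push 0x6e69622f ; "/bin" */
    return s;
}

static void check_question_version(void) {
    struct toy_stack s = question_version();
    /* 16 bytes were pushed, not 8. */
    assert(s.rsp == sizeof s.mem - 16);
    assert(memcmp(s.mem + s.rsp, "/bin\0\0\0\0//ls\0\0\0\0", 16) == 0);
    /* As a C string, what RSP points at is just "/bin". */
    assert(strlen((const char *)s.mem + s.rsp) == 4);
}
```

So the pointer left in RDI would name "/bin", and a kernel given that path would be asked to execute a directory.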

x86-64 can't encode a 4-byte push, only 2 or 8 byte operand-size. The standard technique is to push 8 bytes at a time:

    mov   rdi, '/bin//ls'     ; 10-byte mov r64, imm64
    push  rdi                 ; 8 string bytes; push a zero qword first to terminate them
    mov   rdi, rsp

If you have an odd number of 4-byte chunks, the first one can be push imm32, then use 8-byte pairs. If it's not a multiple of 4, and you can't pad with redundant characters like /, mov dword [mem], imm32 that partially overlaps might help, or put a value in a register and shift to introduce a zero byte.
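As one possible reading of the partially overlapping store, here is a C sketch of the resulting bytes. Everything in it is an assumption made for illustration: the helper names (`store_le`, `pack`, `build_overlapping`, `build_upper_half`, `check_overlap`), the buffer sizes, and the particular instruction pairs named in the comments, which are my guess at what such a sequence could look like rather than something taken from the answer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_SIZE = 32, START_RSP = 16 };

/* Little-endian store of the low `nbytes` bytes of v at mem[addr]. */
static void store_le(unsigned char *mem, size_t addr, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++)
        mem[addr + i] = (unsigned char)(v >> (8 * i));
}

/* Integer value of a string constant of up to 8 characters:
   first character in the lowest byte. */
static uint64_t pack(const char *s) {
    uint64_t v = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < n && i < 8; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* Assumed sequence: push "/bin" ; mov dword [rsp+3], "n/ls"
   Returns the resulting RSP (an offset into mem). */
static size_t build_overlapping(unsigned char *mem) {
    size_t rsp = START_RSP;
    memset(mem, 0xAA, MEM_SIZE);
    rsp -= 8;
    store_le(mem, rsp, pack("/bin"), 8);      /* "/bin" + 4 zero bytes */
    store_le(mem, rsp + 3, pack("n/ls"), 4);  /* rewrites 'n', then "/ls" */
    return rsp;
}

/* Assumed sequence: push "/bin" ; mov dword [rsp+4], "//ls" */
static size_t build_upper_half(unsigned char *mem) {
    size_t rsp = START_RSP;
    memset(mem, 0xAA, MEM_SIZE);
    rsp -= 8;
    store_le(mem, rsp, pack("/bin"), 8);
    store_le(mem, rsp + 4, pack("//ls"), 4);  /* fills the zeroed upper half */
    return rsp;
}

static void check_overlap(void) {
    unsigned char mem[MEM_SIZE];
    size_t rsp;

    /* 7-character name: the last byte of the pushed qword stays zero. */
    rsp = build_overlapping(mem);
    assert(strcmp((const char *)mem + rsp, "/bin/ls") == 0);

    /* 8-character name: all 8 bytes are used, so the byte after them is
       whatever was already on the stack, not a terminator. */
    rsp = build_upper_half(mem);
    assert(memcmp(mem + rsp, "/bin//ls", 8) == 0);
    assert(mem[rsp + 8] == 0xAA);
}
```

The second case is why the 8-character variants need a zero pushed beforehand (or a wider store): nothing in those two instructions writes the terminator.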

See

Peter Cordes
-1

Load the following C example (modify if needed) into the Godbolt compiler explorer and you can see how various compilers typically generate assembly for a call to execve on the AMD64 (or other) architecture.

    #include <stdio.h>
    #include <unistd.h>

    int
    main(void) {
       /* argv[0] is conventionally the program name; the array ends with NULL. */
       char* argv[] = { "/bin/ls", "-al", NULL };
       // char* argv[] = { "-al", NULL };
       // char* argv[] = { "/bin/lsxxx", "-al", NULL };
       // char* argv[] = { "", "-al", NULL };
       char* envp[] = { "PATH=/bin", NULL };

       /* execve() only returns if it fails. */
       if (execve("/bin/ls", argv, envp) == -1) {
          perror("Could not execve");
          return 1;
       }
    }
fpmurphy
  • That's basically useless for shellcode; gcc will just store pointers to string literals in the `.rodata` section. Without `-fPIE` you'll get `mov [mem64], imm32` with an absolute address: useless for shellcode. With `-fPIE` you'll get a RIP-relative LEA into a register which only works in shellcode if the string is before the code (so the `rel32` doesn't contain any `00` bytes). – Peter Cordes Jan 20 '20 at 19:56
  • Actually with multiple strings this just doesn't work at all. You need each of them 0-terminated, so you can't just take the compiler's rodata and jump over it because it has to contain zeros. – Peter Cordes Jan 20 '20 at 23:28
  • @prl. Try compiling and running all 4 versions of `argv[]` in my example. On Linux, `argv[0]` can actually be anything, including an empty string. Not saying that this 'feature' is good or bad but it is so. – fpmurphy Jan 21 '20 at 04:57
  • @PeterCordes. Of course it is useless for shellcode! However, where did the OP say they were dealing with shellcode? – fpmurphy Jan 21 '20 at 05:02
  • It's tagged with [tag:shellcode], otherwise sure, maybe you could guess that the attempt at pushing parts of strings was just something they happened to try. Regardless, removed my downvote now that the answer is at least useful for non-shellcode usage. Of course, it will compile to a library function call, not an inline `syscall`, but fortunately for x86-64 Sys V the system calling convention matches the user-space function-calling convention except for RCX so you can replace `call` with `syscall` directly here. – Peter Cordes Jan 21 '20 at 05:10