I recently asked a question about this task: I had to create an environment variable MYENV and store something in it so that the following code runs successfully.

#include <stdio.h>
#include <stdlib.h>

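int main(void){
            int (*func)();
            /* getenv() returns a char *; turning it into a function pointer
               needs an explicit cast (and is not defined by ISO C) */
            func = (int (*)())getenv("MYENV");
            func();
}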

Earlier I was doing something like `export MYENV=ls`.

A user pointed out that this is incorrect: calling func() makes the CPU jump to the address stored in func and execute whatever bytes it finds there as machine instructions. Those bytes would be the characters of the string ls, which is not valid machine code. So I should pass some shellcode instead.

Now I want to know whether this is how functions work in general. Say I declare a function myFunction() that multiplies 100 and 99 and returns the result. Does the name myFunction then refer to a set of machine instructions stored somewhere in memory that multiply 100 and 99 and return the value?

And if I figured out those machine instructions, stored them in a string, and made a function pointer point to that string, would calling it return 9900?

This is what I mean:

int (*myFunc)();
char *var = <machine_instructions_in_string_format>;
myFunc = (int (*)())var;   /* make myFunc point at the bytes in var */
int returnVar = myFunc();

Will the returnVar have 9900?

And if yes, how do I figure out what that string is?

I am having a hard time wrapping my head around this.

Peter Cordes
aroma
  • Not in general. It might work on some platforms, but it is UB according to the C standard. On common platforms you at least have to make the page that contains the code executable (e.g. with `mprotect()` on Unix-like systems). – Ctx Jun 08 '20 at 16:35
  • You "figure out what the string is" by compiling a program that does what you want, then looking at the machine code that was generated. – Barmar Jun 08 '20 at 16:37
  • @Barmar: correction: compiling a *function*, not a whole program. e.g. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) / [How to disassemble one single function using objdump?](https://stackoverflow.com/q/22769246) – Peter Cordes Jun 08 '20 at 16:47
  • `ls` is not a function, but a *command*. `qsort` is a standard C function – Basile Starynkevitch Jun 08 '20 at 16:50
  • compiling a function using _position independent code_ – Ctx Jun 08 '20 at 16:50
  • Related: [Buffer overflow using environment variables](https://stackoverflow.com/q/36885127) shows executing machine code by copying it into a buffer. `gcc -z execstack` gives all pages exec permission, otherwise you can use `mprotect` or `mmap`. [How to execute x86 commands from data buffer?](https://stackoverflow.com/q/20028892) (Modern exploits are typically ROP attacks that inject return addresses to existing code, not actual code injection. This is why PIE executables that can randomize *all* code+data are good.) – Peter Cordes Jun 08 '20 at 16:53
  • It is probably not what you want, but in principle it is possible (but not defined by the language) to store instructions in a character array and call them, see https://stackoverflow.com/a/39868486/3150802. – Peter - Reinstate Monica Jun 08 '20 at 16:56
  • Or even more related: [Exactly what cases does the gcc execstack flag allow and how does it enforce it?](https://stackoverflow.com/q/53346274) discusses what's going on with assigning `getenv`'s return value to a function pointer. – Peter Cordes Jun 08 '20 at 16:56
  • As a more general remark, compiled languages typically make a strong distinction between code (in C: functions) and data (in C: variables, including arrays). On modern systems a program normally cannot modify its own code, even though that's pretty cool, and it cannot execute data (unless you jump through the hoops in the links above). Both were easier in, say, the 1970s, and were occasionally put to good use. But in general you would use interpreted languages for that, some of which (Lisp) do not make that distinction at all. – Peter - Reinstate Monica Jun 08 '20 at 17:03

1 Answer

You have to fill the environment variable with opcodes (machine code) for your target machine. I ran a little experiment:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
        int (*f)();
        /* explicit cast: getenv() returns char *, not a function pointer */
        f = (int (*)())getenv("VIRUS");
        (*f)();
        printf("Haha, it returned\n");
        return 0;
}

I compiled it, then used execstack to mark the binary as needing an executable stack:

$ cc ge.c
$ execstack -s ./a.out

Then I wrote a bit of assembler:

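mov %rbp, %rsp    # rsp = main's frame pointer: drops the return address into main and main's locals
pop %rbp          # restore the rbp saved by main's prologue
ret               # return to main's caller, so the rest of main never runs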

This mimics a function epilogue. Assembled it:

$ cc -c t.s

Looked at the opcodes:

$ objdump -D t.o
...
   0:   48 89 ec                mov    %rbp,%rsp
   3:   5d                      pop    %rbp
   4:   c3                      retq   

Set the environment variable:

$ export VIRUS=$(printf "\\x48\\x89\\xec\\x5d\\xc3")

Then ran the program:

$ ./a.out

It printed nothing, which shows that the printf line was never reached: the injected code tore down main's stack frame and returned from main directly. But, just to check, I tried:

$ export VIRUS=$(printf "\\xc3")
$ ./a.out
Haha, it returned

This was run on Ubuntu 18.04 on amd64 (x86-64). If this happens to be a school assignment, aim for bonus points by figuring out how to get it to execute an instruction whose encoding contains a null (0) byte.

mevets
  • That asm definitely deserves comments. You're tearing down the *caller's* stack frame because that function didn't `push %rbp` / `mov %rsp, %rbp` to make its own stack frame. So the `ret` is popping *main's* return address into RIP. It's not exactly "stepping over" `printf`, it's more like a longjmp. But of course it depends on debug-mode code-gen by the compiler! – Peter Cordes Jun 08 '20 at 18:11
  • It will happen to return zero because you declared the function pointer's arg type as `()` not `(void)`, so the compiler will zero AL via zeroing EAX. (And the process exit status only captures the low byte of the retval anyway.) – Peter Cordes Jun 08 '20 at 18:12
  • Also, an easier way to build this is `gcc -zexecstack ge.c`, to pass the execstack option to the linker instead of modifying the binary afterward. But yes, either way [it sets a read-implies-exec flag in the ELF metadata](https://stackoverflow.com/q/53346274), making all pages executable including but not limited to the region above the initial stack pointer where env vars live. – Peter Cordes Jun 08 '20 at 18:14
  • This is probably a homework assignment. Giving too much away.... – mevets Jun 08 '20 at 18:54
  • The OP's earlier question said something about a "beginners ctf event". I don't think that implies academic credit is at stake, just Internet nerd points. Unless I'm mistaken. – Peter Cordes Jun 08 '20 at 18:59
  • I sort of meld 'arbitrary constraint' == assignment – mevets Jun 08 '20 at 19:06