I know that C compilers are capable of taking standalone code and generating standalone shellcode from it for the specific system they are targeting.

For example, given the following in anon.c:

int give3() {
    return 3;
}

I can run

gcc anon.c -o anon.obj -c
objdump -D anon.obj

which gives me (on MinGW):

anon1.obj:     file format pe-i386


Disassembly of section .text:

00000000 <_give3>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   b8 03 00 00 00          mov    $0x3,%eax
   8:   5d                      pop    %ebp
   9:   c3                      ret    
   a:   90                      nop
   b:   90                      nop

So I can make main like this:

main.c

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint8_t shellcode[] = {
        0x55,
        0x89, 0xe5,
        0xb8, 0x03, 0x00, 0x00, 0x00,
        0x5d, 0xc3,
        0x90,
        0x90
    };

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}

My question is: is it practical to automate the process of converting a self-contained anonymous function, one that refers to nothing outside its own scope or its arguments, into such a byte array?

eg:

#include <stdio.h>
#include <stdint.h>

int main(int argc, char **argv)
{
    uint8_t shellcode[] = [@[
        int anonymous() {
            return 3;
        }
    ]];

    int (*p_give3)() = (int (*)())shellcode;
    printf("%d.\n", (*p_give3)());
}

Which would compile the text into shellcode, and place it into the buffer?

The reason I ask is that I really like writing C, but writing pthreads and callbacks is incredibly painful. As soon as you go one step above C to get the notion of "lambdas", you lose your language's ABI (e.g. C++ has lambdas, but everything you do in C++ is suddenly implementation-dependent). "Lisp-like" scripting add-ons (e.g. plugging in Lisp, Perl, JavaScript/V8, or any other runtime that already knows how to generalize callbacks) make callbacks very easy, but also much more expensive than tossing shellcode around.

If this is practical, then functions that are only called once can be placed in the body of the function calling them, which reduces global scope pollution. It also means that you do not need to generate the shellcode manually for each system you are targeting: each system's C compiler already knows how to turn self-contained C into machine code, so why do it by hand and ruin the readability of your own code with a bunch of binary blobs?

So the question is: is this practical for functions that are perfectly self-contained (e.g. if they want to call puts, then puts has to be passed as an argument, or inside a hash table or struct that is passed as an argument)? Or is there some issue preventing this from being practical?
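
One way to picture the "everything comes in through the arguments" constraint is the sketch below. It is only an illustration in portable C: the names `callback_env`, `measure_text`, and `run_callback` are made up for this example, and the callback is an ordinary static function rather than an embedded byte array. What it shows is that the callback body names no library function directly; it reaches `strlen` only through the struct it is handed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical environment: every outside dependency of the callback
 * is a field of this struct, including the function it wants to call. */
struct callback_env {
    size_t (*length_of)(const char *);  /* e.g. strlen, supplied by the caller */
    const char *text;
};

/* The callback refers to nothing except its argument. */
static size_t measure_text(void *arg)
{
    struct callback_env *env = arg;
    return env->length_of(env->text);
}

/* Stand-in for any API that accepts a callback plus a user-data pointer. */
size_t run_callback(size_t (*cb)(void *), void *arg)
{
    return cb(arg);
}
```

A caller would fill in the struct, for example `struct callback_env env = { strlen, "hello" };`, and pass `measure_text` and `&env` to `run_callback`.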

Cactus
Dmytro
  • I fail to see how introducing undefined behavior is preferable to simply writing a named function. – EOF Nov 11 '16 at 18:15
  • Because functions called only once inside the code should not pollute global scope. Imagine a callback-heavy program; with this you can easily create 5 callback pthreads without any global pollution. And what makes this cause undefined behavior? – Dmytro Nov 11 '16 at 18:16
  • GCC has a [nested functions](https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html) extension, but I don't know if they work with (p)threads, though I doubt it. – Kninnug Nov 11 '16 at 18:17
  • You are aware that you can declare a function as `static` so it does *not* pollute global scope? – EOF Nov 11 '16 at 18:17
  • It still pollutes the scope of the C source file. If I want a pthread that takes a function that spawns a pthread that takes a function that spawns a pthread, and so on, I need multiple named functions whose names are either not meaningful or very difficult to come up with. With this approach, it becomes very easy to create small self-contained async tasks. static is great for helper functions called multiple times, but horrible for callback functions only called once, which quickly make your code difficult to navigate. This way, callbacks are grouped by their caller. – Dmytro Nov 11 '16 at 18:32
  • Switching over to c++ might be an option – Jabberwocky Nov 11 '16 at 18:37
  • @MichaelWalz C++ lambdas do things very well, but once you commit to C++ you lose your ABI(until C++ standardizes an ABI), and it becomes difficult for other C programs(or assembly) to talk to you without forcing them to be compiled with the C++ compiler. – Dmytro Nov 11 '16 at 18:38
  • @Dmitry I can't fully understand what you are looking for conceptually and practically. Do you want to fork gcc? – Margaret Bloom Nov 11 '16 at 18:40
  • I was thinking more along the line of preprocessing C with php/javascript/perl to automate the shellcode embedding, I am not sure I have the skills to navigate gcc project structure; I have bad experience trying to navigate c projects, and getting them to compile. I am curious if such preprocessing is practical or if there is something that makes it impossible to do this on some machines? I know that on my machine I can create arbitrarily nested shellcode and it works fine as long as each shellcode is self contained. – Dmytro Nov 11 '16 at 18:40
  • Embedding code on the stack, or in data in general, does not seem like a terribly good idea (and won't work on processors with an 'NX' bit set on the data sections). GCC has extensions that let you create anonymous functions using a combination of nested functions and statement expressions - see http://stackoverflow.com/questions/10405436/anonymous-functions-using-gcc-statement-expressions – Ian Abbott Nov 11 '16 at 18:49
  • You could also use clang's blocks for this purpose. – chqrlie Nov 11 '16 at 18:50
  • You're badly misusing the term "shellcode" here. Shellcode is specially crafted machine code that's used to exploit a vulnerability. What you're talking about is simply machine code. And as EOF pointed out what you're trying to do is pointless, you can just use a static function. The fact that this would "pollute" the file scope name space is an insignificant problem compared to the horrible unmaintainable mess the alternative you're proposing would create. – Ross Ridge Nov 11 '16 at 19:00
  • @RossRidge static functions are not anonymous, and cannot be tossed around as freely as shellcode(yes it doesn't spawn a shell but the procedure it was obtained from is the same one that creates shellcode). Also as noted earlier, static functions only called once make more sense to be embedded into their caller. you don't want 500 functions in file scope named cb1, cb2, since it forces the reader to keep jumping around the source to follow the flow – Dmytro Nov 11 '16 at 19:02
  • Why are you making code very hard to read and maintain? I think writing good code should have these factors in mind because 1) You may understand it but others have to. 2) One year down the line - will you understand it easily. 3) Is this portable and easy debuggable? – Ed Heal Nov 11 '16 at 19:24
  • @EdHeal any modern c programmer must understand callbacks. I imagine most should prefer them being inside their caller instead of in windows api callback notation, which is much harder to maintain. without callbacks you don't scale well. JavaScript will be faster. – Dmytro Nov 11 '16 at 19:25
  • But `shellcode` is just numbers that need to be understood. Here lies the problem. Come back in a few months time to have to work out what it means. – Ed Heal Nov 11 '16 at 19:28
  • @EdHeal everything is a "number" (symbol) that needs to be understood; that's how we build systems. From algebras and calculi to pointers, floating point, and vtables, everything depends on the binding of symbols. I don't see your point. My question is about whether this is practical or not, and what implementation problems can come up along the way. I'm not saying it's good or bad, only asking whether it can be another tool we could have (like gotos). – Dmytro Nov 11 '16 at 19:33
  • My point it that in the code it is just a series of numbers - that happen to be assembly language. You come back to it in a few months time and you will need to understand what those numbers mean. The whole point of programming languages or assembly language for that matter to have it in a more usable and readable human form. What is the point of trying to make a language doing something that it was not designed for? What is the point of writing obfuscated code? Just makes life harder for others to understand, debug and maintain. – Ed Heal Nov 11 '16 at 19:37
  • @EdHeal This is not about writing obfuscated code. It's about writing clear code that the compiler automatically transforms into code blocks that are self-contained, compressible, and easy to use for threads and callbacks, without resorting to nasty functions that only get created for the sake of being passed as a pointer argument to pthread or to another function as a callback. The code remains readable, and you don't have to write global WndProcs; it's a win on readability, scalability, and maintainability. – Dmytro Nov 11 '16 at 20:16
  • Why would you want to take the un-optimized compiler output, even including the NOP padding, and hard-code that into an array of bytes of machine code? Your horrible implementation of what you're mis-labelling as "shellcode" distracts from your whole example. It also totally prevents the compiler from inlining the "lambda" into the function you're passing the callback to, e.g. with link-time optimization (unless it's to a library function like `qsort`). In your example, you're using the function pointer yourself, and only passing its result to printf. But in a way that can't optimize. – Peter Cordes Nov 11 '16 at 21:17
  • my example is just to express intent, it's not meant to be optimal. That said, the point of this is not to inline because inlining glues the lambda to where it is declared. I want it to be freely tossed around and independent lambda, so i can't imagine the compiler can optimize it differently based on where it is since it is meant to only depend on the arguments, stack data, and registers, with which of course it can allocate resources but that is done by passing function pointers as arguments. – Dmytro Nov 11 '16 at 23:22
  • You're right that the nop padding is silly and I didn't compile it with optimization flags, but that's just negligence on my part, I wanted to have this posted before leaving for university. But after full block optimization it can't be optimized because the block must be completely independent and composable; this means it must be more expensive than a static block because it is meant to live at runtime fully intact, whereas static inlined blocks get much of their body tossed out. – Dmytro Nov 11 '16 at 23:31
  • Basically, my version is meant to preserve structure at runtime whereas c++ lambdas are meant to disappear at runtime, I could be completely wrong though. – Dmytro Nov 11 '16 at 23:40
  • The alternative is a `static` function that just needs a name that's unique within that file. That's a tiny cost, and nowhere near worth avoiding with a solution like this. Putting machine code in an array manually just seems like WAY too high a cost in portability (even to different ABIs on the same architecture), as well as in cases where link-time optimization could do something (which I agree is rare for callback functions). Especially when you take into account all the difficulty of making sure the machine code is in an executable page. – Peter Cordes Nov 16 '16 at 05:57

3 Answers

5

Apple has implemented a very similar feature in clang, where it's called "blocks". Here's a sample:

#include <stdio.h>

int main(int argc, char **argv)
{
    int (^blk_give3)(void) = ^(void) {
        return 3;
    };

    printf("%d.\n", blk_give3());

    return 0;
}

More information:

  • that's really interesting, looks a lot more lexical practical than casting to function pointers. Can these be used in threads/passed as arguments to functions? – Dmytro Nov 11 '16 at 19:20
  • Yes! In fact, Apple uses blocks extensively to allow applications to pass callbacks to macOS and iOS APIs. –  Nov 11 '16 at 19:22
  • This extension seems to be one step forward, two steps back compared to GCC's nested functions. It allows anonymous functions, but doesn't allow full access to the nested scope and doesn't use normal function pointers and so can't be used with callbacks that use normal function pointers. – Ross Ridge Nov 11 '16 at 19:43
  • @RossRidge On the other hand, nested functions can't be called after the function that defined them has exited. Blocks can be; this is a huge advantage. –  Nov 11 '16 at 19:48
  • my goal of nested blocks is exactly that, to make the lambda disappear after it is no longer needed. Static functions tend to stick around forever. This partly means wasted memory, but it also makes it hard to find who calls the anonymous function since that function is completely outside the address range of the caller's instructions, whereas my version is always contained within the caller. I imagine cache locality is also improved since the callback is already there. – Dmytro Nov 11 '16 at 23:24
  • @Dmitry I think you're confused. Storing code in a local variable on the stack does not improve cache locality. It actually makes it _worse_, as the stack is nowhere near the code segment. (It also means you need an executable stack, which is really bad from a security perspective.) Blocks are compiled as functions, and do not "stick around" anywhere unless they're explicitly copied. –  Nov 11 '16 at 23:56
  • @dustwuff you may be right about the cache locality; It also seems that this may be impractical for asynchronous callbacks since the pointer to the block will get invalidated as soon as the function containing it returns(like any other pointer to local variable), you can use structs and pass by value to avoid this, but it's a bit more tricky than it first seemed(although the size of the block is known so passing by value still seems practical). The block can also be moved to the heap before being passed, and use ref counting but then you have two problems. – Dmytro Nov 12 '16 at 00:59
  • It still seems practical for synchronous callbacks though, since the block is not invalidated if the called function remains while the callee executes, allowing the callee to depend on the caller's stack pointer. That said, I'm depending on pointers to local variables, which in C is frowned upon, it seems okay in this case but I can imagine it causing bugs I'm unaware of. – Dmytro Nov 12 '16 at 01:01
  • @Dmitry I think you haven't read all the documentation! :) Blocks are perfectly viable for async callbacks -- they can be copied using `Block_copy()`, which allows them to be used outside the defining function. The code itself isn't copied, just a reference to it and any `__block` variables it closes over. –  Nov 12 '16 at 01:03
  • I'm getting an undefined reference to _NSConcreteGlobalBlock error when compiling it with `clang clang-blocks.c -fblocks`. It's an Ubuntu environment. https://github.com/dmitrymakhnin/Experiments/blob/master/clang-blocks/main.c – Dmytro Nov 12 '16 at 01:44
4

I know that C compilers are capable of taking standalone code, and generate standalone shellcode out of it for the specific system they are targeting.

Turning source into machine code is what compilation is. Shellcode is machine code with specific constraints, none of which apply to this use-case. You just want ordinary machine code like compilers generate when they compile functions normally.

AFAICT, what you want is exactly what you get from static int foo(int x){ ...; } and then passing foo as a function pointer, i.e. a block of machine code with a label attached, in the code section of your executable.
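
As a minimal sketch of that alternative in ISO C (the names `compare_ints` and `sort_integers` are chosen for this illustration only), a file-local comparator is handed to `qsort` by name. The comparator returns a negative, zero, or positive value, which is what `qsort` requires:

```c
#include <stdlib.h>

/* File-local callback: invisible outside this translation unit. */
static int compare_ints(const void *va, const void *vb)
{
    const int *a = va, *b = vb;
    /* Three-way result without the overflow risk of (*a - *b). */
    return (*a > *b) - (*a < *b);
}

/* Sorts arr[0..len) in ascending order. */
void sort_integers(int *arr, size_t len)
{
    qsort(arr, len, sizeof arr[0], compare_ints);
}
```

The only cost relative to an anonymous function is picking a name that is unique within the file.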

Jumping through hoops to get compiler-generated machine code into an array is not even close to worth the portability downsides (esp. in terms of making sure the array is in executable memory).
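
To make the "executable memory" hoop concrete, here is a rough sketch of what running a byte array takes on a POSIX system. Everything in it is an assumption made for illustration: the name `run_give3` is invented, the bytes are hand-written x86 (`mov eax, 3; ret`) and mean nothing on other architectures, and a hardened OS may refuse the `mprotect` call, in which case the function just reports failure.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc in strict modes */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Copies machine code into a fresh page, makes the page read+execute,
 * calls it, and stores the return value in *result.
 * Returns 0 on success, -1 if the platform is unsupported or the OS refused. */
int run_give3(int *result)
{
#if defined(__x86_64__) || defined(__i386__)
    /* x86 and x86-64 only: mov eax, 3 ; ret */
    static const unsigned char code[] = { 0xb8, 0x03, 0x00, 0x00, 0x00, 0xc3 };
    size_t len = 4096;
    int (*fn)(void);

    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);

    /* Drop write permission before adding execute permission. */
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return -1;
    }

    /* ISO C does not define object-to-function pointer conversion;
     * copying the pointer bits works on the platforms guarded above. */
    memcpy(&fn, &page, sizeof fn);
    *result = fn();

    munmap(page, len);
    return 0;
#else
    (void)result;
    return -1;   /* the byte array is meaningless on this architecture */
#endif
}
```

None of this is needed for a normally compiled function, which the linker already places in executable memory.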


It seems the only thing you're trying to avoid is having a separately-defined function with its own name. That's an incredibly small benefit that doesn't come close to justifying doing anything like you're suggesting in the question. AFAIK, there's no good way to achieve it in ISO C11, but:

Some compilers support nested functions as a GNU extension:

This compiles (with gcc6.2). On Godbolt, I used -xc to compile it as C, not C++. It also compiles with ICC17, but not clang3.9.

#include <stdlib.h>

void sort_integers(int *arr, size_t len)
{
  int bar(){return 3;}  // gcc warning: ISO C forbids nested functions [-Wpedantic]

  int cmp(const void *va, const void *vb) {
    const int *a=va, *b=vb;       // taking const int* args directly gives a warning, which we could silence with a cast
    return *a > *b;               // simplified: a conforming qsort comparator must also return a negative value when *a < *b
  }

  qsort(arr, len, sizeof(int), cmp);
}

The asm output is:

cmp.2286:
    mov     eax, DWORD PTR [rsi]
    cmp     DWORD PTR [rdi], eax
    setg    al
    movzx   eax, al
    ret
sort_integers:
    mov     ecx, OFFSET FLAT:cmp.2286
    mov     edx, 4
    jmp     qsort

Notice that no definition for bar() was emitted, because it's unused.

Programs with nested functions built without optimization will have executable stacks. (For reasons explained below). So if you use this, make sure you use optimization if you care about security.


BTW, nested functions can even access variables in their parent (like lambdas). Changing cmp into a function that returns len results in this highly surprising asm:

__attribute__((noinline)) 
void call_callback(int (*cb)()) {
  cb();
}

void foo(int *arr, size_t len) {
  int access_parent() { return len; }
  call_callback(access_parent);
}

## gcc5.4
access_parent.2450:
    mov     rax, QWORD PTR [r10]
    ret
call_callback:
    xor     eax, eax
    jmp     rdi
foo:
    sub     rsp, 40
    mov     eax, -17599
    mov     edx, -17847
    lea     rdi, [rsp+8]
    mov     WORD PTR [rsp+8], ax
    mov     eax, OFFSET FLAT:access_parent.2450
    mov     QWORD PTR [rsp], rsi
    mov     QWORD PTR [rdi+8], rsp
    mov     DWORD PTR [rdi+2], eax
    mov     WORD PTR [rdi+6], dx
    mov     DWORD PTR [rdi+16], -1864106167
    call    call_callback
    add     rsp, 40
    ret

I just figured out what this mess is about while single-stepping it: Those MOV-immediate instructions are writing machine-code for a trampoline function to the stack, and passing that as the actual callback.

gcc must ensure that the ELF metadata in the final binary tells the OS that the process needs an executable stack (note readelf -l shows GNU_STACK with RWE permissions). So nested functions that access outside their scope prevent the whole process from having the security benefits of NX stacks. (With optimization disabled, this still affects programs that use nested functions that don't access stuff from outer scopes, but with optimization enabled gcc realizes that it doesn't need the trampoline.)

The trampoline (from gcc5.2 -O0 on my desktop) is:

   0x00007fffffffd714:  41 bb 80 05 40 00       mov    r11d,0x400580   # address of access_parent.2450
   0x00007fffffffd71a:  49 ba 10 d7 ff ff ff 7f 00 00   movabs r10,0x7fffffffd710   # address of `len` in the parent stack frame
   0x00007fffffffd724:  49 ff e3        rex.WB jmp r11 
    # This can't be a normal rel32 jmp, and indirect is the only way to get an absolute near jump in x86-64.

   0x00007fffffffd727:  90      nop
   0x00007fffffffd728:  00 00   add    BYTE PTR [rax],al
   ...

(trampoline might not be the right terminology for this wrapper function; I'm not sure.)

This finally makes sense, because r10 is normally clobbered without saving by functions. There's no register that foo could set that would be guaranteed to still have that value when the callback is eventually called.

The x86-64 SysV ABI says that r10 is the "static chain pointer", but C/C++ don't use that. (Which is why r10 is treated like r11, as a pure scratch register).

Obviously a nested function that accesses variables in the outer scope can't be called after the outer function returns. e.g. if call_callback held onto the pointer for future use from other callers, you would get bogus results. When the nested function doesn't do that, gcc doesn't do the trampoline thing, so the function works just like a separately-defined function, so it would be a function pointer you could pass around arbitrarily.
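
The trampoline is needed because a plain function pointer has no room for an environment pointer. A sketch of the usual portable workaround, in standard C rather than GNU C, is to pass the environment explicitly next to the code pointer. The names `closure`, `access_parent`, `call_closure`, and `foo_explicit` are hypothetical and only mirror the example above:

```c
#include <stddef.h>

/* Code pointer plus environment pointer, carried together. */
struct closure {
    int (*fn)(void *env);
    void *env;
};

/* Plays the role of the nested function: reads `len` through env
 * instead of through a static chain register. */
static int access_parent(void *env)
{
    const size_t *len = env;
    return (int)*len;
}

/* Callback consumer that knows about the explicit environment. */
static int call_closure(struct closure c)
{
    return c.fn(c.env);
}

int foo_explicit(int *arr, size_t len)
{
    (void)arr;
    /* &len is only valid while foo_explicit is running. */
    struct closure c = { access_parent, &len };
    return call_closure(c);
}
```

This needs no executable stack, but it only works with APIs that accept a user-data pointer alongside the callback, and the environment has the same lifetime limit as in the nested-function case.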

Community
Peter Cordes
  • it seems sufficient for most use cases but I still get this empty feeling I get when writing python code and I want an iife, I guess if I really need that, I can resort to addons that just automate name generation, or call an embedded runtime that supports iife like javascript, perl, or lisp, it has a lot of overhead of runtime compilation but if you already have it, it can at times be justified, and it is as leaky, but more pricy. – Dmytro Nov 16 '16 at 07:14
  • @Dmitry: What does an [iife](https://en.wikipedia.org/wiki/Immediately-invoked_function_expression) have to do with anonymous (or at least function-scoped) callback functions? Your idea of extracting machine code is the opposite of what you should do if you're going to invoke it in the same place it's defined! Because if you do execute it right away, then you want it to inline. – Peter Cordes Nov 16 '16 at 07:33
  • yeah you're right about that, iife blocks may as well just be anonymous blocks c already supports via {}. I haven't thought that through. My bad, well except you can't return from them into an assignment, but if it's only used once, a regular static function is almost surely going to be inlined, so the abstraction is non leaky – Dmytro Nov 16 '16 at 07:40
  • This Compiler Explorer you're using is pretty fancy, Ill need to play with it; normally I just harass objdump directly. – Dmytro Nov 16 '16 at 07:57
  • Related: [Implementation of nested functions](https://stackoverflow.com/q/8179521) is another answer about how GCC makes trampolines. – Peter Cordes Oct 17 '22 at 10:45
1

It seems possible, but unnecessarily complicated:

anon.c

 int anon() { return 3; }

main.c

 ...
 uint8_t shellcode[] = {
 #include "anon.shell"
};

int (*p_give3)() = (int (*)())shellcode;
printf("%d.\n", (*p_give3)());   

makefile:

anon.shell: anon.c
	gcc anon.c -o anon.obj -c; objdump -D anon.obj | extractShellBytes.py > anon.shell

Where extractShellBytes.py is a script you write which prints only the raw comma-separated code bytes from the objdump output.
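
The answer does not show extractShellBytes.py. As a rough idea of the filtering such a script would have to do, here is a sketch of the per-line parsing step written in C. The function name `extract_bytes` is invented for this example, and it assumes the usual `objdump -D` layout of address, colon, hex bytes, then mnemonic:

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static unsigned hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return (unsigned)(c - '0');
    return (unsigned)(tolower((unsigned char)c) - 'a') + 10u;
}

/* Parses one objdump line such as
 *   "   3:\tb8 03 00 00 00          \tmov    $0x3,%eax"
 * and stores the instruction bytes in out[0..max).
 * Returns the byte count: 0 for headers, labels, and blank lines. */
size_t extract_bytes(const char *line, unsigned char *out, size_t max)
{
    const char *p = line;
    size_t n = 0;

    /* An instruction line starts with a hex address followed by ':'. */
    while (*p == ' ' || *p == '\t')
        p++;
    if (!isxdigit((unsigned char)*p))
        return 0;
    while (isxdigit((unsigned char)*p))
        p++;
    if (*p != ':')
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;

    /* Bytes are two hex digits each, separated by single spaces. */
    while (n < max && isxdigit((unsigned char)p[0])
                   && isxdigit((unsigned char)p[1])) {
        char end = p[2];
        if (end != ' ' && end != '\t' && end != '\n' && end != '\0')
            break;
        out[n++] = (unsigned char)(hex_val(p[0]) * 16u + hex_val(p[1]));
        p += 2;
        if (p[0] == ' ' && isxdigit((unsigned char)p[1]))
            p++;        /* single space: another byte follows */
        else
            break;      /* tab or padding: the mnemonic column starts */
    }
    return n;
}
```

A complete tool would call this for each input line and print the collected bytes as comma-separated `0x..` literals, so that the result can be pulled in with `#include`.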

AShelly
  • Exercise left to the reader. Use any language of your choice. – AShelly Nov 11 '16 at 23:41
  • I already did this in Perl once. I was curious if you actually had a proof of concept. – Dmytro Nov 11 '16 at 23:43
  • And how do you portably make sure this goes into an executable page? (Rather than a data page without execute permission.) If you put that inside a function, gcc in my experience would emit code to write the bytes to the stack. Non-executable stack memory is a common security practice in most OSes. `static const uint8_t` would help significantly, since read-only data typically goes into a section (.rodata) which becomes part of the text segment, so it's mapped the same as code. But who knows if there are C implementations where even that fails? – Peter Cordes Nov 16 '16 at 06:00
  • Anyway, yes you can somewhat automate the process, but doing it this way means your callback has to be defined in a separate file, unless your filter takes a function-name as an arg or something. I don't see how that's better than just having a `static` function in the same file as the function that wants to pass it as a callback, which is what this whole ridiculous idea is trying to avoid. – Peter Cordes Nov 16 '16 at 06:06
  • @PeterCordes interesting point about making sure the page is executable. I am used to stack code being executable, but this does cause very obvious well known security problems, which would make the lambda raise a system level exception on attempt to jump to it. static mangled lambdas are cheap but their abstraction is too leaky for my taste, since they behave in a very non obvious way in contrast to how i want them to be defined inside the function body(it creates a non obvious static dependency, and the lambda is stored in a different place than the abstraction seems to make it seem). – Dmytro Nov 16 '16 at 06:19
  • my idea was compelling because it was not leaky; the compiler knows how to precompile self contained blocks for the current platform. It seems viable for kernel mode processes to use, but relying on stack code to be executable might make it not practical for user mode processes, static callbacks and lisplike runtimes(passing intents rather than code blocks) may indeed be the best options. – Dmytro Nov 16 '16 at 06:28