
John Viega suggests a method to obfuscate function calls in his book Secure Programming Cookbook for C and C++. It can be read here.

#define SET_FN_PTR(func, num)                  \
    static inline void *get_##func(void) { \
      int  i, j = num / 4;                 \
      long ptr = (long)func + num;         \
      for (i = 0;  i < 2;  i++) ptr -= j;  \
      return (void *)(ptr - (j * 2));      \
    }
#define GET_FN_PTR(func) get_##func(  )

#include <stdio.h>

void my_func(void) {
  printf("my_func(  ) called!\n");
}

SET_FN_PTR(my_func, 0x01301100); /* 0x01301100 is some arbitrary value */

int main(int argc, char *argv[  ]) {
  void (*ptr)(void);

  ptr = GET_FN_PTR(my_func);     /* get the real address of the function */
  (*ptr)(  );                      /* make the function call */
  return 0;
}

I compiled it with gcc fp.c -S -O2 (Ubuntu 15.10 64-bit, gcc 5.2.1) and checked the assembly:

...
my_func:
.LFB23:
        .cfi_startproc
        movl    $.LC0, %edi
        jmp     puts
        .cfi_endproc
.LFE23:
        .size   my_func, .-my_func
        .section        .text.unlikely
.LCOLDE1:
        .text
.LHOTE1:
        .section        .text.unlikely
.LCOLDB2:
        .section        .text.startup,"ax",@progbits
.LHOTB2:
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB25:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    my_func
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
...

I see that my_func is called in main. Can somebody explain how this method obfuscates the function call?

I see that many readers just come and downvote. I took the time to understand the problem, and only posted it here when I failed. Please at least write a comment instead of just pushing the downvote button.

UPDATE: Turning off optimization I got:

...
my_func:
...
get_my_func:
...
main:
...
    call    get_my_func
    movq    %rax, -8(%rbp)
    movq    -8(%rbp), %rax
    call    *%rax
...

I think there is no inlining now. However, I do not really understand why that is important...

I am still looking for an explanation of what the author's goal was with this code, even if it does not work with today's smart compilers.

robert
  • Dear downvoter, I would appreciate your comment – robert Nov 03 '15 at 14:14
  • SO is to help people write **better** code, not worse. – too honest for this site Nov 03 '15 at 14:14
  • @Olaf I am trying to protect a commercial software. – robert Nov 03 '15 at 14:15
  • And you are going to provide source code? – too honest for this site Nov 03 '15 at 14:16
  • I agree that this is not even remotely good programming practice. – Erik Nov 03 '15 at 14:18
  • @Olaf What kind of source code? The above code snippet is taken from a book. It is in the question... – robert Nov 03 '15 at 14:18
  • Interesting. That's not at all the assembly output I got when I compiled it with `gcc 4.7.2`. Are you sure that's the right assembly output? What I get is a `call get_my_func`. What compiler and platform are you using? Is that assembly output a direct result of compiling just the code you show? – lurker Nov 03 '15 at 14:23
  • @lurker I updated the question, gcc5.2.1. Same output as before. – robert Nov 03 '15 at 14:31
  • The `gcc 5.2.1` compiler seems to be smart enough to realize that your inline function is mapping `get_my_func` to `my_func`. If that's the case, then the obfuscation's going to have to be more elaborate. You can do a `cpp foo.c` on your function to see what the preprocessor output looks like, to get a real view of what the compiler is compiling. – lurker Nov 03 '15 at 14:34
  • Might it be that the optimizer of the compiler optimizes the obfuscation away? – gmug Nov 03 '15 at 14:37
  • @gmug You are right. Without optimization, main calls get_my_func and then *%rax. But now I think there is no inlining. – robert Nov 03 '15 at 14:42
  • What I don't understand is how this could really make much difference. If someone is using a debugger they would just step through the function call to see where it goes. It just does not seem to be a viable technique and the resulting source code is a real pain. It would seem to make working with the source much more difficult because tools of most modern IDEs for finding functions and debugging will be much more difficult to use. – Richard Chambers Nov 03 '15 at 14:47
  • @RichardChambers Agreed. If you don't want someone to see what your code does, you can't give them the binary. – Andrew Henle Nov 03 '15 at 14:50
  • @RichardChambers: I think the point of this obfuscation is to *hide* where the function is called *from*. – Karoly Horvath Nov 03 '15 at 14:50
  • @RichardChambers I agree with you. However, the book also describes methods for detecting debuggers. I am a beginner in this area, and it is also true that almost every binary gets cracked. I just want to implement some minimal protection. – robert Nov 03 '15 at 14:52
  • First of all, post code that actually compiles in C. There's many problems with this code that need to be fixed before doing anything else. Did you rewrite it by hand? _Upvoters_ care to explain? – Lundin Nov 03 '15 at 14:58
  • Specifically, `SET_FN_PTR(my_func, 0x01301100);` would leave a semicolon after a function definition, which is not allowed. The function is of type `void* (*) (void)` but the function pointer in main is of type `void (*) (void);`. Also `my_func` is not of that type either. – Lundin Nov 03 '15 at 14:58
  • @Lundin It is taken from the book mentioned at the beginning without modification. – robert Nov 03 '15 at 14:59
  • @franz1 Find the author of that book and tell him that he needs to read a book about C programming then... Your question is impossible to answer since this is not even close to something that will compile. May I ask what compiler that let this mess through? – Lundin Nov 03 '15 at 15:01
  • @Lundin: `gcc` happily allows an extra semicolon in global scope. You actually need `-pedantic` to get a warning. – Karoly Horvath Nov 03 '15 at 15:03
  • @KarolyHorvath Alright, well you should always compile with `-pedantic-errors -Wall -Wextra` or gcc is going to misbehave. The incorrect type conflicts all over this code makes it impossible to answer the question. – Lundin Nov 03 '15 at 15:04
  • How would this method work with shared libraries or dynamic link libraries? – Richard Chambers Nov 03 '15 at 18:38

3 Answers


The problem with that way of obfuscating function calls is that it relies on the compiler not being smart enough to see through the obfuscation. The idea here was that the caller shouldn't contain a direct reference to the function to be called, but should instead retrieve the pointer to the function from another function.

However, modern compilers do see through this, and when optimizing they remove the obfuscation again. What the compiler probably does is simple inline expansion of GET_FN_PTR, and once it is inline expanded it is obvious how to optimize: it's just a bunch of constants combined into a pointer which is then called. Constant expressions are easy to compute at compile time (and this is commonly done).

Before you obfuscate your code you should probably have a good reason to do so, and use a method suitable for the needs.

skyking

The idea of the suggested approach is to use an indirect function call so that the function address must be computed first and then called. The C preprocessor is used to define a proxy function for the actual function, and this proxy function performs the calculation needed to determine the actual address of the real function it provides access to.

See the Wikipedia article Proxy pattern for details about the proxy design pattern, which has this to say:

The proxy design pattern allows you to provide an interface to other objects by creating a wrapper class as the proxy. The wrapper class, which is the proxy, can add additional functionality to the object of interest without changing the object's code.

I would suggest an alternative which implements the same type of indirect call; however, it does not require using the C preprocessor to hide implementation details in a way that makes the source code difficult to read.

The C compiler allows a struct to contain function pointers as members. What is nice about this is that you can define an externally visible struct variable with function pointers as members, yet the functions used in the definition of that struct variable can be static, meaning they have file visibility only (see What does "static" mean in a C program.)

So I can have two files, a header file func.h and an implementation file func.c which define the struct type, the declaration of the externally visible struct variable, the functions used with a static modifier, and the externally visible struct variable definition with the function addresses.

What is attractive about this approach is that the source code is easy to read, and most IDEs will handle this sort of indirection much better, because the C preprocessor is not being used to generate source at compile time, which hurts readability both for people and for software tools such as IDEs.

An example func.h file, which would be #included into the C source file using the functions, could look like:

// define a type using a typedef so that we can declare the externally
// visible struct in this include file and then use the same type when
// defining the externally visible struct in the implementation file which
// will also have the definitions for the actual functions which will have
// file visibility only because we will use the static modifier to restrict
// the functions' visibility to file scope only.
typedef struct {
    int (*p1)(int a);
    int (*p2)(int a);
} FuncList;

// declare the externally visible struct so that anything using it will
// be able to access it and its members or the addresses of the functions
// available through this struct.
extern FuncList myFuncList;

And the func.c file example could look like:

#include <stdio.h>

#include "func.h"

// the functions that we will be providing through the externally visible struct
// are here.  we mark these static since the only access to these is through
// the function pointer members of the struct so we do not want them to be
// visible outside of this file. also this prevents name clashes between these
// functions and other functions that may be linked into the application.
// this use of an externally visible struct with function pointer members
// provides something similar to the use of namespace in C++ in that we
// can use the externally visible struct as a way to create a kind of
// namespace by having everything go through the struct and hiding the
// functions using the static modifier to restrict visibility to the file.

static int p1Thing(int a)
{
    return printf ("-- p1 %d\n", a);
}

static int p2Thing(int a)
{
    return printf ("-- p2 %d\n", a);
}

// externally visible struct with function pointers to allow indirect access
// to the static functions in this file which are not visible outside of
// this file.  we do this definition here so that we have the prototypes
// of the functions which are defined above to allow the compiler to check
// calling interface against struct member definition.
FuncList myFuncList = {
    p1Thing,
    p2Thing
};

A simple C source file using this externally visible struct could look like:

#include "func.h"

int main(int argc, char * argv[])
{
    // call function p1Thing() through the struct function pointer p1()
    myFuncList.p1 (1);
    // call function p2Thing() through the struct function pointer p2()
    myFuncList.p2 (2);
    return 0;
}

The assembly emitted by Visual Studio 2005 for the above main() looks like the following, showing a computed call through the specified address:

; 10   :    myFuncList.p1 (1);

  00000 6a 01        push    1
  00002 ff 15 00 00 00
    00       call    DWORD PTR _myFuncList

; 11   :    myFuncList.p2 (2);

  00008 6a 02        push    2
  0000a ff 15 04 00 00
    00       call    DWORD PTR _myFuncList+4
  00010 83 c4 08     add     esp, 8

; 12   :    return 0;

  00013 33 c0        xor     eax, eax

As you can see, these function calls are now indirect calls through the struct, each addressed by an offset within the struct.

The nice thing about this approach is that you can do whatever you want to the memory area containing the function pointers so long as before you call a function through the data area, the correct function addresses have been put there. So you could actually have two functions, one that would initialize the area with the correct addresses and a second that would clear the area. So before using the functions you would call the function to initialize the area and after finishing with the functions call the function to clear the area.

#include <string.h>  // for memset() used in myFuncListClear()

// file scope visible struct containing the actual or real function addresses
// which can be used to initialize the externally visible copy.
static FuncList myFuncListReal = {
    p1Thing,
    p2Thing
};

// By default the externally visible struct holds NULL pointers, so calling
// through it before initialization is undefined behavior (typically a crash).
// Must use myFuncListInit() to initialize the pointers
// with the actual or real values.
FuncList myFuncList = {
    0,
    0
};

// externally visible function that will update the externally visible struct
// with the correct function addresses to access the static functions.
void myFuncListInit (void)
{
    myFuncList = myFuncListReal;
}

// externally visible function to reset the externally visible struct back
// to NULLs in order to clear the addresses making the functions no longer
// available to external users of this file.
void myFuncListClear (void)
{
    memset (&myFuncList, 0, sizeof(myFuncList));
}

So you could do something like this modified main():

myFuncListInit();
myFuncList.p1 (1);
myFuncList.p2 (2);
myFuncListClear();

However, what you would really want to do is to have the call to myFuncListInit() be somewhere in the source that is not near where the functions are actually used.

Another interesting option would be to have the data area encrypted and in order to use the program, the user would need to enter the correct key to properly decrypt the data to get the correct pointer addresses.

Richard Chambers
  • Why are p1 and p2 static? Deleting the static keyword does not change the assembler of main. – robert Nov 04 '15 at 07:55
  • @franz1, functions `p1` and `p2` are static in order to reduce their scope of visibility to file scope. In other words the functions `p1` and `p2` are not visible as functions outside of the file `func.c` and the only way they can be accessed is through the function pointers in the externally visible struct `myFuncList`. Removing `static` does not affect the assembler of `main()` because `main()` is accessing them through the struct `myFuncList` even though they are visible to `main()` once you remove the `static` modifier. – Richard Chambers Nov 04 '15 at 12:27
  • @franz1 I have changed the names of the functions from p1 to p1Thing and p2 to p2Thing to make it clear that the struct members are pointer variables which point to a function. I was wondering if using the same text for function name and member name, which are different entities, was confusing to you. – Richard Chambers Nov 04 '15 at 15:33
  • I appreciate your efforts to make this clear to me. I still do not understand why file scope is necessary for p1Thing and p2Thing. Is it to prevent the compiler from replacing the indirect function calls with direct ones? – robert Nov 05 '15 at 07:25
  • @franz1 it is not necessary to make p1Thing and p2Thing file scope to prevent the compiler replacing the indirect function calls with direct ones when using the externally visible struct with function pointers. Whether they have file scope or not, the compiler will still use the indirect call because you specify that the function pointer members are to be used. I just use file scope in order to reduce namespace pollution with extra, unneeded externally visible function names. – Richard Chambers Nov 05 '15 at 13:51
  • Thank you! However, somebody stepping through the assembly will still be able to see the actual function calls. Is this true? – robert Nov 05 '15 at 14:07
  • @franz1 if someone is stepping through the code they will see exactly what is there. this applies no matter how you try to obfuscate the code because the computer has to be able to call the function and if the computer can call the function then a debugger can step through to the function. using the struct what you see in the assembly is an indirect call just like with the technique you posted. the technique you posted shows the algorithm of obfuscation in the assembler so I am not sure how well that approach actually works against someone knowledgeable. – Richard Chambers Nov 05 '15 at 14:24
  • OK, I just want to be sure that I am understanding it. Thanks again! – robert Nov 05 '15 at 14:27
  • @franz1 I am happy to be helpful. It was kind of an interesting exercise and your questions provided me the information to make my answer better so I appreciate your patience. Take care and good luck on your project. – Richard Chambers Nov 05 '15 at 18:04

The "obfuscation" in C/C++ is mainly related to the size of the compiled code. If it is too short (e.g. 500-1000 assembly lines), any intermediate-level programmer can decode it and find what they need within a few hours or days.

i486