
I have to fill in the designated section of function_b so that it modifies the stack in such a way that the program prints:

Executing function_a
Executing function_b
Finished!

Currently the program also prints Executed function_b between Executing function_b and Finished!.

Here is the code. The part I have to fill in is marked // ... insert code here:

#include <stdio.h>

void function_b(void){
    char buffer[4];

    // ... insert code here

    fprintf(stdout, "Executing function_b\n");
}

void function_a(void) {
    int beacon = 0x0b1c2d3;
    fprintf(stdout, "Executing function_a\n");
    function_b();
    fprintf(stdout, "Executed function_b\n");
}

int main(void) {
    function_a();
    fprintf(stdout, "Finished!\n");
    return 0;
}

I am using Ubuntu Linux with GCC on an Intel processor. I compile the program with the following options: -g -fno-stack-protector -fno-omit-frame-pointer.

Gilles 'SO- stop being evil'
  • There is no stack in the C language. – too honest for this site Mar 18 '17 at 23:44
  • @Olaf Yes there is, even if the C standard doesn't call it “stack”. – Gilles 'SO- stop being evil' Mar 18 '17 at 23:52
  • What environment are you working in (operating system, processor type, compiler, compiler options)? What you want to do is very dependent on all these parameters. – Gilles 'SO- stop being evil' Mar 18 '17 at 23:52
  • @Gilles Hi, thanks for your quick reply. I am using Ubuntu Linux with the gcc compiler. I compile the program with the following options: -g -fno-stack-protector -fno-omit-frame-pointer. I am using an intel processor. – user7733386 Mar 19 '17 at 00:11
  • @Gilles: Please provide a reference to the standard where it requires a stack (or something like a stack, even if it is called "Fred"). – too honest for this site Mar 19 '17 at 00:20
  • So the desired output requires to skip this line: `fprintf(stdout, "Executed function_b\n");`? – Arash Mar 19 '17 at 00:27
  • You need to increment the return address of `function_b` by n bytes, exactly the number of bytes taken by `fprintf(stdout, "Executed function_b\n");`. But in this case I think it is impossible to calculate this correctly in a way that works on both x86 and x64, with any optimization level and calling convention. However, similar tasks sometimes have an absolutely correct solution, for [example](http://stackoverflow.com/a/41074284/6401656). Your question was also asked [here](http://reverseengineering.stackexchange.com/questions/8464/returning-a-c-function-to-its-grandfather). – RbMm Mar 19 '17 at 01:22
  • @Gilles what terminology does the C standard use to describe the notion of a "stack"? Thanks! – sigjuice Mar 19 '17 at 03:31
  • @sigjuice - CPU registers, for example, may not exist in the C standard either, but they really exist. The stack exists on the x86/x64 platform, whether or not the C standard covers it. – RbMm Mar 19 '17 at 04:31

2 Answers


Here is a solution that is not exactly stable across environments, but it works for me on an x86_64 processor on Windows/MinGW64. It may not work for you out of the box, but you might still want to use a similar approach.

void function_b(void) {
    char buffer[4];
    buffer[0] = 0xa1; // part 1
    buffer[1] = 0xb2;
    buffer[2] = 0xc3;
    buffer[3] = 0x04;
    register int * rsp asm ("rsp"); // part 2
    register size_t r10 asm ("r10");
    r10 = 0;
    while (*rsp != 0x04c3b2a1) {rsp++; r10++;} // part 3
    while (*rsp != 0x00b1c2d3) rsp++; // part 4
    rsp -= r10; // part 5
    rsp = (int *) ((size_t) rsp & ~0xF); // part 6
    fprintf(stdout, "Executing function_b\n");
}

The trick is that function_a and function_b each have only one local variable, and we can find the address of that variable just by searching around in memory.

  1. First, we put a signature in the buffer, let it be the 4-byte integer 0x04c3b2a1 (remember that x86_64 is little-endian).

  2. After that, we declare two variables bound to registers: rsp is the stack pointer, and r10 is just some otherwise unused register. This lets us avoid asm statements later in the code while still using the registers directly. It is important that these variables don't actually take stack memory; they refer to the processor registers themselves.

  3. Next, we move the stack pointer in 4-byte increments (since int is 4 bytes) until we reach the buffer. We have to remember the offset from the stack pointer to this first variable, and we use r10 to store it.

  4. Next, we want to know how far apart on the stack the frames of function_b and function_a are. A good approximation is the distance between buffer and beacon, so we now search for beacon.

  5. After that, we have to move back from beacon, the first variable of function_a, to the start of the whole function_a frame on the stack. We do that by subtracting the value stored in r10.

  6. Finally, here comes a weirder bit. At least on my configuration, the stack happens to be 16-byte aligned, and while the buffer array is aligned to the left of a 16-byte block, the beacon variable is aligned to the right of such a block. (Or perhaps it is something with a similar effect and a different explanation.) Anyway, we just clear the last four bits of the stack pointer to make it 16-byte aligned again. The 32-bit GCC doesn't align anything for me, so you might want to skip or alter this line.
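Steps 3 to 6 can be tried out safely on an ordinary byte array instead of the live stack. The sketch below is a simulation, not the answer's code: the "stack" is a plain array, offsets are counted in bytes rather than ints, and the helper names (word_from_bytes, find_marker, estimate_frame_a, align_down_16) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 1: the four bytes written into buffer, read back as one 32-bit word.
   On little-endian x86/x86-64 this yields 0x04c3b2a1. */
static uint32_t word_from_bytes(const unsigned char b[4])
{
    uint32_t w;
    memcpy(&w, b, sizeof w);
    return w;
}

/* Steps 3 and 4: scan in 4-byte strides for a marker; returns its byte
   offset from mem, or -1. memcpy sidesteps alignment and aliasing issues. */
static ptrdiff_t find_marker(const unsigned char *mem, size_t len, uint32_t marker)
{
    for (size_t off = 0; off + sizeof marker <= len; off += sizeof marker) {
        uint32_t w;
        memcpy(&w, mem + off, sizeof w);
        if (w == marker)
            return (ptrdiff_t) off;
    }
    return -1;
}

/* Steps 3 to 5 on a simulated stack region. The distance to buffer plays the
   role of r10 (in bytes here, not ints); the result is the beacon offset minus
   that distance, i.e. where function_a's frame is assumed to start. */
static ptrdiff_t estimate_frame_a(const unsigned char *mem, size_t len,
                                  uint32_t buffer_marker, uint32_t beacon)
{
    ptrdiff_t to_buffer = find_marker(mem, len, buffer_marker);   /* step 3 */
    if (to_buffer < 0)
        return -1;
    ptrdiff_t rel = find_marker(mem + to_buffer,                  /* step 4 */
                                len - (size_t) to_buffer, beacon);
    if (rel < 0)
        return -1;
    ptrdiff_t beacon_off = to_buffer + rel;
    return beacon_off - to_buffer;                                /* step 5 */
}

/* Step 6: clear the low four bits to round down to a 16-byte boundary. */
static uintptr_t align_down_16(uintptr_t addr)
{
    return addr & ~(uintptr_t) 0xF;
}
```

For example, with the buffer signature at byte 8 and the beacon at byte 40 of a zeroed 64-byte array, estimate_frame_a returns 32, and align_down_16 turns 0x1234F into 0x12340.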


When working on a solution, I found the following macro useful:

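#ifdef DEBUG
#include <inttypes.h> // PRIXPTR and uintptr_t, to print a pointer as hex portably
#define show_sp() \
    do { \
        register void * rsp asm ("rsp"); \
        fprintf(stdout, "stack pointer is %016" PRIXPTR "\n", (uintptr_t) rsp); \
    } while (0)
#else
#define show_sp() do {} while (0)
#endif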

After this, when you insert show_sp(); in your code and compile with -DDEBUG, it prints the value of the stack pointer at that point. When compiling without -DDEBUG, the macro compiles to an empty statement. Of course, other variables and registers can be printed in a similar way.
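If you use GCC or Clang, a variant of the macro can avoid the local register variable by printing __builtin_frame_address(0), which returns the frame address of the current function. Note that this reports the frame pointer (rbp on x86-64), not rsp. This is a sketch assuming GCC/Clang on x86 or x86-64; show_fp, callee_frame and callee_frame_is_lower are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#ifdef DEBUG
/* Print the current frame address; __builtin_frame_address(0) is a
   GCC/Clang builtin, so no register variable or asm is needed. */
#define show_fp() \
    do { \
        fprintf(stdout, "frame address is %p\n", __builtin_frame_address(0)); \
    } while (0)
#else
#define show_fp() do { } while (0)
#endif

/* Hypothetical helper: report (and return) its own frame address. */
__attribute__((noinline)) static void *callee_frame(void)
{
    show_fp();
    return __builtin_frame_address(0);
}

/* Returns 1 if a called function's frame lies below its caller's, as
   expected on x86/x86-64 where the stack grows toward lower addresses. */
__attribute__((noinline)) static int callee_frame_is_lower(void)
{
    uintptr_t caller = (uintptr_t) __builtin_frame_address(0);
    uintptr_t callee = (uintptr_t) callee_frame();
    return callee < caller;
}
```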

Gassa
  • This is absolutely incorrect. Simply adjusting the stack pointer is not a solution. For example, a function can change non-volatile registers after saving them on the stack, and you do not restore their original values. We cannot know how deep `beacon` is on the stack relative to the return address, and `A good approximation is how far are buffer and beacon` is really wrong. Not to mention that with optimization there will be no `beacon` on the stack at all. It is all wrong from beginning to end, and the original question has no correct solution either. – RbMm Mar 19 '17 at 04:45
  • Only `6.` is really true on the x64 platform: *The stack is always 16-byte aligned when a call instruction is executed. When the call instruction pushes the return address, the stack is 16-byte aligned. The prolog code of the called function will re-align the stack as normal.* – RbMm Mar 19 '17 at 04:57
  • [*The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed)*](https://msdn.microsoft.com/en-us/library/ew5tede7.aspx), and this is true for all compilers on x64, including gcc. – RbMm Mar 19 '17 at 05:03
  • @RbMm Thank you for the comments! Nice to see that I actually got a theoretically sound bit in the solution. Of course if optimizations are enabled, we would have to litter the whole program with `volatile` and whatnot. But the exact command line options are given in the assignment, so the point of the question is rather "get it working once, here and now" than to outline a theoretically correct solution - which, as you say, does not exist in the first place. – Gassa Mar 19 '17 at 09:46
  • @RbMm I see that, in the question you linked, the main difference seems that it alters the instance of `function_b` instead of messing with the registers. So any entry/exit code of the function works correctly. But still, the possible necessary code at exit from `function_a` won't be called anyway **unless** we manage to skip just one `fprintf` from it. And that would require (a non-portable) guesswork on how many bytes the call and preparations instructions actually take, right? Anyway, I'd be delighted to see a "more correct" answer here. – Gassa Mar 19 '17 at 09:51
  • You change the stack pointer (rsp or esp, depending on x64/x86) in `function_b` to the value it must have in `function_a`, so in the end, instead of executing the epilogue of `A`, you execute the epilogue of `B`. This is correct if the epilogues of the two functions are the same. The functions are not equivalent or symmetric, but this assumption can really be true. I posted my own solution, based only on this assumption, and it works with any optimization, even if `beacon` is dropped by the compiler. – RbMm Mar 19 '17 at 09:52
  • How many bytes the `fprintf` call takes (to adjust the return address by that byte count) is impossible to calculate, since implementations can differ a lot. But I posted another solution, for the *CL* compiler. If you know an analog of `_AddressOfReturnAddress` for *GCC*, you can easily test my code on GCC too. It works with any optimization, regardless of x86 or x64. – RbMm Mar 19 '17 at 09:57

OK, let's assume that the epilogue (i.e. the code at the } line) of function_a and of function_b is the same.

Although functions A and B are not symmetric, we can assume this because they have the same signature (no parameters, no return value), the same calling convention and the same size of local variables (4 bytes: int beacon = 0x0b1c2d3 vs char buffer[4];), and with optimization both variables should be dropped because they are unused. But we must not use additional local variables in function_b, so as not to break this assumption. The most problematic point here is what happens if function_a or function_b uses non-volatile registers (and as a result saves them in the prologue and restores them in the epilogue), but there seems to be no reason for that here.

So my code below is based on this assumption, epilogueA == epilogueB (the solution by @Gassa really relies on it too).

It also needs to be stated very clearly that function_a and function_b must not be inlined. This is very important: without it, no solution is possible. So I allowed myself to add a noinline attribute to function_a and function_b. Note that this is not a code change but an added attribute, which the author of the task implicitly assumes but did not state. I don't know how to mark a function as noinline in GCC, but in CL __declspec(noinline) is used for this.
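For reference, GCC and Clang spell the same request as __attribute__((noinline)). A minimal sketch of a NOINLINE macro that picks the right spelling per compiler, assuming only MSVC and GCC-compatible compilers matter; the macro name and the call counters are illustrative and not part of the assignment.

```c
#include <assert.h>

/* One spelling per compiler; both are compiler extensions, not standard C. */
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)            /* GCC, and Clang, which also defines __GNUC__ */
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE                 /* unknown compiler: no guarantee */
#endif

/* Stand-ins for the assignment's functions; counters only for illustration. */
static int calls_a, calls_b;

static NOINLINE void function_b(void)
{
    calls_b++;
}

static NOINLINE void function_a(void)
{
    calls_a++;
    function_b();
}
```

The attribute only forbids inlining; it does not change what the functions compute, so calling function_a twice still increments both counters twice.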

I wrote the following code for the CL compiler, which has this intrinsic function:

void * _AddressOfReturnAddress();

but I think GCC must also have an analog of this function. I also use

void* _ReturnAddress();

but really _ReturnAddress() == *(void**)_AddressOfReturnAddress(), so we could use only _AddressOfReturnAddress(). Using _ReturnAddress() simply makes the source code (but not the binary, which is identical) smaller and more readable.
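On GCC, the closest counterpart of _ReturnAddress() is __builtin_return_address(0). GCC has no direct _AddressOfReturnAddress(); the sketch below assumes, as discussed in the comments under this answer, that on x86/x86-64 the return address sits one pointer above __builtin_frame_address(0). That layout is an assumption about the usual frame-pointer prologue, not a documented guarantee, so the check reports -1 on other targets. The macro and function names are hypothetical.

```c
#include <assert.h>

/* GCC/Clang builtin playing the role of MSVC's _ReturnAddress(). */
#define RETURN_ADDRESS() __builtin_return_address(0)

/* Assumption: on x86/x86-64, a function that uses __builtin_frame_address(0)
   keeps a frame pointer set up by "push rbp; mov rbp, rsp", so the return
   address is stored one pointer above the saved frame pointer. This stands
   in for MSVC's _AddressOfReturnAddress(); it is NOT valid on other targets. */
#if defined(__x86_64__) || defined(__i386__)
#  define ADDRESS_OF_RETURN_ADDRESS() ((void **) __builtin_frame_address(0) + 1)
#endif

/* 1: both views agree, 0: they disagree, -1: layout not assumed here. */
__attribute__((noinline)) static int return_slot_matches(void)
{
#ifdef ADDRESS_OF_RETURN_ADDRESS
    return *ADDRESS_OF_RETURN_ADDRESS() == RETURN_ADDRESS();
#else
    return -1;
#endif
}
```

If return_slot_matches() returns 1 on your machine, ADDRESS_OF_RETURN_ADDRESS() is a plausible drop-in for _AddressOfReturnAddress() in that configuration; comparing the generated assembly, as suggested in the comments, is still the safer check.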

The following code works for both x86 and x64, and it works (tested) with any optimization level.

Although I use two static variables, the code is thread-safe: we could call main from multiple threads concurrently, or call it multiple times, and everything would work correctly (only, of course, if epilogueA == epilogueB, as I said at the beginning).

I hope the comments in the code are self-explanatory.

#include <stdio.h>
#include <intrin.h> // _ReturnAddress, _AddressOfReturnAddress, _InterlockedCompareExchangePointer

__declspec(noinline) void function_b(void){
    char buffer[4];

    buffer[0] = 0;

    static void *IPa, *IPb;

    // record IPa (the return address into function_a) on the first call only
    _InterlockedCompareExchangePointer(&IPa, _ReturnAddress(), 0);

    if (_ReturnAddress() == IPa)
    {
        // we were called from function_a: call ourselves once to learn IPb
        function_b();
        // <-- IPb
        if (_ReturnAddress() == IPa)
        {
            // still in the call from function_a: overwrite our return address
            // so that we return to IPb instead of IPa
            *(void**)_AddressOfReturnAddress() = IPb;
            return;
        }
        // we are running on function_a's stack frame here.
        // we should really be at point IPa
        // and execute fprintf(stdout, "Executed function_b\n"); + '}' (epilogueA)
        // but we will execute fprintf(stdout, "Executing function_b\n"); + '}' (epilogueB)
        // assume that epilogueA == epilogueB
    }
    else
    {
        // we were called from function_b (the recursive call): remember IPb
        IPb = _ReturnAddress();
        return;
    }

    fprintf(stdout, "Executing function_b\n");
    // epilogueB
}

__declspec(noinline) void function_a(void) {
    int beacon = 0x0b1c2d3;
    fprintf(stdout, "Executing function_a\n");
    function_b();
    // <-- IPa
    fprintf(stdout, "Executed function_b\n");
    // epilogueA
}

int main(void) {
    function_a();
    fprintf(stdout, "Finished!\n");
    return 0;
}
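RbMm's code records IPa with MSVC's _InterlockedCompareExchangePointer so that only the first caller's return address is ever stored. In portable C11, the same "store only if still NULL" step can be written with atomic_compare_exchange_strong from <stdatomic.h> (GCC's __atomic_compare_exchange builtin is another option). A minimal sketch, where first_seen and record_once are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Plays the role of the static IPa: NULL until the first caller records itself. */
static _Atomic(void *) first_seen = NULL;

/* Store p only if nothing is stored yet, like
   _InterlockedCompareExchangePointer(&IPa, p, 0). Returns whichever pointer
   ends up stored, so every caller agrees on the winner. */
static void *record_once(void *p)
{
    void *expected = NULL;
    if (atomic_compare_exchange_strong(&first_seen, &expected, p))
        return p;        /* slot was empty: p is now stored */
    return expected;     /* slot was taken: expected now holds the stored value */
}
```

The second and later calls leave the slot untouched, which is why concurrent callers cannot overwrite each other's IPa.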
RbMm
  • Looks nice. However, I have trouble translating this into GCC. I use `__builtin_return_address(1)` instead of `_ReturnAddress()`, as in your link. The `_InterlockedCompareExchangePointer` part is perhaps `__atomic_compare_exchange`, as [here](https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/_005f_005fatomic-Builtins.html). I see no direct equivalent of `_AddressOfReturnAddress`, and tried to use something along the lines of [this SO answer](http://stackoverflow.com/questions/27213382/how-to-modify-return-address-on-stack-in-c-or-assembly), but it breaks the `if (_ReturnAddress() == IPa)` logic. – Gassa Mar 19 '17 at 11:18
  • @Gassa `_AddressOfReturnAddress` is the key point; compiler support for it is absolutely mandatory for this solution. `_ReturnAddress()` == `*(void**)_AddressOfReturnAddress`, so `_ReturnAddress()` itself is not really needed. – RbMm Mar 19 '17 at 11:20
  • @Gassa - maybe [this](https://sourceforge.net/p/mingw-w64/mingw-w64/ci/d959d7ea3e9b5887cc15cbc97daaa9e03fe5bce9/): `_AddressOfReturnAddress()=(void**)__builtin_frame_address (0) + 1`, and you need to use `__builtin_return_address(0)`, not 1, as `_ReturnAddress()`. But I'm not sure where exactly `__builtin_frame_address (0)` points: is it exactly one register size below the address of the return address? I don't use gcc, so I can't check. – RbMm Mar 19 '17 at 11:29
  • Thanks, I tried that, but didn't really get it working. I'm out for now. For reference, my current attempt is [pasted here](http://pastebin.com/NF2d34Yv). – Gassa Mar 19 '17 at 11:50
  • @Gassa - I have no gcc, so I cannot check where `__builtin_frame_address(0)` actually points. You simply need to look at the generated assembly code and compare it with `__builtin_return_address(0)`: what code does `__builtin_return_address(0)` generate (I assume something like `mov rax,[rsp+N]`), what code does `__builtin_frame_address(0)` generate (I assume `lea rax,[rsp+M]`), and how are N and M related within the same function? Are they equal, or is `M=N-sizeof(void*)`? – RbMm Mar 19 '17 at 12:10