Like the title says, I want to trace ALL function calls in my application (from inside).

I tried using "_penter" but I get either a recursion limit reached error or an access violation when I try to prevent the recursion.

Is there any way to achieve this?

Update

What I tried:

extern "C"
{
    void __declspec(naked) _cdecl _penter()
    {
        _asm {
            push    eax
            push    ecx
            push    edx
            mov     ecx, [esp + 0Ch]
            push    ecx
            mov     ecx, offset Context::Instance
            call    Context::addFrame
            pop     edx
            pop     ecx
            pop     eax
            ret
        }
    }
}

class Context
{
 public:
    __forceinline void addFrame(const void* addr) throw() {}

    static thread_local Context Instance;
};

Sadly, this still gives a stack overflow due to recursion.

user1233963
  • Downvotes... why? – user1233963 Jan 27 '18 at 16:44
  • Trace them in what sense? Log them? Step through them? Something else? – Beta Jan 27 '18 at 16:45
  • Let's say I want to log them. – user1233963 Jan 27 '18 at 16:45
  • In each function, put a line that writes some information to a common log file. – Beta Jan 27 '18 at 16:47
  • That's not feasible for me, there are too many functions. I need something more generic (like _penter). – user1233963 Jan 27 '18 at 16:48
  • Have you considered using a profiler (like `perf`)? – Jesper Juhl Jan 27 '18 at 16:50
  • Like I said, I want the tracing from inside the program, not outside. – user1233963 Jan 27 '18 at 16:51
  • Modifying the compiler to inject the tracing code you want into each function would be one way. – Jesper Juhl Jan 27 '18 at 17:01
  • Sadly, I don't have the MSVC source code for that :( – user1233963 Jan 27 '18 at 17:08
  • What's the real problem you're trying to solve? As stated it sounds a bit [XY](http://xyproblem.info/). Also, the fact that you're hitting a recursion limit when using `_penter` sounds a bit strange. If you *do* want to inject tracing code into your source you could always have a look at [clang's tooling library](https://clang.llvm.org/docs/LibTooling.html). – G.M. Jan 27 '18 at 17:32
  • @G.M. Call any non-inlined function from `_penter`, the compiler will insert another `_penter` call there, and you’ll get endless recursion. – Soonts Jan 28 '18 at 09:03
  • *but I get either a recursion limit reached error* - sure: inside the `Context::addFrame` implementation the compiler also inserts a call to `_penter`, which recursively calls `Context::addFrame`. You need to implement `Context::addFrame` in a separate *C++* file compiled without the `/Gh` option. – RbMm Jan 28 '18 at 15:08
  • And `__forceinline` has no effect when the function is called from asm code; the compiler cannot insert a copy of the function body in that case. – RbMm Jan 28 '18 at 15:15
  • And on x86 you need to save only *ecx* and *edx*, and only if your code uses `__fastcall` functions; otherwise you don't need to save any registers at all. On x64 you need to save *rcx, rdx, r8, r9*. – RbMm Jan 28 '18 at 15:25

3 Answers

3

Your approach is correct: the /Gh and /GH compiler switches plus the _penter and _pexit functions are the way to go.

I think there are errors in your implementation of these functions. This is very low-level stuff: for 32-bit builds you have to use __declspec(naked), and for 64-bit builds you have to use assembler. Both are quite tricky to implement correctly.

Take a look at this repository for an example of how to do it right: https://github.com/tyoma/micro-profiler Specifically, look at this source file: https://github.com/tyoma/micro-profiler/blob/master/micro-profiler/collector/hooks.asm As you can see, they decided to use assembler for both platforms, and from it they call a C++ function to record call information. Also note how the C++ collector implementation uses __forceinline to avoid recursion.

Soonts
  • I tried doing it like in that repo, but sadly to no avail (I updated the question with a code sample). Could the fact that I use inline assembly instead of nasm be the problem? – user1233963 Jan 28 '18 at 10:04
  • @user1233963 That should work too, here’s an example: https://github.com/OSRDrivers/penter/blob/master/penterlib/penterlib.c One possible problem is thread_local: I’m not sure you’ll get the desired behavior when taking the address of that variable in assembler code. – Soonts Jan 28 '18 at 16:05
2

> but I get either a recursion limit reached error

This can happen if the compiler also inserts a call to _penter inside the implementation of Context::addFrame, which then calls Context::addFrame again, recursively.

But what about __forceinline, you may ask? It does nothing here. The C/C++ compiler can insert a copy of a function body only at call sites in code that it generates itself; it cannot inline into code that it did not compile. So when a function marked __forceinline is called from assembler code, it is called in the usual way rather than expanded in place. Your __forceinline therefore has no effect.
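The recursion described above can be reproduced without MSVC. The following is a minimal, portable simulation, not real /Gh instrumentation: hook() stands in for _penter, add_frame() stands in for a Context::addFrame that was itself compiled with /Gh, and the names sim, hook, add_frame, run_once and kMaxDepth are all hypothetical. The thread-local guard is shown only for contrast; it is an assumption of this sketch, not something this answer recommends.

```cpp
#include <cassert>
#include <cstddef>

namespace sim {

constexpr std::size_t kMaxDepth = 1000;   // demo-only cap so the unguarded run terminates

bool use_guard = false;                   // toggled by run_once()
thread_local bool in_hook = false;        // re-entrancy guard flag (assumption of this sketch)
thread_local std::size_t depth = 0;       // current nesting of hook() calls
thread_local std::size_t max_depth = 0;   // deepest nesting observed

void hook(const void* addr);

// Stand-in for Context::addFrame compiled WITH /Gh: the first thing it does is
// call the hook, just as a compiler-inserted prologue would.
void add_frame(const void* addr) {
    hook(addr);
    // ... the frame would be recorded here ...
}

// Stand-in for _penter.
void hook(const void* addr) {
    if (use_guard && in_hook) return;     // already inside the hook: bail out
    if (depth >= kMaxDepth) return;       // demo cap; a real build overflows the stack instead
    in_hook = true;
    ++depth;
    if (depth > max_depth) max_depth = depth;
    add_frame(addr);                      // re-enters hook() through add_frame's prologue
    --depth;
    if (depth == 0) in_hook = false;
}

// Runs one hooked "function entry" and returns the deepest nesting reached.
std::size_t run_once(bool guard) {
    use_guard = guard;
    in_hook = false;
    depth = 0;
    max_depth = 0;
    int dummy = 0;
    hook(&dummy);
    assert(depth == 0);                   // every hook() call has unwound
    return max_depth;
}

}  // namespace sim
```

Without the guard the nesting runs until the demo cap (in a real build, until the stack overflows); with it, the nesting stops at depth 1. In a real build the simpler fix is the one this answer describes: compile the recording code without /Gh so that no hook call is inserted into it.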

You need to implement Context::addFrame (and every function it calls) in a separate C++ file (say, context.cpp) compiled without the /Gh option.

You can set /Gh for all files in the project except context.cpp.

If the project has too many cpp files, you can set /Gh for the whole project, but how do you then remove it for the single file context.cpp? There is one unusual way: copy the <cmdline> for this file and set a custom build tool for it, with Command Line set to CL.exe <cmdline> $(InputFileName) (don't forget to remove /Gh) and Outputs set to $(IntDir)\$(InputName).obj. It is unusual, but it works perfectly.

So context.cpp can contain the following code:

class Context
{
public:
    void __fastcall addFrame(const void* addr);

    int _n;

    static thread_local Context Instance;
};

thread_local Context Context::Instance;

void __fastcall Context::addFrame(const void* addr)
{
#pragma message(__FUNCDNAME__)

    DbgPrint("%p>%u\n", addr, _n++);
}

If Context::addFrame calls any other internal function (explicitly or implicitly), put that function in this file too, so that it is also compiled without /Gh.
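As an illustration of a recording function that calls nothing else, here is a minimal sketch of a fixed-capacity ring buffer for frame addresses. FrameLog, kCapacity and the member names are hypothetical and not part of the code above; the sketch assumes that losing the oldest entries on overflow is acceptable and that each thread owns its own instance (for example via thread_local).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity recorder; all names are illustrative.
class FrameLog {
public:
    static constexpr std::size_t kCapacity = 4096;

    // Stores the address in a ring buffer. No allocation, no I/O and no calls
    // to other functions, so nothing here can re-enter the hook.
    void add(const void* addr) noexcept {
        slots_[count_ % kCapacity] = addr;
        ++count_;
    }

    // Number of frames seen so far, including overwritten ones.
    std::size_t total() const noexcept { return count_; }

    // Number of frames currently retained in the buffer.
    std::size_t stored() const noexcept {
        if (count_ < kCapacity) return count_;
        return kCapacity;
    }

    // Returns the i-th retained frame; i = 0 is the oldest one still stored.
    const void* at(std::size_t i) const noexcept {
        assert(i < stored());
        std::size_t first = 0;
        if (count_ >= kCapacity) first = count_ % kCapacity;
        return slots_[(first + i) % kCapacity];
    }

private:
    const void* slots_[kCapacity] = {};
    std::size_t count_ = 0;
};
```

Because add() performs no allocation or I/O and calls no other function, it contains nothing that could re-enter the hook, provided the file containing it is compiled without /Gh as described above. Reading the buffer back with at() is meant for code that runs outside the hook.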

It is better to implement _penter in a separate asm file rather than as inline asm (which is not supported on x64 anyway).

So for x86 you can create code32.asm (ml /c /Cp $(InputFileName) -> $(InputName).obj):

.686p

.MODEL flat

extern ?addFrame@Context@@QAIXPBX@Z:proc
extern ?Instance@Context@@2V12@A:byte

_TEXT segment 'CODE'

__penter proc
    push edx
    push ecx
    mov edx,[esp+8]
    lea ecx,?Instance@Context@@2V12@A
    call ?addFrame@Context@@QAIXPBX@Z
    pop ecx
    pop edx
    ret
__penter endp

_TEXT ends
end

Note: on x86 you need to save only ecx and edx, and only if functions outside context.cpp use __fastcall.

For x64, create code64.asm (ml64 /c /Cp $(InputFileName) -> $(InputName).obj):

extern ?addFrame@Context@@QEAAXPEBX@Z:proc
extern ?Instance@Context@@2V12@A:byte

_TEXT segment 'CODE'

_penter proc
    mov [rsp+8],rcx
    mov [rsp+16],rdx
    mov [rsp+24],r8
    mov [rsp+32],r9
    mov rdx,[rsp]
    sub rsp,28h
    lea rcx,?Instance@Context@@2V12@A
    call ?addFrame@Context@@QEAAXPEBX@Z
    add rsp,28h
    mov r9,[rsp+32]
    mov r8,[rsp+24]
    mov rdx,[rsp+16]
    mov rcx,[rsp+8]
    ret
_penter endp

_TEXT ENDS
end
RbMm
1

Here is what I use.

In Configuration Properties > C/C++ > Command Line, add the compiler options to the Additional Options box:

Add flag /Gh for the _penter hook
Add flag /GH for the _pexit hook

The code I use for tracing / logging:

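#include <windows.h>
#include <dbghelp.h>    // SymInitialize, SymFromAddr (link with dbghelp.lib)
#include <intrin.h>     // _ReturnAddress
#include <stdlib.h>     // calloc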

extern "C"  void __declspec(naked) __cdecl _penter(void) {
    __asm {
        push ebp;               // standard prolog
        mov ebp, esp;
        sub esp, __LOCAL_SIZE
        pushad;                 // save registers
    }
    // _ReturnAddress always returns the address directly after the call, but that is not the start of the function!
    PBYTE addr;
    addr = (PBYTE)_ReturnAddress() - 5;

    SYMBOL_INFO* mysymbol;
    HANDLE       process;
    process = GetCurrentProcess();
    SymInitialize(process, NULL, TRUE);
    mysymbol = (SYMBOL_INFO*)calloc(sizeof(SYMBOL_INFO) + 256 * sizeof(char), 1);
    mysymbol->MaxNameLen = 255;
    mysymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    SymFromAddr(process, (DWORD64)((void*)addr), 0, mysymbol);
    myprintf("Entered Function: %s [0x%X]\n", mysymbol->Name, addr);

    _asm {
        popad;              // restore regs
        mov esp, ebp;       // standard epilog
        pop ebp;
        ret;
    }
}

extern "C"  void __declspec(naked) __cdecl _pexit(void) {
    __asm {
        push ebp;               // standard prolog
        mov ebp, esp;
        sub esp, __LOCAL_SIZE
        pushad;                 // save registers
    }
    // _ReturnAddress always returns the address directly after the call, but that is not the start of the function!
    PBYTE addr;
    addr = (PBYTE)_ReturnAddress() - 5;

    SYMBOL_INFO* mysymbol;
    HANDLE       process;
    process = GetCurrentProcess();
    SymInitialize(process, NULL, TRUE);
    mysymbol = (SYMBOL_INFO*)calloc(sizeof(SYMBOL_INFO) + 256 * sizeof(char), 1);
    mysymbol->MaxNameLen = 255;
    mysymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    SymFromAddr(process, (DWORD64)((void*)addr), 0, mysymbol);
    myprintf("Exit Function: %s [0x%X]\n", mysymbol->Name, addr);

    _asm {
        popad;              // restore regs
        mov esp, ebp;       // standard epilog
        pop ebp;
        ret;
    }
}
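The two hooks above log entry and exit independently. As a rough, portable sketch of how the two events can be paired into an indented call trace, the following simulation calls hypothetical on_enter/on_exit functions by hand through an RAII helper; in a /Gh /GH build the compiler would insert the calls to _penter and _pexit instead. All names here (trace_sim, on_enter, on_exit, Scope, outer, inner) are assumptions for illustration, and std::vector/std::string are used only for brevity; a real hook should avoid heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace trace_sim {

thread_local std::size_t depth = 0;              // current call nesting on this thread
thread_local std::vector<std::string> lines;     // collected trace lines

// Stand-in for what _penter would log: indent by depth, then go one level deeper.
void on_enter(const char* name) {
    lines.push_back(std::string(depth * 2, ' ') + "> " + name);
    ++depth;
}

// Stand-in for what _pexit would log: go one level up, then indent by depth.
void on_exit(const char* name) {
    assert(depth > 0);                           // every exit must match an earlier entry
    --depth;
    lines.push_back(std::string(depth * 2, ' ') + "< " + name);
}

// RAII helper that plays the role of the compiler-inserted hook calls.
struct Scope {
    const char* name;
    explicit Scope(const char* n) : name(n) { on_enter(name); }
    ~Scope() { on_exit(name); }
};

void inner() { Scope s("inner"); }

void outer() {
    Scope s("outer");
    inner();
}

}  // namespace trace_sim
```

Calling trace_sim::outer() once leaves four lines in trace_sim::lines, with the inner pair indented by two spaces relative to the outer pair.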
SSpoke
  • Thanks for the code. It worked for me with minor tweaks. But watch out! You allocate heap memory and then never deallocate it. If there are really many calls, then the application will allocate all the available memory and then crash (like in my case). I've changed the code to `char buffer[sizeof(SYMBOL_INFO) + MAX_NAME_LENGTH + 1]; mysymbol = reinterpret_cast<SYMBOL_INFO*>(&buffer[0]); mysymbol->MaxNameLen = MAX_NAME_LENGTH;`. – Alex Che Jun 10 '22 at 17:09