1

I created an extension for WinDbg using DbgEng, which records trace containing the state of the registers and stack for each executed instruction in user mode and in kernel mode.
For step-tracing, I use breakpoints (I also tried sets the TRAP flag to threads context for which a trace is recorded, but it gave the same result). After each step and hitting a breakpoint IDebugEventCallbacks::Breakpoint() is called, which returns DEBUG_STATUS_BREAK. This causes the debugger to complete the IDebugControl::WaitForEvent() function in the engine loop and I record the state for the current instruction.
The trace is recorded in a test multithreaded application, on a multi-core system (4-physical, 8-virtual cores). The test application creates 6 threads apart from the main one. 4 threads record trace of the same function func0 asynchronously. The other 3 created threads simulate calculations.
Code of the test program:

void func3();
void func2();
void func1();
DWORD WINAPI func0(LPVOID lpArg);
DWORD WINAPI LoopProcessing(LPVOID lpArg);

void func3()
{
    int a = 1;
    a += 2;
    a *= 60;
    a += 31;
    printf(u8"func3\n");
    a /= 2;
    a *= 777;
    a -= 31;
}

void func2()
{
    int a = 1;
    a += 5;
    a *= 41;
    a += 55;
    printf(u8"func2\n");
    a /= 3;
    a *= 414;
    a -= 41;
}

void func1()
{
    int a = 1;
    a += 95;
    a *= 14;
    a += 72;
    printf(u8"func1\n");
    a /= 1;
    a *= 42;
    a -= 37;
}

DWORD WINAPI func0(LPVOID lpArg)
{
    func1();
    func2();
    func3();

    return 0;
}

DWORD WINAPI LoopProcessing(LPVOID lpArg)
{
    int a = 1,
        b = 2,
        c = 3;

    b = 2 * c + 1;
    c = a + b * c;
    a = b * c + a;
    SwitchToThread();

    return 0;
}

int main(int argc, char* argv[])
{
    const size_t szThreadsCount = 6;
    HANDLE hThreads[szThreadsCount];
    
    for (int iIndexIteration = 0; iIndexIteration < 15; iIndexIteration++)
    {
        printf("====================== Iteration %i start ======================\n", iIndexIteration);

        hThreads[0] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[1] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);
        hThreads[2] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[3] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);
        hThreads[4] = CreateThread(NULL, NULL, func0, NULL, NULL, NULL);
        hThreads[5] = CreateThread(NULL, NULL, LoopProcessing, NULL, NULL, NULL);

        for (int iIndex = 0; iIndex < 3; iIndex++)
        {
            func0(0);
        }

        WaitForMultipleObjects(szThreadsCount, hThreads, TRUE, INFINITE);

        for (int iIndex = 0; iIndex < szThreadsCount; iIndex++)
        {
            CloseHandle(hThreads[iIndex]);
            hThreads[iIndex] = nullptr;
        }
    }

    return 0;
}

In most cases, everything works as expected, and I get a fully recorded trace like this:

…
036|00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
Registers:
rax: 0x00000000CCCCCCCC rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314215 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
 
 
037|00007ff6`ad31421a mov     dword ptr [rbp+4],1     ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD31421A rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000202
 
 
038|00007ff6`ad314221 mov     eax,dword ptr [rbp+4]   ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314221 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
…

But sometimes the debugger skips steps and doesn't trigger breakpoints.
For example:

00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
00007ff6`ad31421a mov     dword ptr [rbp+4],1
00007ff6`ad314221 mov     eax,dword ptr [rbp+4]

Suppose the state rip = 00007ff6'ad314215 has been recorded and a breakpoint is set at the following address 00007ff6'ad31421a. After that, I continue the execution of the program and expect for the breakpoint at address 00007ff6'ad31421a to be triggered. But instead of the expected hit to breakpoint at 00007ff6'ad31421a and call IDebugEventCallbacks::Breakpoint(), my current rip = 00007ff6'ad314221 and I get a call to IDebugEventCallbacks::Exception() with the following parameters:

Exception code: (0x80000003) == EXCEPTION_BREAKPOINT
Exception address: 0x7FF6AD314221 - address of the next instruction after the breakpoint.
threadContext.Rip: 0x7FF6AD314221 - instruction address of current thread context.

So the system has already executed the instruction at 00007ff6'ad31421a, but for some reason the debugger didn't hit the set breakpoint and raised an exception on the next instruction(00007ff6`ad314221) where any breakpoint was not set.

I also tried checking the program counter (threadContext.rip) in IDebugEventCallbacks::ChangeEngineState() for the thread whose execution is being recorded, but I never got a counter there equal to the address of the instruction that was skipped (00007ff6'ad31421a).

As a result, there is no state for address 00007ff6'ad31421a in my trace:

…
036|00007ff6`ad314215 call    Saek0!ILT+550(__CheckForDebuggerJustMyCode) (00007ff6`ad30122b)
Registers:
rax: 0x00000000CCCCCCCC rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314215 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
 
 
037|00007ff6`ad314221 mov     eax,dword ptr [rbp+4]   ss:00000000`00000004=????????
Registers:
rax: 0x0000000000000001 rbx: 0x0000000000000000 rcx: 0x00007FF6AD33208C
rdx: 0x00007FF6AD30155F rsi: 0x0000000000000000 rdi: 0x0000000000A2FE18
rip: 0x00007FF6AD314221 rsp: 0x0000000000A2FD10 rbp: 0x0000000000A2FD30
r8:  0x0000000000000001 r9:  0x00007FF6AD30155F r10: 0x0000000000000000
r11: 0x0000000000000000 r12: 0x0000000000000000 r13: 0x0000000000000000
r14: 0x0000000000000000 r15: 0x0000000000000000
EFlags: 0xCCCCCCCC00000302
…

Questions:

  1. What can be the reason for this behavior?
  2. I have met the opinion that this behavior can be caused by execution on a multi-core system, but the author of this judgment didn't refer to any sources where this was somehow justified. If this is indeed the case, can someone provide links to the rationale?
  3. Any suggestions to solve the problem?
  4. Are there tools that allow you to record a similar trace with states in Windows as in user mode and in kernel mode?
  • Interesting project. 1.) I'm thinking of [Out of order execution](https://en.wikipedia.org/wiki/Out-of-order_execution), which 2.) would not only happen in multi-core systems. 3.) Set more breakpoints in advance 4.) I don't know – Thomas Weller Nov 20 '20 at 07:17
  • Step over or step in if using step over debugger may set an internal oneshot temp bp and handle the exception see if you are handling this – blabb Nov 20 '20 at 10:35
  • @ThomasWeller Thanks for your reply. I learned more about "out of order execution". I ran more tests and came to conclusion that problem is not "out of order execution"after all. I think that instruction at 00007ff6'ad314221 could not have been executed before the instruction at 00007ff6'ad31421a, because they both refer to [rbp+4]. I also tried setting breakpoints on all instructions before starting the program, and it gave the same result. I also assume that the debugger takes care that the instruction queue is updated before executing them when instructions are overwritten. – M3ik Shizuka Nov 24 '20 at 02:06
  • @blabb Thanks for your reply. Yes, I'm handling this. – M3ik Shizuka Nov 24 '20 at 02:10

0 Answers0