Is there a library available which easily can record and replay results of API calls?

Question

I'm working on writing various things that call relatively complicated Win32 API functions. Here's an example:

//Encapsulates calling NtQuerySystemInformation buffer management.
WindowsApi::AutoArray NtDll::NtQuerySystemInformation( 
    SystemInformationClass toGet ) const
{
    AutoArray result;
    ULONG allocationSize = 1024;
    ULONG previousSize;
    NTSTATUS errorCheck;
    do 
    {
        previousSize = allocationSize;
        result.Allocate(allocationSize);
        errorCheck = WinQuerySystemInformation(toGet, 
            result.GetAs<void>(), allocationSize, &allocationSize);
        if (allocationSize <= previousSize)
            allocationSize = previousSize * 2;
    } while (errorCheck == 0xC0000004L); //STATUS_INFO_LENGTH_MISMATCH: buffer too small, retry
    if (errorCheck != 0)
    {
        THROW_MANUAL_WINDOWS_ERROR(WinRtlNtStatusToDosError(errorCheck));
    }
    return result;
}
//Client of the above.
ProcessSnapshot::ProcessSnapshot()
{
    using Dll::NtDll;
    NtDll ntdll;
    AutoArray systemInfoBuffer = ntdll.NtQuerySystemInformation(
        NtDll::SystemProcessInformation);
    BYTE * currentPtr = systemInfoBuffer.GetAs<BYTE>();
    //Loop through the results, creating Process objects.
    SYSTEM_PROCESSES * asSysInfo;
    do
    {
        // Loop book keeping
        asSysInfo = reinterpret_cast<SYSTEM_PROCESSES *>(currentPtr);
        currentPtr += asSysInfo->NextEntryDelta;

        //Create the process for the current iteration and fill it with data.
        std::auto_ptr<ProcImpl> currentProc(ProcFactory(
            static_cast<unsigned __int32>(asSysInfo->ProcessId), this));
        NormalProcess* nptr = dynamic_cast<NormalProcess*>(currentProc.get());
        if (nptr)
        {
            nptr->SetProcessName(asSysInfo->ProcessName);
        }
        // Populate process threads
        for(ULONG idx = 0; idx < asSysInfo->ThreadCount; ++idx)
        {
            SYSTEM_THREADS& sysThread = asSysInfo->Threads[idx];
            Thread thread(
                currentProc.get(),
                static_cast<unsigned __int32>(sysThread.ClientId.UniqueThread),
                sysThread.StartAddress);
            currentProc->AddThread(thread);
        }
        processes.push_back(currentProc);
    } while(asSysInfo->NextEntryDelta != 0);
}

My problem is in mocking out the NtDll::NtQuerySystemInformation method -- namely, that the data structure returned is complicated (Well, here it's actually relatively simple but it can be complicated), and writing a test which builds the data structure like the API call does can take 5-6 times as long as writing the code that uses the API.

What I'd like to do is take a call to the API, and record it somehow, so that I can return that recorded value to the code under test without actually calling the API. The returned structures cannot simply be memcpy'd, because they often contain inner pointers (pointers to other locations in the same buffer). The library in question would need to check for these kinds of things, and be able to restore pointer values to a similar buffer upon replay. (i.e. check each pointer sized value if it could be interpreted as a pointer within the buffer, change that to an offset, and remember to change it back to a pointer on replay -- a false positive rate here is acceptable)

Is there anything out there that does anything like this?

@Nicklamort: Yes. (And no need to delete the comment -- it's a good question) — Billy ONeal, Jan 11 '11 at 21:41
@NickL "the API" in this case is the Windows API. I would call the function I want myself as a part of making the actual unit test. — Billy ONeal, Jan 11 '11 at 23:06
@Nick: I'm looking for something general; i.e. that would not care what API function was being called. — Billy ONeal, Jan 11 '11 at 23:22
@Billy: Are you trying to do this so you have an easier data structure to read (AutoArray), or because you don't want the API call to change the internal variable/pointer values that it does, or both? PS: thanks for your patience — Nick Rolando, Jan 11 '11 at 23:52
@Nick: Neither. I want to be able to replace the API call itself, for testing purposes. For example, in the above, if I want to write tests, I can't simply write a test for the output of the client method, because the processes returned would be different every test run. I want to save the buffer returned by the API, and then be able to return the buffer later. — Billy ONeal, Jan 11 '11 at 23:56
Whose API? It seems to me that you have to instrument whichever API is of interest, whether that be the Win32 API or one you have supplied yourself. If you want to learn more about how to do that, I'll provide an answer. — Ira Baxter, Jan 12 '11 at 00:07
@Ira: I've already handled the redirection bit (using Google Mock). I want an easier way to make example buffers for my mock or fake API call to return (and the easiest way to do that is record the value the actual API returns). — Billy ONeal, Jan 12 '11 at 00:14
I don't know much about Windows programming, but don't you run the risk here of hiding a bug in the code under test if the bug happens to not manifest with the recorded API return structure? Shouldn't you test with a variety, or handcraft your return structure to specifically exercise the code under test? That is to say, maybe the long time taken to create the return structure is not wasted...? — Tony Park, Jan 12 '11 at 00:16
@Tony: Not always. Quite often all the code is doing is assigning values from the buffer into C++ classes, wrapping the nasty bits of the API call itself with a C++ interface (as is done in my example above). To test edge cases outside of this code, I end up mocking out `ProcessSnapshot` and friends, not the API call itself. — Billy ONeal, Jan 12 '11 at 00:20
@Billy: I'm just curious... I don't really understand why you'd need a function replay in this particular example of `NtQuerySystemInformation`. Do you mind explaining it? :) — user541686, Jan 12 '11 at 00:57
I think I'm misunderstanding something here, or, how do you propose to generically tell the difference between a pointer and data? — mrduclaw, Jan 12 '11 at 00:57
@mrduclaw: You don't. It's his assumption that data never looks like pointers, and it might get him into trouble later, but he said that's what he wants the library to do.. — user541686, Jan 12 '11 at 01:03
@mrduclaw: I know there's no way to do that. That's why I said "some false positive rate is acceptable". — Billy ONeal, Jan 12 '11 at 01:03
@Billy: But the thing is, is this all so that you can reproduce the code so you can test your own code more easily? If you're having trouble interpreting `NtQuerySystemInformation`'s output, you might want to ask that question instead, since I'm sure people out there (including me) might be able to help with that. — user541686, Jan 12 '11 at 01:06
@Lambert: I want to reproduce exactly what NtQuerySystemInformation returns so that I can test my wrappers around NtQuerySystemInformation more easily, yes. I'm well aware what it returns; I just don't want to spend hours and hours writing \*\*\*loads of code that do nothing but fill in structure definitions. — Billy ONeal, Jan 12 '11 at 01:08
@Billy: I know the pain you're talking about -- I've done something similar to what you're doing, in C#. If you really need *this* solution, then take a look at the code I wrote, but it's probably more trouble than it's worth, IMHO. If you'd like, I can show you some pieces of what I wrote in C#, if you think it can make your task of implementing the wrappers easier. — user541686, Jan 12 '11 at 01:18

user541686 · Accepted Answer · 2011-01-12T01:04:23.767

Is it really that hard to implement by hand?

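#include <stddef.h>
#include <stdint.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef unsigned char BYTE; //Matches the Windows definition of BYTE

//Note: Buffer is MODIFIED after this call (hence why it's not const)! Don't re-use it!
//Stream layout: buffer size, offsets of the pointer slots, SIZE_MAX terminator, buffer contents.
void saveBuffer(FILE* stream, BYTE* buffer, size_t bufferSize)
{
    assert(bufferSize % sizeof(void*) == 0);
    fwrite(&bufferSize, sizeof(bufferSize), 1, stream);
    for (size_t i = 0; i + sizeof(void*) <= bufferSize; i += sizeof(void*))
    {
        uintptr_t value;
        memcpy(&value, &buffer[i], sizeof(value));
        if (value >= (uintptr_t)buffer && value < (uintptr_t)buffer + bufferSize)
        {
            fwrite(&i, sizeof(i), 1, stream); //Remember which slot held a pointer
            value -= (uintptr_t)buffer;       //Replace the pointer with its offset
            memcpy(&buffer[i], &value, sizeof(value));
        }
    }
    size_t terminator = SIZE_MAX; //Offset 0 is a valid slot, so 0 cannot end the list
    fwrite(&terminator, sizeof(terminator), 1, stream);
    fwrite(buffer, 1, bufferSize, stream);
}


//Returns the buffer size stored in the stream. Nothing is loaded if it exceeds bufferCapacity.
size_t loadBuffer(FILE* stream, BYTE* buffer, size_t bufferCapacity)
{
    size_t bufferSize = 0;
    if (fread(&bufferSize, sizeof(bufferSize), 1, stream) != 1) { return 0; }
    if (bufferSize <= bufferCapacity)
    {
        memset(buffer, 0, bufferSize);
        for (;;) //Mark every slot that held a pointer when the buffer was saved
        {
            size_t pointerOffset;
            if (fread(&pointerOffset, sizeof(pointerOffset), 1, stream) != 1
                || pointerOffset == SIZE_MAX) { break; }
            if (pointerOffset < bufferSize) { buffer[pointerOffset] = 1; }
        }
        for (size_t i = 0; i < bufferSize; i += sizeof(void*))
        {
            int wasPointer = (buffer[i] != 0);
            uintptr_t value = 0;
            fread(&value, sizeof(value), 1, stream);
            if (wasPointer) { value += (uintptr_t)buffer; } //Turn the offset back into a pointer
            memcpy(&buffer[i], &value, sizeof(value));
        }
    }
    return bufferSize;
}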

(Sorry, I haven't tested it yet, but it should be pretty darn close to working.)

The only trouble is the assumption that all values that are possibly pointers are in fact pointers, but other than that, it seems straightforward.
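
The pointer-to-offset idea can also be tried entirely in memory, without a file. The following is a minimal sketch under stated assumptions: `Record`, `pointersToOffsets`, `offsetsToPointers`, and `roundTripWorks` are hypothetical names made up for illustration, and the relocation table is kept in a `std::vector` instead of being written to a stream. It shares the same heuristic weakness: any pointer-sized value that happens to fall inside the buffer's address range is treated as a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record with an inner pointer, standing in for an API result.
struct Record {
    std::uintptr_t id;
    const char* name;   // points at text stored later in the same buffer
};

// Replace every pointer-sized slot that points into [buf, buf + size) with an
// offset from buf. Returns the offsets of the slots that were converted.
std::vector<std::size_t> pointersToOffsets(unsigned char* buf, std::size_t size)
{
    std::vector<std::size_t> relocations;
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buf);
    for (std::size_t i = 0; i + sizeof(std::uintptr_t) <= size; i += sizeof(std::uintptr_t)) {
        std::uintptr_t value;
        std::memcpy(&value, buf + i, sizeof(value));
        if (value >= base && value < base + size) {
            value -= base;
            std::memcpy(buf + i, &value, sizeof(value));
            relocations.push_back(i);
        }
    }
    return relocations;
}

// Turn the recorded slots back into pointers relative to the (new) buffer.
void offsetsToPointers(unsigned char* buf, const std::vector<std::size_t>& relocations)
{
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buf);
    for (std::size_t slot : relocations) {
        std::uintptr_t value;
        std::memcpy(&value, buf + slot, sizeof(value));
        value += base;
        std::memcpy(buf + slot, &value, sizeof(value));
    }
}

// Round trip: build a buffer, "record" it, copy it elsewhere, "replay" it.
bool roundTripWorks()
{
    alignas(std::uintptr_t) unsigned char original[64] = {};
    alignas(std::uintptr_t) unsigned char replayed[64] = {};

    Record rec;
    rec.id = 7;
    rec.name = reinterpret_cast<const char*>(original + sizeof(Record));
    std::memcpy(original, &rec, sizeof(rec));
    std::memcpy(original + sizeof(Record), "System", 7);

    std::vector<std::size_t> relocs = pointersToOffsets(original, sizeof(original));
    std::memcpy(replayed, original, sizeof(original));
    offsetsToPointers(replayed, relocs);

    Record out;
    std::memcpy(&out, replayed, sizeof(out));
    return out.id == 7
        && out.name == reinterpret_cast<const char*>(replayed + sizeof(Record))
        && std::strcmp(out.name, "System") == 0;
}
```

Calling `roundTripWorks()` from a test should return `true` if the inner pointer was re-targeted at the second buffer. Using `memcpy` to read and write the slots avoids alignment and aliasing problems that direct pointer casts can cause.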

Of course it's possible to implement by hand. But I try to avoid reinventing the wheel where possible. — Billy ONeal, Jan 12 '11 at 00:49
It's great to avoid reinventing the wheel, but if it takes less time than actually finding a wheel, you might want to consider it! :) — user541686, Jan 12 '11 at 00:54
But it's definitely not always about *"not reinventing the wheel"* but also and very importantly about *"not reinventing the wheel and make it **crooked**"*. — Jan, Nov 03 '20 at 18:28

score 1 · Answer 2 · answered Jan 12 '11 at 00:33

This problem is at least a subset of a research problem in its full generality, which suggests that there is not going to be an off-the-shelf system for you. A paper on record and replay of API calls appeared just two years ago at one of the premier operating systems conferences (OSDI 2008). A look at the authors' website shows no available code, and even if code were available, it might not be production quality.

R2: An Application-Level Kernel for Record and Replay. Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M. Frans Kaashoek, and Zheng Zhang. OSDI 2008.

http://www.usenix.org/event/osdi08/tech/full_papers/guo/guo_html/

http://research.microsoft.com/en-us/people/zhenyug/r2-osdi08.pdf

Abstract

Library-based record and replay tools aim to reproduce an application's execution by recording the results of selected functions in a log and during replay returning the results from the log rather than executing the functions. These tools must ensure that a replay run is identical to the record run. The challenge in doing so is that only invocations of a function by the application should be recorded, recording the side effects of a function call can be difficult, and not executing function calls during replay, multithreading, and the presence of the tool may change the application's behavior from recording to replay. These problems have limited the use of such tools.

R2 allows developers to choose functions that can be recorded and replayed correctly. Developers annotate the chosen functions with simple keywords so that R2 can handle calls with side effects and multithreading. R2 generates code for record and replay from templates, allowing developers to avoid implementing stubs for hundreds of functions manually. To track whether an invocation is on behalf of the application or the implementation of a selected function, R2 maintains a mode bit, which stubs save and restore.

We have implemented R2 on Windows and annotated large parts (1,300 functions) of the Win32 API, and two higher-level interfaces (MPI and SQLite). R2 can replay multithreaded web and database servers that previous library-based tools cannot replay. By allowing developers to choose high-level interfaces, R2 can also keep recording overhead small; experiments show that its recording overhead for Apache is approximately 10%, that recording and replaying at the SQLite interface can reduce the log size up to 99% (compared to doing so at the Win32 API), and that using optimization annotations for BitTorrent and MPI applications achieves log size reduction ranging from 13.7% to 99.4%.

This is about replaying the entire execution of a program, i.e. for debugging purposes, similar to what VMWare is capable of. That's not really what I'm trying to do... — Billy ONeal, Jan 12 '11 at 00:50

score 0 · Answer 3 · answered Jan 12 '11 at 00:04

(Not expecting rep votes, just want to help/talk code with the community.) You say you want an easier data structure to read, but if you use the API (which I think is what you say you want to do, indirectly), you're stuck with the one it gives you. So then you just want to be able to call the API function, but have it restore its pointer/variable values as if you hadn't called it?

I think you are looking for the nearly impossible, my friend (or something very complex!). You would have to change the API code to restore its pointer values, but you already know that. Will these values be restored when the program ends? If so, you could probably put this API call in another solution/program and call it from your program?

Nick Rolando


I don't want an easier data structure to read. I want to be able to save the buffer the API returns and reload it later. The pointers can be detected by checking every pointer sized chunk of the buffer and seeing if it can be interpreted as a pointer to somewhere else in that buffer. I've already handled redirection of the API call itself via virtual functions; I merely need to generate example test data for my mock to return. – Billy ONeal Jan 12 '11 at 00:12
Ah. Well then..lol. Can't you copy the values (not addresses) in your buffers over to some kind of dynamic storage array (maybe some kind of self-made vector or list) that you can access later? And you can detect if the given value is an address, go to that location and pull its value instead, etc? Or should I just stop talking XD – Nick Rolando Jan 12 '11 at 00:43

score 0 · Answer 4 · edited May 23 '17 at 12:15

There are several ways:

- Change the top-level class to be a template with one template parameter. In the real implementation you use NtDll; in tests you use a mock that logs the calls.
- Create a wrapper for the NtDll class behind an interface. Real code passes an object of that type to the constructor of the class that uses it; tests pass a mock.
- Try to refactor the classes that use NtDll into a single class.

All of these approaches assume that you are using dependency injection.
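
A minimal sketch of the first option (injection through a template parameter) might look like the following. All names here (`FakeNtDll`, `SnapshotT`, `QuerySystemInformation`, `ByteCount`) are hypothetical and only illustrate the shape of the idea; the real NtDll wrapper from the question has a different signature and returns an AutoArray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical test double: hands back a previously recorded buffer
// instead of calling the real API, and counts how often it was asked.
struct FakeNtDll {
    std::vector<unsigned char> canned;   // recorded buffer to return
    mutable int calls = 0;

    std::vector<unsigned char> QuerySystemInformation() const
    {
        ++calls;
        return canned;
    }
};

// The client receives its API wrapper as a template parameter, so production
// code can instantiate it with the real wrapper and tests with the fake.
template <typename Api>
class SnapshotT {
public:
    explicit SnapshotT(const Api& api) : bytes_(api.QuerySystemInformation()) {}

    // Stand-in for the real parsing logic: reports how many bytes were received.
    std::size_t ByteCount() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};
```

In a test, one would fill `canned` with a recorded buffer, construct `SnapshotT<FakeNtDll>` from the fake, and assert on what the snapshot parsed. The interface-based option works the same way, with a virtual base class in place of the template parameter.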
