I keep hearing that it's very expensive to create a new process in Windows. But I can't find exact numbers. Is there a ballpark number of cycles? How many milliseconds on a 2GHz dual-core processor?

I wrote a test program in Python and measured 5ms per process, but I don't know how much of that is extra overhead from Python. I'm guessing not much.

japreiss
  • Cycles don't have the same meaning on today's CPUs as they used to. You'll have at least one transition into kernel mode, because both threads and processes are kernel objects. Beyond that, it also depends on the version of Windows (consider shims). – 0xC0000022L May 22 '12 at 22:22
  • Related discussion (but not an answer): http://stackoverflow.com/questions/47845/why-is-creating-a-new-process-more-expensive-on-windows-than-linux – assylias May 22 '12 at 22:24
  • You can get an idea of the scale of this by using Process Monitor (available from MS website) and watching a new process start up. There are *thousands* of file and registry operations taking place. – Harry Johnston May 22 '12 at 22:39
  • Because most of the time process creation on Unix-like systems is `fork` followed by `exec`, which also has completely different semantics. It's the same reason Apache prefers threads on Windows, while `fork` is preferred on Linux. It's just the way the system was designed, and one has to know one's tools. – 0xC0000022L May 22 '12 at 22:47
  • It's high compared to Unix operating systems. Windows NT supported threads from day one; Unix implemented multiprocessing with processes long before it acquired threads in an agreed-upon way around 1997. The fork() call was, and is, the core way to spin off another process, with the optimization that it doesn't create a brand-new process from scratch and so can take advantage of the existing virtual memory mapping. Windows creates a process entirely from scratch. – Hans Passant May 22 '12 at 23:20

1 Answer

Interesting question!

As said before, the overhead is high. Out of curiosity, I've quickly written a little benchmark to get a rule-of-thumb number for how long the creation of a thread and a process takes, and how these times are related.

#include <windows.h>
#include <stdio.h>
#include <conio.h>

#define MIN   0
#define AVG   1
#define MAX   2

DWORD WINAPI thread(LPVOID lpvData)
{
    return (0);
}

int main()
{
    BOOL result;
    int iteration;
    int i;
    STARTUPINFO si;
    PROCESS_INFORMATION pi;
    DWORD tStart;
    DWORD tEllapsed;
    double tCall;
    int spawnCount;
    HANDLE hThread;
    DWORD threadId;
    double ratio;
    double statCreateProcess[3];
    double statCreateThread[3];


    for (iteration = 0; iteration < 16; iteration++)
    {
        /*
        **  Measure creation time of process
        */
        tEllapsed = 0;
        spawnCount = 0;
        for (i = 0; i < 100; i++)
        {
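            /* Writable buffer: CreateProcessW may modify the command-line
               string, so it must not be a string literal. */
            TCHAR cmdLine[] = TEXT("cmd.exe");

            ZeroMemory(&si, sizeof(si));
            si.cb = sizeof(si);
            ZeroMemory(&pi, sizeof(pi));

            tStart = GetTickCount();
            result = CreateProcess(NULL,
                                   cmdLine,
                                   NULL,
                                   NULL,
                                   FALSE,
                                   NORMAL_PRIORITY_CLASS,
                                   NULL,
                                   NULL,
                                   &si,
                                   &pi);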

            if (result != FALSE)
            {
                tEllapsed += GetTickCount() - tStart;
                spawnCount++;

                // clean up...
                TerminateProcess(pi.hProcess, 0);
                CloseHandle(pi.hThread);
                CloseHandle(pi.hProcess);
            }
        }
        tCall = tEllapsed / (double)spawnCount;
        printf("average creation time of process: %0.3fms\n", tCall);

        // track statistics...
        if (iteration > 0)
        {
            if (statCreateProcess[MIN] > tCall)
                statCreateProcess[MIN] = tCall;
            statCreateProcess[AVG] += tCall;
            if (statCreateProcess[MAX] < tCall)
                statCreateProcess[MAX] = tCall;
        }
        else
        {
            statCreateProcess[MIN] = tCall;
            statCreateProcess[AVG] = tCall;
            statCreateProcess[MAX] = tCall;
        }


        /* measure creation time of thread */
        spawnCount = 0;
        tStart = GetTickCount();
        for (i = 0; i < 5000; i++)
        {           
            hThread = CreateThread(NULL,
                                   0,
                                   thread,
                                   NULL,
                                   0,
                                   &threadId);
            if (hThread != NULL)
            {
                spawnCount++;

                // clean up...
                CloseHandle(hThread);
            }
        }
        tEllapsed = GetTickCount() - tStart;
        tCall = tEllapsed / (double)spawnCount;
        printf("average creation time of thread: %0.3fms\n", tCall);

        // track statistics...
        if (iteration > 0)
        {
            if (statCreateThread[MIN] > tCall)
                statCreateThread[MIN] = tCall;
            statCreateThread[AVG] += tCall;
            if (statCreateThread[MAX] < tCall)
                statCreateThread[MAX] = tCall;
        }
        else
        {
            statCreateThread[MIN] = tCall;
            statCreateThread[AVG] = tCall;
            statCreateThread[MAX] = tCall;
        }
    } /* for (iteration = ...) */

    statCreateProcess[AVG] /= iteration;
    statCreateThread[AVG] /= iteration;

    printf("\n\n--- CreateProcess(..) ---\n");
    printf("minimum execution time ...: %0.3fms\n", statCreateProcess[MIN]);
    printf("average execution time ...: %0.3fms\n", statCreateProcess[AVG]);
    printf("maximum execution time ...: %0.3fms\n", statCreateProcess[MAX]);
    printf("\n--- CreateThread(..) ---\n");
    printf("minimum execution time ...: %0.3fms\n", statCreateThread[MIN]);
    printf("average execution time ...: %0.3fms\n", statCreateThread[AVG]);
    printf("maximum execution time ...: %0.3fms\n", statCreateThread[MAX]);

    ratio = statCreateProcess[AVG] / statCreateThread[AVG];
    printf("\n\nratio: %0.3f\n\n", ratio);

    _getch();   /* wait for a key press; getch() is the deprecated name in MSVC */
    return (0);
}
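
The min/avg/max bookkeeping appears twice in the benchmark above. As a sketch only (the `Stats` type and the `stats_add` and `stats_mean` functions are hypothetical helpers, not part of the original program), it could be factored out roughly like this:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original benchmark:
   running minimum, maximum and sum of the recorded samples. */
typedef struct {
    double min;
    double max;
    double sum;
    size_t count;
} Stats;

/* Record one sample; the first sample initializes min and max. */
void stats_add(Stats *s, double sample)
{
    if (s->count == 0 || sample < s->min)
        s->min = sample;
    if (s->count == 0 || sample > s->max)
        s->max = sample;
    s->sum += sample;
    s->count++;
}

/* Mean of all recorded samples, or 0.0 if nothing was recorded. */
double stats_mean(const Stats *s)
{
    return s->count ? s->sum / (double)s->count : 0.0;
}
```

With a zero-initialized `Stats` per measurement, each iteration would just call `stats_add(&processStats, tCall)`, and the special case for the first iteration goes away.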

I've made several runs on my computer (i5 3.2GHz; Windows 7), and the values are pretty consistent as long as the antivirus application is turned off and the benchmark is started from outside Visual Studio:

--- CreateProcess(..) ---
minimum execution time ...: 11.860ms
average execution time ...: 12.756ms
maximum execution time ...: 14.980ms

--- CreateThread(..) ---
minimum execution time ...: 0.034ms
average execution time ...: 0.037ms
maximum execution time ...: 0.044ms


ratio: 342.565

As expected, the variation of CreateProcess(..) is bigger, since more system calls are involved and the likelihood of being interrupted by another thread is higher. Keep in mind that the actual time to create a thread is even shorter than measured, since the measurement includes the whole control loop (otherwise GetTickCount(..) would be too inaccurate to measure the time).

Another test on a virtual PC running Windows XP (running on the same machine as mentioned above) produced the following values:

--- CreateProcess(..) ---
minimum execution time ...: 22.630ms
average execution time ...: 24.666ms
maximum execution time ...: 27.340ms

--- CreateThread(..) ---
minimum execution time ...: 0.076ms
average execution time ...: 0.086ms
maximum execution time ...: 0.100ms


ratio: 287.982

Interestingly, the ratios of the average execution times of CreateProcess(..) and CreateThread(..) on the two systems are pretty close.

It would be interesting to see values from other machines and versions of Windows. I would not be surprised if the ratio stays at about 300 across different machines and versions of Windows.

So, to conclude: CreateProcess(..) is much slower than CreateThread(..) on Windows. But actually I'm quite shocked at how much slower it really is...
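
Since the question asked for a ballpark number of cycles, the measured times can be converted at the nominal clock rate. This is only a rough sketch: `ms_to_cycles` is a hypothetical helper, and it assumes a fixed clock frequency, which modern CPUs do not have, so the result is an order-of-magnitude figure at best.

```c
/* Nominal cycle count for a duration, assuming a fixed clock rate.
   Real CPUs change frequency and run several cores, so this is only
   an order-of-magnitude figure. */
double ms_to_cycles(double milliseconds, double clock_ghz)
{
    /* 1 GHz = 1e9 cycles per second = 1e6 cycles per millisecond */
    return milliseconds * clock_ghz * 1.0e6;
}
```

With the Windows 7 averages above, `ms_to_cycles(12.756, 3.2)` gives roughly 40.8 million cycles per CreateProcess(..) and `ms_to_cycles(0.037, 3.2)` roughly 118 thousand cycles per CreateThread(..). The 5 ms measured in the question on a 2GHz machine corresponds to 10 million cycles.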

Lukas Thomsen
  • How much of that time is actual disk I/O overhead? What type of disk was in the test systems? How do test results compare, if you launch a truly bare-bones Windows executable that links only against kernel32.dll? – IInspectable Jun 08 '18 at 17:38
  • @IInspectable Well, just try it. I guess the disk I/O overhead is not that much, since the executable was already in the disk cache during my repeated runs. I wouldn't be surprised if the results aren't that much different. As you can see, I've tested this on my PC and on a virtual PC, and the ratio is remarkably close ... let me know your results. – Lukas Thomsen Jun 08 '18 at 20:19