4

Is there a "Nt" or similar (i.e. non-kernelmode-driver) function equivalent for KeQueryInterruptTime or anything similar? There seems to be no such thing as NtQueryInterruptTime, at least I've not found it.

What I want is some kind of reasonably accurate and reliable, monotonic timer (thus not QPC) which is reasonably efficient and doesn't have surprises as an overflowing 32-bit counter, and no unnecessary "smartness", no time zones, or complicated structures.

So ideally, I want something like timeGetTime with a 64 bit value. It doesn't even have to be the same timer.
There exists GetTickCount64 starting with Vista, which would be acceptable as such, but I'd not like to break XP support only for such a stupid reason.

Reading the quadword at 0x7FFE0008 as indicated here ... well, works ... and it proves that indeed the actual internal counter is 64 bits under XP (it's also as fast as it could possibly get), but meh... let's not talk about what a kind of nasty hack it is to read some unknown, hardcoded memory location.

There must certainly be something in between calling an artificially stupefied (scaling a 64 bit counter down to 32 bits) high-level API function and reading a raw memory address?

Damon
  • 67,688
  • 20
  • 135
  • 185
  • GetTickCount() overflows once in ~49 days. You can implement GetTickCount64() on top of it, taking care of overflows, provided that you call it (or GetTickCount()) at least once in 24 days to catch the overflow. You may call it periodically either by using a timer event or in a thread that sleeps 24 days between the calls... – Alexey Frunze Nov 21 '11 at 12:45
  • @Alex: Certainly one can detect the overflow, although not quite as trivially. You really have to compare _every single result_ to the previous one (which must not be older than 49 days). Otherwise you cannot be sure that between now and next second, no overflow has happened. Unluckily, it is quite possible that someone leaves the computer up and running for 49 days, 17 hours before starting the program (unlikely, but possible). Also, it's needlessly complicated for something that is already present as a 64 bit number inside the kernel -- but the question is how to get at it. – Damon Nov 21 '11 at 15:07
  • You want QueryUnbiasedInterruptTime. Unfortunately it was only introduced in Windows 7. The fact that it exists, however, strongly suggests that previous versions of Windows had no supported equivalent, so I think you'll have to hold your nose and write a wrapper function for GetTickCount. – Harry Johnston Nov 22 '11 at 01:24
  • @Harry Johnston: That's about as close to what I want (_plus_ documented, _minus_ Win7 only). Unluckily I can't seem to find a way to get to any such thing (undocumented, if you will) in any version prior to Vista. That said, I've spent some time on `NtQuerySystemTime` in the mean time, which would be an acceptable alternative (monotone 64 bit value starting somewhere 1600 AD), but it shows some really weird behaviour on my machine (reproducably segfaulting after being called a few times) which I can't seem to explain, nor fix. – Damon Nov 23 '11 at 12:44
  • I don't think NtQuerySystemTime is monotonic; it will reflect changes to the system time made by the user or by the time synchronization service. If this isn't a problem, just use GetSystemTimeAsFileTime. – Harry Johnston Nov 23 '11 at 19:37
  • Did you see the last paragraph here? http://msdn.microsoft.com/en-us/library/windows/desktop/ee662306%28v=VS.85%29.aspx – Harry Johnston Nov 23 '11 at 19:55
  • Saw that, thank you. Registry and performance data helper functions are things I'd rather stay far away though. I don't trust anything with "registry" in the same sentence even as far as I can spit :) Also, (I may be totally wrong, though) I believe that availability of performance data requires a service that users can disable. Guess I'll have to settle with a 32-to-60 bit counter hack. – Damon Nov 24 '11 at 11:38
  • Is there a "Nt" or similar ...? Yes, there is: `NTSTATUS NtQueryTimerResolution(OUT PULONGMinimumResolution, OUT LONGMaximumResolution, OUT PULONGActualResolution);`. More information [here](http://stackoverflow.com/questions/7685762/windows-7-timing-functions-how-to-use-getsystemtimeadjustment-correctly/7691490#comment15650501_7691490) and [here](http://www.windowstimestamp.com/description). – Arno Aug 03 '12 at 10:26

3 Answers3

3

Here's an example of a thread-safe wrapper for GetTickCount() extending the tick count value to 64 bits and in that being equivalent to GetTickCount64().

To avoid undesired counter roll overs, make sure to call this function a few times every 49.7 days. You can even have a dedicated thread whose only purpose would be to call this function and then sleep some 20 days in an infinite loop.

ULONGLONG MyGetTickCount64(void)
{
  static volatile LONGLONG Count = 0;
  LONGLONG curCount1, curCount2;
  LONGLONG tmp;

  curCount1 = InterlockedCompareExchange64(&Count, 0, 0);

  curCount2 = curCount1 & 0xFFFFFFFF00000000;
  curCount2 |= GetTickCount();

  if ((ULONG)curCount2 < (ULONG)curCount1)
  {
    curCount2 += 0x100000000;
  }

  tmp = InterlockedCompareExchange64(&Count, curCount2, curCount1);

  if (tmp == curCount1)
  {
    return curCount2;
  }
  else
  {
    return tmp;
  }
}

EDIT: And here's a complete application that tests MyGetTickCount64().

// Compiled with Open Watcom C 1.9: wcl386.exe /we /wx /q gettick.c

#include <windows.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>

//
// The below code is an ugly implementation of InterlockedCompareExchange64()
// that is apparently missing in Open Watcom C 1.9.
// It must work with MSVC++ too, however.
//
UINT8 Cmpxchg8bData[] =
{
  0x55,             // push      ebp
  0x89, 0xE5,       // mov       ebp, esp
  0x57,             // push      edi
  0x51,             // push      ecx
  0x53,             // push      ebx
  0x8B, 0x7D, 0x10, // mov       edi, [ebp + 0x10]
  0x8B, 0x07,       // mov       eax, [edi]
  0x8B, 0x57, 0x04, // mov       edx, [edi + 0x4]
  0x8B, 0x7D, 0x0C, // mov       edi, [ebp + 0xc]
  0x8B, 0x1F,       // mov       ebx, [edi]
  0x8B, 0x4F, 0x04, // mov       ecx, [edi + 0x4]
  0x8B, 0x7D, 0x08, // mov       edi, [ebp + 0x8]
  0xF0,             // lock:
  0x0F, 0xC7, 0x0F, // cmpxchg8b [edi]
  0x5B,             // pop       ebx
  0x59,             // pop       ecx
  0x5F,             // pop       edi
  0x5D,             // pop       ebp
  0xC3              // ret
};

LONGLONG (__cdecl *Cmpxchg8b)(LONGLONG volatile* Dest, LONGLONG* Exch, LONGLONG* Comp) =
  (LONGLONG (__cdecl *)(LONGLONG volatile*, LONGLONG*, LONGLONG*))Cmpxchg8bData;

LONGLONG MyInterlockedCompareExchange64(LONGLONG volatile* Destination,
                                        LONGLONG Exchange,
                                        LONGLONG Comparand)
{
  return Cmpxchg8b(Destination, &Exchange, &Comparand);
}

#ifdef InterlockedCompareExchange64
#undef InterlockedCompareExchange64
#endif

#define InterlockedCompareExchange64(Destination, Exchange, Comparand) \
  MyInterlockedCompareExchange64(Destination, Exchange, Comparand)

//
// This stuff makes a thread-safe printf().
// We don't want characters output by one thread to be mixed
// with characters output by another. We want printf() to be
// "atomic".
// We use a critical section around vprintf() to achieve "atomicity".
//
static CRITICAL_SECTION PrintfCriticalSection;

int ts_printf(const char* Format, ...)
{
  int count;
  va_list ap;

  EnterCriticalSection(&PrintfCriticalSection);

  va_start(ap, Format);
  count = vprintf(Format, ap);
  va_end(ap);

  LeaveCriticalSection(&PrintfCriticalSection);

  return count;
}

#define TICK_COUNT_10MS_INCREMENT 0x800000

//
// This is the simulated tick counter.
// Its low 32 bits are going to be returned by
// our, simulated, GetTickCount().
//
// TICK_COUNT_10MS_INCREMENT is what the counter is
// incremented by every time. The value is so chosen
// that the counter quickly overflows in its
// low 32 bits.
//
static volatile LONGLONG SimulatedTickCount = 0;

//
// This is our simulated 32-bit GetTickCount()
// that returns a count that often overflows.
//
ULONG SimulatedGetTickCount(void)
{
  return (ULONG)SimulatedTickCount;
}

//
// This thread function will increment the simulated tick counter
// whose value's low 32 bits we'll be reading in SimulatedGetTickCount().
//
DWORD WINAPI SimulatedTickThread(LPVOID lpParameter)
{
  UNREFERENCED_PARAMETER(lpParameter);

  for (;;)
  {
    LONGLONG c;

    Sleep(10);

    // Get the counter value, add TICK_COUNT_10MS_INCREMENT to it and
    // store the result back.
    c = InterlockedCompareExchange64(&SimulatedTickCount, 0, 0);
    InterlockedCompareExchange64(&SimulatedTickCount, c + TICK_COUNT_10MS_INCREMENT, c) != c);
  }

  return 0;
}

volatile LONG CountOfObserved32bitOverflows = 0;
volatile LONG CountOfObservedUpdateRaces = 0;

//
// This prints statistics that includes the true 64-bit value of
// SimulatedTickCount that we can't get from SimulatedGetTickCount() as it
// returns only its lower 32 bits.
//
// The stats also include:
// - the number of times that MyGetTickCount64() observes an overflow of
//   SimulatedGetTickCount()
// - the number of times MyGetTickCount64() fails to update its internal
//   counter because of a concurrent update in another thread.
//
void PrintStats(void)
{
  LONGLONG true64bitCounter = InterlockedCompareExchange64(&SimulatedTickCount, 0, 0);

  ts_printf("  0x%08X`%08X <- true 64-bit count; ovfs: ~%d; races: %d\n",
            (ULONG)(true64bitCounter >> 32),
            (ULONG)true64bitCounter,
            CountOfObserved32bitOverflows,
            CountOfObservedUpdateRaces);
}

//
// This is our poor man's implementation of GetTickCount64()
// on top of GetTickCount().
//
// It's thread safe.
//
// When used with actual GetTickCount() instead of SimulatedGetTickCount()
// it must be called at least a few times in 49.7 days to ensure that
// it doesn't miss any overflows in GetTickCount()'s return value.
//
ULONGLONG MyGetTickCount64(void)
{
  static volatile LONGLONG Count = 0;
  LONGLONG curCount1, curCount2;
  LONGLONG tmp;

  curCount1 = InterlockedCompareExchange64(&Count, 0, 0);

  curCount2 = curCount1 & 0xFFFFFFFF00000000;
  curCount2 |= SimulatedGetTickCount();

  if ((ULONG)curCount2 < (ULONG)curCount1)
  {
    curCount2 += 0x100000000;

    InterlockedIncrement(&CountOfObserved32bitOverflows);
  }

  tmp = InterlockedCompareExchange64(&Count, curCount2, curCount1);

  if (tmp != curCount1)
  {
    curCount2 = tmp;

    InterlockedIncrement(&CountOfObservedUpdateRaces);
  }

  return curCount2;
}

//
// This is an error counter. If a thread that uses MyGetTickCount64() notices
// any problem with what MyGetTickCount64() returns, it bumps up this error
// counter and stops. If one of threads sees a non-zero value in this
// counter due to an error in another thread, it stops as well.
//
volatile LONG Error = 0;

//
// This is a thread function that will be using MyGetTickCount64(),
// validating its return value and printing some stats once in a while.
//
// This function is meant to execute concurrently in multiple threads
// to create race conditions inside of MyGetTickCount64() and test it.
//
DWORD WINAPI TickUserThread(LPVOID lpParameter)
{
  DWORD user = (DWORD)lpParameter; // thread number
  ULONGLONG ticks[4];

  ticks[3] = ticks[2] = ticks[1] = MyGetTickCount64();

  while (!Error)
  {
    ticks[0] = ticks[1];
    ticks[1] = MyGetTickCount64();

    // Every ~100 ms sleep a little (slightly lowers CPU load, to about 90%)
    if (ticks[1] > ticks[2] + TICK_COUNT_10MS_INCREMENT * 10L)
    {
      ticks[2] = ticks[1];
      Sleep(1 + rand() % 20);
    }

    // Every ~1000 ms print the last value from MyGetTickCount64().
    // Thread 1 also prints stats here.
    if (ticks[1] > ticks[3] + TICK_COUNT_10MS_INCREMENT * 100L)
    {
      ticks[3] = ticks[1];
      ts_printf("%u:0x%08X`%08X\n", user, (ULONG)(ticks[1] >> 32), (ULONG)ticks[1]);

      if (user == 1)
      {
        PrintStats();
      }
    }

    if (ticks[0] > ticks[1])
    {
      ts_printf("%u:Non-monotonic tick counts: 0x%016llX > 0x%016llX!\n",
                user,
                ticks[0],
                ticks[1]);
      PrintStats();
      InterlockedIncrement(&Error);
      return -1;
    }
    else if (ticks[0] + 0x100000000 <= ticks[1])
    {
      ts_printf("%u:Too big tick count jump: 0x%016llX -> 0x%016llX!\n",
                user,
                ticks[0],
                ticks[1]);
      PrintStats();
      InterlockedIncrement(&Error);
      return -1;
    }

    Sleep(0); // be nice, yield to other threads.
  }

  return 0;
}

//
// This prints stats upon Ctrl+C and terminates the program.
//
BOOL WINAPI ConsoleEventHandler(DWORD Event)
{
  if (Event == CTRL_C_EVENT)
  {
    PrintStats();
  }

  return FALSE;
}

int main(void)
{
  HANDLE simulatedTickThreadHandle;
  HANDLE tickUserThreadHandle;
  DWORD dummy;

  // This is for the missing InterlockedCompareExchange64() workaround.
  VirtualProtect(Cmpxchg8bData, sizeof(Cmpxchg8bData), PAGE_EXECUTE_READWRITE, &dummy);

  InitializeCriticalSection(&PrintfCriticalSection);

  if (!SetConsoleCtrlHandler(&ConsoleEventHandler, TRUE))
  {
    ts_printf("SetConsoleCtrlHandler(&ConsoleEventHandler) failed with error 0x%X\n", GetLastError());
    return -1;
  }

  // Start the tick simulator thread.

  simulatedTickThreadHandle = CreateThread(NULL, 0, &SimulatedTickThread, NULL, 0, NULL);

  if (simulatedTickThreadHandle == NULL)
  {
    ts_printf("CreateThread(&SimulatedTickThread) failed with error 0x%X\n", GetLastError());
    return -1;
  }

  // Start one thread that'll be using MyGetTickCount64().

  tickUserThreadHandle = CreateThread(NULL, 0, &TickUserThread, (LPVOID)2, 0, NULL);
  if (tickUserThreadHandle == NULL)
  {
    ts_printf("CreateThread(&TickUserThread) failed with error 0x%X\n", GetLastError());
    return -1;
  }

  // The other thread using MyGetTickCount64() will be the main thread.

  TickUserThread((LPVOID)1);

  //
  // The app terminates upon any error condition detected in TickUserThread()
  // in any of the threads or by Ctrl+C.
  //

  return 0;
}

As a test I've been running this test app under Windows XP for 5+ hours on an otherwise idle machine that has 2 CPUs (idle, to avoid potential long starvation times and therefore avoid missing counter overflows that occur every 5 seconds) and it's still doing well.

Here's the latest output from the console:

2:0x00000E1B`C8800000
1:0x00000E1B`FA800000
  0x00000E1B`FA800000 <- true 64-bit count; ovfs: ~3824; races: 110858

As you can see, MyGetTickCount64() has observed 3824 32-bit overflows and failed to update the value of Count with its second InterlockedCompareExchange64() 110858 times. So, overflows indeed occur and the last number means that the variable is, in fact, being concurrently updated by the two threads.

You can also see that the 64-bit tick counts that the two threads receive from MyGetTickCount64() in TickUserThread() don't have anything missing in the top 32 bits and are pretty close to the actual 64-bit tick count in SimulatedTickCount, whose 32 low bits are returned by SimulatedGetTickCount(). 0x00000E1BC8800000 is visually behind 0x00000E1BFA800000 due to thread scheduling and infrequent stat prints, it's behind by exactly 100*TICK_COUNT_10MS_INCREMENT, or 1 second. Internally, of course, the difference is much smaller.

Now, on availability of InterlockedCompareExchange64()... It's a bit odd that it's officially available since Windows Vista and Windows Server 2003. Server 2003 is in fact build from the same code base as Windows XP.

But the most important thing here is that this function is built on top of the Pentium CMPXCHG8B instruction that's been available since 1998 or earlier (1), (2). And I can see this instruction in my Windows XP's (SP3) binaries. It's in ntkrnlpa.exe/ntoskrnl.exe (the kernel) and ntdll.dll (the DLL that exports kernel's NtXxxx() functions that everything's built upon). Look for a byte sequence of 0xF0, 0x0F, 0xC7 and disassemble the code around that place to see that these bytes aren't there coincidentally.

You can check availability of this instruction through the CPUID instruction (EDX bit 8 of CPUID function 0x00000001 and function 0x80000001) and refuse to run instead of crashing if the instruction isn't there, but these days you're unlikely to find a machine that doesn't support this instruction. If you do, it won't be a good machine for Windows XP and probably your application as well anyways.

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
  • Why the loop? It seems unnecessary. – Harry Johnston Nov 23 '11 at 00:18
  • @HarryJohnston: you're right, if tmp!=curCount1, we can return tmp. – Alexey Frunze Nov 23 '11 at 00:20
  • Thank you for the attempt, but... _"Here's an example of a thread-safe wrapper"_ -- this function uses atomic functions, but it is __not__ threadsafe, or re-entrant for that matter. In between the two compare-exchanges of which the first clears the counter, no other thread had better call this function. Also, it uses the Vista-only `InterlockedCompareExchange64`function, which means you could as well call `GetTickCount64` in the first place, which has no re-entrancy issues. – Damon Nov 23 '11 at 11:54
  • @Damon: your conclusions are wrong. I'll update my answer with a test program that clearly demonstrates that this function is thread-safe in that it can be safely called from multiple threads without any effect on the correctness of its operation. And the first compare-exchange writes 0 to the counter if and only if the counter is 0, hence it doesn't modify it effectively. That's the beauty of compare-exchange. – Alexey Frunze Nov 23 '11 at 12:06
  • @Damon: see the updated answer. It appears that I'm not the only "crazy" person to read and increment 64-bit values atomically with CMPXCHG8B in 32-bit mode. Another person's doing it here: [64-bit atomic reads and writes on x86](http://www.niallryan.com/node/137). – Alexey Frunze Nov 23 '11 at 20:18
  • @Alex: for comparison, I've posted a variant which uses only 32-bit interlocks. It's a trifle less efficient, but more portable. – Harry Johnston Nov 23 '11 at 23:07
  • @Alex: I'll accept your answer (even though it's strictly not what I really wanted) because of the considerable effort you made in writing and testing it, because it probably gets as close to "something that kind of works" as it could with the given constraints, and because I stole a bit of the logic in your code for writing my own implementation (based on std::atomic, threaded, slightly less strict on correctness but still correct enough as to not provide wrong results, and templatized to use either counter/timer that the OS provides). Thank you for your effort. – Damon Nov 24 '11 at 14:25
1

Here's another approach, a variant of Alex's wrapper but using only 32-bit interlocks. It only actually returns a 60-bit number, but that's still good for about thirty-six million years. :-)

It does need to be called more often, at least once every three days. That shouldn't normally be a major drawback.

ULONGLONG MyTickCount64(void)
{
    static volatile DWORD count = 0xFFFFFFFF;
    DWORD previous_count, current_tick32, previous_count_zone, current_tick32_zone;
    ULONGLONG current_tick64;

    previous_count = InterlockedCompareExchange(&count, 0, 0);
    current_tick32 = GetTickCount();

    if (previous_count == 0xFFFFFFFF)
    {
        // count has never been written
        DWORD initial_count;
        initial_count = current_tick32 >> 28;
        previous_count = InterlockedCompareExchange(&count, initial_count, 0xFFFFFFFF);

        if (previous_count == 0xFFFFFFFF)
        {   // This thread wrote the initial value for count
            previous_count = initial_count;
        }
        else if (previous_count != initial_count)
        {   // Another thread wrote the initial value for count,
            // and it differs from the one we calculated
            current_tick32 = GetTickCount();
        }
    }

    previous_count_zone = previous_count & 15;
    current_tick32_zone = current_tick32 >> 28;

    if (current_tick32_zone == previous_count_zone)
    {
        // The top four bits of the 32-bit tick count haven't changed since count was last written.
        current_tick64 = previous_count;
        current_tick64 <<= 28;
        current_tick64 += current_tick32 & 0x0FFFFFFF;
        return current_tick64;
    }

    if (current_tick32_zone == previous_count_zone + 1 || (current_tick32_zone == 0 && previous_count_zone == 15))
    {
        // The top four bits of the 32-bit tick count have been incremented since count was last written.
        InterlockedCompareExchange(&count, previous_count + 1, previous_count);
        current_tick64 = previous_count + 1;
        current_tick64 <<= 28;
        current_tick64 += current_tick32 & 0x0FFFFFFF;
        return current_tick64;
    }

    // Oops, we weren't called often enough, we're stuck
    return 0xFFFFFFFF;
}
Harry Johnston
  • 35,639
  • 6
  • 68
  • 158
  • If I just plug it into my test app, I get this printed: `1:0x00000001'32800000 / 2:0x00000001'32800000 / 0x00000000'32800000 <- true 64-bit count...`. Something's not quite right. Or is the count expected to start at 4G+? – Alexey Frunze Nov 23 '11 at 23:16
  • @Alex: yes, I used 0 to mean that the static variable was uninitialized, so the upper 32 bits of the count had to start at 1. I've updated the code so the static variable is 0xFFFFFFFF when uninitialized, so now the count starts at zero as expected. – Harry Johnston Nov 24 '11 at 02:41
1

Thanks to Google Books which kindly offered the relevant literature for free, I came up with an easy and fast implementation of GetTickCount64 which works perfectly well on pre-Vista systems too (and it still is somewhat less nasty than reading a value from a hardcoded memory address).

It is in fact as easy as calling interrupt 0x2A, which maps to KiGetTickCount. In GCC inline assembly, this gives:

static __inline__ __attribute__((always_inline)) unsigned long long get_tick_count64()
{
    unsigned long long ret;
    __asm__ __volatile__ ("int $0x2a" : "=A"(ret) : : );
    return ret;
}

Due to the way KiGetTickCount works, the function should probably better be called GetTickCount46, as it performs a right shift by 18, returning 46 bits, not 64. Though the same is true for the original Vista version, too.

Note that KiGetTickCount clobbers edx, this is relevant if you plan to implement your own faster implementation of the 32-bit version (must add edx to the clobber list in that case!).

Damon
  • 67,688
  • 20
  • 135
  • 185