
Is there a way in C++ on Windows to measure time in nanoseconds?

All I can find are Linux solutions.

bames53
user997112
  • See [Boost.Chrono](http://www.boost.org/libs/chrono/). – ildjarn Apr 04 '12 at 21:59
  • [QueryPerformanceCounter](http://msdn.microsoft.com/en-us/library/windows/desktop/ms644904(v=vs.85).aspx) is Windows, though Boost is as good and is also portable. – Mooing Duck Apr 04 '12 at 21:59
  • You can not accurately measure execution time on most systems beyond units of seconds. – AJG85 Apr 04 '12 at 22:23
  • @AJG85 On Windows you can get down to ~10ns resolution in WinNT – Mooing Duck Apr 04 '12 at 22:26
  • If you're using VS11, you should use the chrono library, and you should go and upvote [this](http://connect.microsoft.com/VisualStudio/feedback/details/719443/c-chrono-headers-high-resolution-clock-does-not-have-high-resolution#details) issue on MS Connect. – bames53 Apr 04 '12 at 22:26
  • I highly doubt that you need nanoseconds. That's much like writing down the results of a physical experiment with 20 digits. If you use nanoseconds, you have to watch out for every memory access, because a full cache miss can add 30 ns to a single memory access if your accesses are totally random. – Lothar Apr 04 '12 at 22:38
  • The [Fastest timing resolution system](http://stackoverflow.com/questions/3162826/fastest-timing-resolution-system/11474459#11474459) thread at SO discusses this matter as well. – Arno Jul 31 '12 at 14:00
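The comments point to `<chrono>` (or Boost.Chrono on older compilers). A minimal sketch, assuming a C++11-or-later compiler: time a callable with `std::chrono::steady_clock` and report the result in nanoseconds. The helper names `time_ns` and `steady_clock_period_ns` are hypothetical. Note that a nanosecond unit says nothing about how often the clock actually advances; the VS11 issue linked above is about exactly that gap.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Run f once and return the elapsed wall-clock time in nanoseconds.
// steady_clock is monotonic, so the result is never negative.
template <class F>
std::int64_t time_ns(F&& f)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    std::forward<F>(f)();
    const auto stop = Clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// The clock's nominal tick period in nanoseconds (period::num / period::den
// seconds). This is the unit of the clock, not its measured granularity.
inline double steady_clock_period_ns()
{
    using Period = std::chrono::steady_clock::period;
    return 1e9 * static_cast<double>(Period::num) / static_cast<double>(Period::den);
}
```

Usage: `auto ns = time_ns([] { /* code under test */ });`. For very short operations, timing a loop of many iterations and dividing is usually more meaningful than timing a single call.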

4 Answers


Use the QueryPerformanceFrequency function to see what speed the QueryPerformanceCounter runs at. I think it might be in the nanosecond range.
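A minimal sketch of that check, assuming a Windows build for the part that calls the API: `QueryPerformanceFrequency` reports counts per second, so one count lasts 10^9 / frequency nanoseconds. For example, a frequency of 10,000,000 counts per second means 100 ns per count. The helpers `qpc_tick_ns` and `current_qpc_tick_ns` are hypothetical names; the arithmetic is kept separate from the Windows call so it compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Length of one performance-counter count in nanoseconds, given the
// frequency (counts per second) reported by QueryPerformanceFrequency.
inline double qpc_tick_ns(std::int64_t counts_per_second)
{
    assert(counts_per_second > 0);
    return 1e9 / static_cast<double>(counts_per_second);
}

#ifdef _WIN32
// Windows only: query the actual frequency and convert it.
// Returns a negative value if the query fails.
inline double current_qpc_tick_ns()
{
    LARGE_INTEGER freq;
    if (!QueryPerformanceFrequency(&freq) || freq.QuadPart <= 0)
        return -1.0;
    return qpc_tick_ns(freq.QuadPart);
}
#endif
```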

Mark Ransom

Look into QueryPerformanceCounter on Windows.

When timing code to identify performance bottlenecks, you want to use the highest resolution timer the system has to offer. This article describes how to use the QueryPerformanceCounter function to time application code:

http://support.microsoft.com/kb/172338
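A minimal sketch of that approach, assuming a Windows build for the timing part: read the counter before and after the code under test, then convert the difference using the frequency. Multiplying the raw count difference by 10^9 before dividing can overflow 64 bits for long intervals, so the conversion splits it into whole seconds plus a remainder. `counts_to_ns` and `time_with_qpc` are hypothetical names, not part of the Windows API.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Convert a performance-counter difference to nanoseconds without
// overflowing: whole seconds first, then the fractional remainder.
// (remainder * 1e9 stays in range as long as freq is below ~1.8e10.)
inline std::uint64_t counts_to_ns(std::uint64_t counts, std::uint64_t freq)
{
    assert(freq > 0);
    const std::uint64_t whole_seconds = counts / freq;
    const std::uint64_t remainder = counts % freq;
    return whole_seconds * 1000000000ULL + remainder * 1000000000ULL / freq;
}

#ifdef _WIN32
// Windows only: time a single call of f with QueryPerformanceCounter.
template <class F>
std::uint64_t time_with_qpc(F&& f)
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);  // counts per second
    QueryPerformanceCounter(&start);
    f();
    QueryPerformanceCounter(&stop);
    return counts_to_ns(static_cast<std::uint64_t>(stop.QuadPart - start.QuadPart),
                        static_cast<std::uint64_t>(freq.QuadPart));
}
#endif
```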

Dan P
  • @Mehrdad: Because when I read this, I have made no progress towards understanding the answer to the question. This is a link. It should have _at least_ a summary. – Mooing Duck Apr 04 '12 at 22:08
  • @MooingDuck: That just makes it a small answer. You *do* have progress because now you have a resource to go to and read. I would fully expect readers of answers to be able to click a link and read. A summary might be *nice*, but it's not necessary, and it certainly doesn't make this a comment. (What's not okay is a single link to something that might later die.) – GManNickG Apr 04 '12 at 22:11
  • Must not have clicked on the link or even bothered to read what it was about. Here is a summary. When timing code to identify performance bottlenecks, you want to use the highest resolution timer the system has to offer. This article describes how to use the QueryPerformanceCounter function to time application code. – Dan P Apr 04 '12 at 22:16
  • @DanP: That right there should have been part of the answer. Being "correct" does not mean something is a "good answer". Links are great, but they are not everything. – Mooing Duck Apr 04 '12 at 22:19
  • @MooingDuck: I think clicking links and reading documentation is a skill which this answer (correctly) assumed, and/or tried to teach. – user541686 Apr 04 '12 at 22:31

If you can run your own assembly, you could read the CPU's cycle counter and divide the difference between two readings by the CPU's clock rate:

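#include <cstdint>

// Read the x86 time-stamp counter (GCC/Clang inline assembly).
// rdtsc returns the low half in EAX and the high half in EDX; "=A" only
// means EDX:EAX on 32-bit x86, so capture the halves separately.
static inline std::uint64_t get_cycles()
{
  std::uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}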
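Dividing by "the CPU's clock rate" assumes the counter ticks at a known, constant rate. On CPUs with an invariant TSC the counter runs at a constant rate regardless of power-state changes, which need not equal the current core clock, so one common approach is to calibrate it against a reference clock. A minimal sketch, assuming the two pairs of readings come from `get_cycles()` and `std::chrono::steady_clock` taken around a short sleep; `estimate_tsc_hz` and `cycles_to_ns` are hypothetical helpers, and the estimate is only as good as the reference clock and the length of the calibration interval.

```cpp
#include <cassert>
#include <cstdint>

// Estimate the counter rate (ticks per second) from two cycle-counter
// readings paired with two reference-clock readings in nanoseconds.
inline double estimate_tsc_hz(std::uint64_t tsc0, std::int64_t ref_ns0,
                              std::uint64_t tsc1, std::int64_t ref_ns1)
{
    assert(tsc1 > tsc0 && ref_ns1 > ref_ns0);
    return static_cast<double>(tsc1 - tsc0) * 1e9 /
           static_cast<double>(ref_ns1 - ref_ns0);
}

// Convert a cycle-count difference to nanoseconds using the estimated rate.
inline double cycles_to_ns(std::uint64_t cycles, double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return static_cast<double>(cycles) * 1e9 / tsc_hz;
}
```

Usage: record `c0 = get_cycles()` together with `t0` from `steady_clock::now()`, sleep for, say, 100 ms, record `c1` and `t1`, and pass the nanosecond counts to `estimate_tsc_hz`; afterwards `cycles_to_ns(end - start, hz)` converts any measured difference.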
Kerrek SB
  • IIRC, there might be a gotcha with this... it's either only available on newer CPUs, or it's only available in kernel mode, or something like that... – user541686 Apr 04 '12 at 22:02
  • @Mehrdad: `rdtsc` has been available since P6-family CPUs, possibly even the original Pentium. It can be restricted to kernel mode, but I don't know if Windows does that. – cHao Apr 04 '12 at 22:03
  • @Mehrdad: Nope, it's not restricted to kernel mode on Windows. If you use VC the syntax is `long long ticks() { __asm {rdtsc}; } ` And if by newer CPUs you mean Pentium, yeah, then it's only available on "newer" CPUs. Personally, though, it's been a while since I coded for the 486 and earlier. – Andreas Magnusson Apr 04 '12 at 22:21
  • There *is*, however, an issue on multi-core systems, since the value returned will not be synchronized between CPUs or CPU cores. But if you only run on a single core/CPU, you'll be fine. – Andreas Magnusson Apr 04 '12 at 22:29
  • @Mehrdad: You'll also need to get the CPU ID and somehow manage a global association of CPU ID and tick counter, but of course if you want to time an operation that gets moved across CPUs you're in trouble. A good OS shouldn't do that, though, since it wouldn't want to spoil the hot cache. – Kerrek SB Apr 04 '12 at 22:40
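The cross-core issue raised in these comments is often worked around by pinning the measuring thread to a single logical CPU, so that both readings come from the same core's counter. A minimal sketch: `SetThreadAffinityMask` on Windows; the non-Windows branch uses Linux's `sched_setaffinity` only so the sketch also builds there. `pin_current_thread_to_cpu` is a hypothetical name, and pinning does not stop the OS from interrupting or preempting the thread.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>  // Linux-specific: cpu_set_t, CPU_SET, sched_setaffinity
#endif

// Pin the calling thread to one logical CPU so that successive
// cycle-counter readings come from the same core. Returns true on success.
inline bool pin_current_thread_to_cpu(unsigned cpu)
{
#ifdef _WIN32
    assert(cpu < 8 * sizeof(DWORD_PTR));
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << cpu;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    assert(cpu < static_cast<unsigned>(CPU_SETSIZE));
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#endif
}
```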

Use Windows 7 and the Hardware Counter Profiling API: http://msdn.microsoft.com/en-us/library/windows/desktop/dd796395(v=vs.85).aspx

Neither rdtsc nor QueryPerformanceCounter/QueryPerformanceFrequency is accurate enough, because of the large overhead, interrupts and task switches.

[EDIT]: Sorry, I mixed up the link for PerformanceCounter with the one for Hardware Counters. I have used it only once, and this was a quick answer.

Lothar
  • How does one use that? I can't figure it out. – Mooing Duck Apr 04 '12 at 22:04
  • And if you want to compare the runtime of different code implementations, use the execution cycles, not the absolute times. – Lothar Apr 04 '12 at 22:06
  • @GManNickG: Because there was no code, no description, and the link was (past tense) absolutely useless. – Mooing Duck Apr 04 '12 at 22:21
  • @Lothar: Actually the problem with both `rdtsc` and `QueryPerformanceCounter` (et al) (most likely `QPC` is implemented in terms of `rdtsc`) isn't in the overhead, it's in the synchronization across CPU cores. Each core has its own time stamp counter and there is no synchronization between them. – Andreas Magnusson Apr 04 '12 at 22:38
  • Another advantage of hardware counters is that you can use them to measure the L1 and L2 cache miss rates. At nanosecond scale, this is important to keep in mind. – Lothar Apr 04 '12 at 22:39
  • Yes @Andreas, but if for example you just want to add a nanosecond timing around each function it is an overhead problem. QueryPerformanceCounter takes a few thousand clock cycles. The synchronisation between CPUs can be set by setting the thread/CPU affinity. Writing a small program setting the affinity of all other running processes/threads to other CPUs help a lot but it's still a hack. It's a 1996 (Windows 2000) state of API. – Lothar Apr 04 '12 at 22:43
  • @Lothar: Firstly, I'm not saying that anyone should (or shouldn't) use `rdtsc` (or `QPC`). It's a tool and as most tools it has pros and cons. It's up to the reader to weigh them and make a judgement. Secondly, calling `rdtsc` twice in a row takes 24 cycles (and that includes the necessary instructions to save the value from the first call), not so much of an overhead IMHO. Thirdly, last I checked W2K was released in 2000, you must be thinking of Windows NT4? – Andreas Magnusson Apr 04 '12 at 23:14
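The overhead figures in this exchange (a couple dozen cycles for back-to-back `rdtsc`, thousands for `QueryPerformanceCounter`) depend on the hardware and OS version, so they are worth measuring on the target machine. A minimal sketch, using `std::chrono::steady_clock` as the clock under test: call `now()` back to back many times and report the smallest nonzero gap. That gap reflects whichever is larger, the per-call overhead or the clock's real granularity, so it is a rough practical floor on what a single measurement can resolve. `min_clock_step_ns` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Call steady_clock::now() back to back and return the smallest nonzero
// difference observed, in nanoseconds (0 if the clock never advanced).
inline std::int64_t min_clock_step_ns(int samples = 100000)
{
    assert(samples > 0);
    using Clock = std::chrono::steady_clock;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    auto prev = Clock::now();
    for (int i = 0; i < samples; ++i) {
        const auto now = Clock::now();
        const std::int64_t step =
            std::chrono::duration_cast<std::chrono::nanoseconds>(now - prev).count();
        if (step > 0 && step < best)
            best = step;
        prev = now;
    }
    return best == std::numeric_limits<std::int64_t>::max() ? 0 : best;
}
```

The same loop can be adapted to `QueryPerformanceCounter` or a cycle-counter read to compare their costs directly.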