4

I am looking for a Linux alternative to the Windows high-resolution performance counter API, and the following API functions in particular:

Thanks.

kakush

3 Answers

6

See clock_gettime() with the CLOCK_MONOTONIC_RAW clock ID, and clock_getres() to query that clock's resolution.

Here is also an example of how to use it:
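As a minimal sketch of the two calls named above (the helper names `timespec_to_ns`, `clock_now_ns` and `clock_resolution_ns` are made up for illustration and are not part of any library), reading the clock and its resolution in nanoseconds could look like this:

```c
/* CLOCK_MONOTONIC_RAW is Linux-specific; _GNU_SOURCE makes sure it is
 * visible even when compiling with a strict -std=c99. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: flatten a struct timespec into nanoseconds. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Hypothetical helper: current value of clock `id` in nanoseconds,
 * or -1 if clock_gettime() fails. */
static int64_t clock_now_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}

/* Hypothetical helper: resolution of clock `id` in nanoseconds,
 * or -1 if clock_getres() fails. */
static int64_t clock_resolution_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_getres(id, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}
```

To time a region, call `clock_now_ns(CLOCK_MONOTONIC_RAW)` before and after it and subtract the two values. On older glibc versions you may also need to link with `-lrt`.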

  • I still can't figure it out: when I call clock_gettime(clock_id, &tp), tp only holds the elapsed time in seconds and nanoseconds, and what I need is the number of cycles (= ticks). – kakush Dec 19 '11 at 07:56
  • @user1087995: CPU cycles? Of what CPU though? Your program could be scheduled on different CPUs, unless you set CPU affinity explicitly. This way of measuring performance is nearly obsolete. But if you still want to go that way, take a look at RDTSC - http://en.wikipedia.org/wiki/Time_Stamp_Counter –  Dec 19 '11 at 18:26
  • But to get a precise time, you have to know the CPU frequency in order to convert a cycle count into time. However, the CPU frequency may not be constant, e.g. Intel CPUs change it dynamically (Turbo Boost, power saving, etc.). Put it this way - Windows is buggy. –  Dec 19 '11 at 18:37
  • @ScrollerBlaster: Thanks for the tip. I've provided a new link. Though now it is slightly C and also ported to OS X. –  Feb 29 '12 at 00:58
  • If you want to compile the `stopwatch` with C99 support, you have to use the gcc flag `-std=gnu99` instead of `-std=c99` for it to work. Just adding this because I spent an hour on it :) – halex Sep 06 '12 at 07:34
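On the "ticks" question raised in the comments above: one possible approach, sketched here under the assumption that a tick can simply be defined as one nanosecond, is to wrap clock_gettime() in a pair of functions shaped like a counter/frequency API. The names `query_ticks`, `query_tick_frequency` and `ticks_to_seconds` are hypothetical and chosen only for this illustration; the ticks are nanoseconds, not CPU cycles.

```c
/* CLOCK_MONOTONIC_RAW is Linux-specific; _GNU_SOURCE keeps it visible
 * under a strict -std=c99. */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <time.h>

/* Assumption: one tick is defined as one nanosecond. */
#define TICKS_PER_SECOND 1000000000LL

/* Hypothetical: report how many ticks make up one second. */
static int query_tick_frequency(int64_t *freq)
{
    *freq = TICKS_PER_SECOND;
    return 0;
}

/* Hypothetical: read the monotonic clock as a tick count.
 * Returns 0 on success, -1 if clock_gettime() fails. */
static int query_ticks(int64_t *ticks)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC_RAW, &ts) != 0)
        return -1;
    *ticks = (int64_t)ts.tv_sec * TICKS_PER_SECOND + ts.tv_nsec;
    return 0;
}

/* Hypothetical: convert a tick interval to seconds. */
static double ticks_to_seconds(int64_t start, int64_t end)
{
    return (double)(end - start) / (double)TICKS_PER_SECOND;
}
```

Because the frequency is a fixed constant, the conversion to seconds does not depend on the CPU clock rate, which sidesteps the frequency-scaling problem mentioned in the comments.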
2

The perf tool, which has shipped with the kernel for some time now, probably answers your needs. It has a s*load of options, so study it carefully ;)

EDIT: forget it, I thought you were talking about CPU performance counters.

fge
  • thank you for the quick response. I need to use these functions in a cpp program. How can I do that? – kakush Dec 15 '11 at 16:08
  • @user1087995: `perf` is a profiling tool that gets you many metrics, but it is not even close to what `QueryPerformanceCounter` does on Windows. So `perf` is not an answer. `clock_gettime()` is what gives you high-precision (hardware) wall time. Use that. –  Dec 15 '11 at 18:22
  • 3
    Hmm, OK, when I saw "performance counters" I really thought CPU performance counters were meant, but it appears this isn't the case here. – fge Dec 15 '11 at 19:38
0

Linux perf_event_open system call

This system call exposes several performance counters in an architecture-agnostic manner.

man perf_event_open documents the available counters, and it includes all the most basic things you'd expect:

  • cycle count (config = PERF_COUNT_HW_CPU_CYCLES)
  • cache hits and misses (type = PERF_TYPE_HW_CACHE)
  • branch misses (config = PERF_COUNT_HW_BRANCH_MISSES)
  • kernel software visible events like page faults (PERF_COUNT_SW_PAGE_FAULTS) and context switches (PERF_COUNT_SW_CONTEXT_SWITCHES)

I have given an example for the cycle counts at: How to get the CPU cycle count in x86_64 from C++?

perf_event_open.c

#include <asm/unistd.h>
#include <linux/perf_event.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <inttypes.h>

static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                int cpu, int group_fd, unsigned long flags)
{
    int ret;

    ret = syscall(__NR_perf_event_open, hw_event, pid, cpu,
                    group_fd, flags);
    return ret;
}

int
main(int argc, char **argv)
{
    struct perf_event_attr pe;
    long long count;
    int fd;

    uint64_t n;
    if (argc > 1) {
        n = strtoll(argv[1], NULL, 0);
    } else {
        n = 10000;
    }

    memset(&pe, 0, sizeof(struct perf_event_attr));
    pe.type = PERF_TYPE_HARDWARE;
    pe.size = sizeof(struct perf_event_attr);
    pe.config = PERF_COUNT_HW_CPU_CYCLES;
    pe.disabled = 1;
    pe.exclude_kernel = 1;
    // Don't count hypervisor events.
    pe.exclude_hv = 1;

    fd = perf_event_open(&pe, 0, -1, -1, 0);
    if (fd == -1) {
        fprintf(stderr, "Error opening leader %llx\n", pe.config);
        exit(EXIT_FAILURE);
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

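    /* Loop n times, should be good enough for -O0.
     * volatile keeps the compiler from dropping the loop when optimizing;
     * "cc" records that sub modifies the flags. */
    __asm__ volatile (
        "1:;\n"
        "sub $1, %[n];\n"
        "jne 1b;\n"
        : [n] "+r" (n)
        :
        : "cc"
    );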

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
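    /* A short or failed read would leave count uninitialized. */
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        perror("read");
        exit(EXIT_FAILURE);
    }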

    printf("%lld\n", count);

    close(fd);
}
Ciro Santilli OurBigBook.com