
I have a question regarding PAPI (Performance Application Programming Interface). I downloaded and installed the PAPI library, but I am still not sure how to use it correctly or what else I need to make it work. I am trying to use it from C. I have this simple program:

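    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        /* On success, PAPI_library_init returns PAPI_VER_CURRENT. */
        int retval = PAPI_library_init(PAPI_VER_CURRENT);

        if (retval != PAPI_VER_CURRENT && retval > 0) {
            printf("PAPI error: 1\n");   /* library version mismatch */
            exit(1);
        }

        if (retval < 0)
            printf("PAPI error: 2\n");   /* initialization failed */

        retval = PAPI_is_initialized();

        if (retval != PAPI_LOW_LEVEL_INITED)
            printf("PAPI error: 3\n");   /* low-level API not initialized */

        /* Print the count unconditionally; 0 or a negative value means no usable counters. */
        int num_hwcntrs = PAPI_num_counters();
        printf("This system has %d available counters.\n", num_hwcntrs);

        return 0;
    }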

I include papi.h and compile with gcc using the -lpapi flag. I added the library to the path, so the program compiles and runs, but this is the output I get:

This system has 0 available counters.

Initialization seems to work, though, since it doesn't report an error code. Any advice on what I have done wrong or missed would be helpful. My system should have counters available; more precisely, I need cache miss and cache hit counters.

I also tried another simple program that starts two cache counters, and it gave error code -25:

    int numEvents = 2;
    long long values[2];
    int events[2] = {PAPI_L3_TCA, PAPI_L3_TCM};

    printf("PAPI error: %d\n", PAPI_start_counters(events, numEvents));

UPDATE: I checked the hardware information from the terminal with `papi_avail | more` and got this:

    Available PAPI preset and user defined events plus hardware information.

    PAPI version             : 5.7.0.0
    Operating system         : Linux 4.15.0-45-generic
    Vendor string and code   : GenuineIntel (1, 0x1)
    Model string and code    : Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz (78, 0x4e)
    CPU revision             : 3.000000
    CPUID                    : Family/Model/Stepping 6/78/3, 0x06/0x4e/0x03
    CPU Max MHz              : 2800
    CPU Min MHz              : 400
    Total cores              : 4
    SMT threads per core     : 2
    Cores per socket         : 2
    Sockets                  : 1
    Cores per NUMA region    : 4
    NUMA regions             : 1
    Running in a VM          : no
    Number Hardware Counters : 0
    Max Multiplex Counters   : 384
    Fast counter read (rdpmc): no

    PAPI Preset Events

    Name        Code       Avail Deriv Description (Note)
    PAPI_L1_DCM 0x80000000 No    No    Level 1 data cache misses
    PAPI_L1_ICM 0x80000001 No    No    Level 1 instruction cache misses
    PAPI_L2_DCM 0x80000002 No    No    Level 2 data cache misses
    PAPI_L2_ICM 0x80000003 No    No    Level 2 instruction cache misses
    .......

So, because "Number Hardware Counters" is 0, I can't use PAPI's preset events to count cache misses? Is there any configuration that would help, or should I forget about it until I change my laptop?

Ana Khorguani
  • Your Skylake CPU has 4 programmable counters per hyperthread, or 8 per physical core, as well as fixed counters for core clock cycles and instructions. One of the counters might be tied up by the NMI watchdog, which you can disable with `sysctl kernel.nmi_watchdog=0` – Peter Cordes Feb 09 '19 at 03:02
  • Were you able to run the PAPI tests (`make test`) successfully? – Hadi Brais Feb 09 '19 at 05:56
  • @HadiBrais Yes, I managed to run them. I got a mix of errors, passes, and failures, but I fixed the hardware counter problem and it now seems to be working – Ana Khorguani Feb 09 '19 at 20:25
  • @PeterCordes Thank you very much, it worked. If it's not too much to ask, though: I set nmi_watchdog to 0 and got 11 hardware performance counters. When I set it back to 1 (I checked before changing it, and it was 1), I was left with 10 counters. So my guess is that nmi_watchdog only needs 1 counter. Then why were the other 10 also hidden, or tied up by nmi_watchdog? I read about it, and if I understand correctly, it handles non-maskable interrupts that can free the CPU if it gets locked up or halted. – Ana Khorguani Feb 09 '19 at 20:29
  • That's weird, IDK why that would make so much difference. The NMI watchdog presumably works by programming one performance counter to trigger an interrupt infrequently, maybe using reference cycles or something that counts any time the CPU isn't sleeping. (Unlike the real TSC that RDTSC reads, the perf event for reference cycles [doesn't count when the clock is stopped](https://stackoverflow.com/questions/45472147/).) Anyway, according to the perf documentation, it should only tie up *one* programmable counter (on every core). – Peter Cordes Feb 09 '19 at 20:34
  • Yes, that's why I was surprised that 11 hardware counters showed up all of a sudden after changing it, even though nmi_watchdog seems to use only 1. Anyway, thank you a lot. – Ana Khorguani Feb 09 '19 at 21:01
  • @PeterCordes I just found that it depends on the value of perf_event_paranoid: if it's set to 3, I don't see any of my hardware counters, because the user doesn't have the privileges. When I set it to -1 again, I got my hardware counters back. Makes more sense this way :) – Ana Khorguani Feb 11 '19 at 09:34
  • This is a known problem on HPC systems; please see [no counters are available](https://superuser.com/questions/980632/run-perf-without-root-rights) – husin alhaj ahmade Apr 11 '19 at 13:26
