
I've been trying to measure/monitor the utilization of all 60 cores on a Xeon Phi (Knights Corner, in-order processors) at a relatively high frequency: at least one sample every 0.1 s, i.e. 10 Hz.

I tried the latest PAPI library, but it only supports PAPI_TOT_INS, the counter of completed instructions. This won't work because I need something related to the instructions issued in each 0.1 s interval, not the instructions that finished: several instructions issued in different cycles may complete in the same cycle. Whether instructions are issued also depends on whether the core is halted or not.

Other available tools such as 'top' and 'perf' operate at 1 Hz, which is too slow for my measurement. I also need to synchronize the measurement with key phases of my code, so the Intel VTune profiler does not work for me either.

Is there a way to monitor instruction issue on the Xeon Phi, or any other activity linked to core utilization? I understand that the hardware counters are there, but reading them seems very challenging to me. Maybe I can deduce the utilization by measuring the CPU time of each thread?
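To make the CPU-time idea concrete, here is a minimal sketch of the cpu_time/real_time ratio for one thread around one phase of the code. It is written in Python for brevity; `thread_utilization` and `work` are hypothetical names, and the same two clocks are available from C through `clock_gettime` with `CLOCK_THREAD_CPUTIME_ID` and `CLOCK_MONOTONIC`. Note that this ratio only says how long the thread was scheduled on a core, not how many instructions were issued.

```python
import time


def thread_utilization(work):
    """Run work() and return cpu_seconds / wall_seconds for the calling thread."""
    wall_start = time.perf_counter()   # wall-clock (real) time
    cpu_start = time.thread_time()     # CPU time consumed by this thread only
    work()
    cpu_used = time.thread_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


if __name__ == "__main__":
    # A sleeping phase should give a ratio near 0, a busy loop a ratio near 1.
    print("sleep:", thread_utilization(lambda: time.sleep(0.1)))
    print("busy: ", thread_utilization(lambda: sum(range(2_000_000))))
```

Calling this around each 0.1 s phase would give one sample per phase, so the measurement stays synchronized with the code.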

Thanks.

thierry
  • Are you monitoring from the coprocessor or from the host? – Taylor Kidd Mar 18 '15 at 18:09
  • @TaylorKidd No, I am trying to do so natively on Xeon Phi. I currently use cpu_time/real_time as an approximation. – thierry Mar 19 '15 at 07:15
  • I don't have time to look now, but you might see if there is anything relevant in /proc (/sys/class on phi). Also, there is the [pfmon tool](http://perfmon2.sourceforge.net/). Even if there's no phi implementation, you might use it as a blueprint and example. – Taylor Kidd Mar 19 '15 at 13:44
  • You can call `perf_event_open` directly from your code, program it the way `perf stat --per-core` does, and then use perf's `ioctl`s to enable PMU counting around some part of your code (or read the counters several times). Example of using perf_event_open: https://github.com/castl/easyperf. Per-core mode is AGGR_CORE: http://lxr.free-electrons.com/source/tools/perf/builtin-stat.c#L427 and line 1502. Another, easier way (without hardware counters) is to call getrusage from your code. – osgx Mar 21 '15 at 08:55
  • @thierry - In 2015, `PAPI` wasn't supporting `KNL`. Newer [versions](http://icl.cs.utk.edu/papi/news/index.html) have added support for it. Were you able to get `per-core` utilization? My understanding is that `KNL` or `Xeon Phi` system kernel has only `performance` and `powersave` governor because of `intel_pstate`. For `per-core` utilization, most likely `ondemand` governor is required which isn't available by default. Can you please share details of how you got `per-core` utilization? I am on [Ninja Developer Platform](http://dap.xeonphi.com/#platformspecs). – Chetan Arvind Patil Jul 30 '17 at 23:54
  • @TaylorKidd - Do you have any suggestions on my comment above? – Chetan Arvind Patil Jul 30 '17 at 23:57
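
Following the /proc suggestion in the comments, per-core utilization can be approximated without hardware counters by sampling the `cpuN` lines of `/proc/stat` twice and comparing busy ticks to total ticks. The sketch below assumes the standard Linux `/proc/stat` layout (user, nice, system, idle, iowait, ...); whether the coprocessor's kernel exposes exactly this layout is an assumption, and the function names are hypothetical. The counters advance in USER_HZ ticks (commonly 100 per second), so a 0.1 s interval only sees about 10 ticks per core and the result is coarse.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines; skip the aggregate "cpu" line and everything else.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal (guest time is already in user)
        values = [int(v) for v in fields[1:9]]
        if len(values) < 4:
            continue
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values)
        counters[fields[0]] = (total - idle, total)
    return counters


def utilization(before, after):
    """Return {cpu_name: busy fraction in [0, 1]} between two parsed snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (busy1, total1))
        elapsed = total1 - total0
        result[cpu] = (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=0.1, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core utilization."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)


if __name__ == "__main__":
    for _ in range(10):  # roughly one second of 10 Hz samples
        print(sample(0.1))
```

This measures time not spent idle, which is not the same as instructions issued; the `perf_event_open` route is still needed for that.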

0 Answers