I am reading code written by Lemire that benchmarks the number of CPU cycles and instructions.

The following is the main logic:

  #define N_CONFIG 2
  int CONFIGS[N_CONFIG] = {PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS};

  const int pid = 0;  // current process
  const int cpu = -1; // all cpus
  const unsigned long flags = 0;

  int group = -1; // no group
  for (uint32_t i = 0; i < N_CONFIG; i++) {
    attr.config = CONFIGS[i];
    fd =
        (int)syscall(__NR_perf_event_open, &event.attr, pid, cpu, group, flags);
    ioctl(fd, PERF_EVENT_IOC_ID, &ids[i]);
    if (group == -1) {
      group = fd;
    }
  }

How, then, are the numbers of CPU cycles and instructions read from fd?

In the end() method:


if (read(fd, temp_result_vec.data(), temp_result_vec.size() * 8) == -1) {
   report_error("read");
}
// our actual results are in slots 1,3,5, ... of this structure

How should I understand "our actual results are in slots 1,3,5, ... of this structure"? Here, "this structure" refers to a std::vector<uint64_t> whose size is 2 * N_CONFIG + 1.


The perf_event_open manual page provides an example that obtains the number of instructions. It is easy to follow because only ONE config (i.e., PERF_COUNT_HW_INSTRUCTIONS) is used:

long long count;
read(fd, &count, sizeof(count));

But when multiple configs are monitored at the same time, the code becomes harder to follow.

chenzhongpu
  • There is an example that does exactly what you want in the manual page for `perf_event_open`: https://manned.org/perf_event_open.2 (look at the very bottom of the page). – Marco Bonelli Apr 21 '23 at 10:21
  • Thanks. I finally understand what is going on with multiple monitored events. – chenzhongpu Apr 21 '23 at 13:54

1 Answer

How to understand "our actual results are in slots 1,3,5, ... of this structure"?

Because this program is used to record two events (i.e., PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS), they need to be grouped, and the read format is specified as:

PERF_FORMAT_GROUP | PERF_FORMAT_ID

If PERF_FORMAT_GROUP is specified, read returns a buffer with the following layout:

 struct read_format {
     u64 nr;            /* The number of events */
     struct {
         u64 value;     /* The value of the event */
         u64 id;        /* if PERF_FORMAT_ID */
     } values[nr];
 };

Therefore, with two events, the buffer viewed as an array of 64-bit slots is laid out as follows: slot 0 is nr; slot 1 is the value of the first event; slot 2 is the ID of the first event; slot 3 is the value of the second event; slot 4 is the ID of the second event. In general, the value of event i is in slot 2 * i + 1, which is why the actual results are in slots 1 and 3, and why the vector needs 2 * N_CONFIG + 1 elements.
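To make the slot layout concrete, here is a minimal sketch of pulling the values out of such a buffer. The function name extract_group_values and its parameters are hypothetical (they do not come from Lemire's code); the sketch assumes the buffer was filled by read() on a group leader opened with PERF_FORMAT_GROUP | PERF_FORMAT_ID and no other read_format flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the counter values out of a buffer laid out as
 *   buf[0]         = nr (number of events in the group)
 *   buf[1 + 2 * i] = value of event i
 *   buf[2 + 2 * i] = ID of event i
 * Returns the number of values copied, or 0 if the buffer is too short or
 * the output array is too small. */
static size_t extract_group_values(const uint64_t *buf, size_t n_slots,
                                   uint64_t *values, size_t max_values) {
    assert(buf != NULL && values != NULL);
    if (n_slots < 1) {
        return 0;
    }
    uint64_t nr = buf[0];
    if (nr > max_values || (n_slots - 1) / 2 < nr) {
        return 0;
    }
    for (size_t i = 0; i < nr; i++) {
        values[i] = buf[1 + 2 * i]; /* slots 1, 3, 5, ... */
    }
    return (size_t)nr;
}
```

For the two events in the question, values[0] would then hold the cycle count and values[1] the instruction count, in the order the events were opened.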

A related question can be found at perf_event_open - how to monitoring multiple events.


BTW, there is a bug in the code: it overwrites fd in each iteration at line 46. Instead, the first fd (when i == 0) should be recorded as the group leader, and that fd must be used later to read the statistics.
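One way to avoid overwriting the leader is to keep every descriptor in an array and return the first one. The following is only a sketch under assumptions: open_counter_group, fds, and ids are hypothetical names, and the attribute settings (exclude_kernel, exclude_hv, disabled) follow the manual page example rather than Lemire's code.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: opens one hardware counter per entry in configs, all
 * in a single group. fds[0] is the group leader. Returns the leader fd, or
 * -1 if any counter could not be opened (e.g. insufficient permissions). */
static int open_counter_group(const uint64_t *configs, size_t n, int *fds,
                              uint64_t *ids) {
    int leader = -1; /* -1 means "no group yet" for the first call */
    for (size_t i = 0; i < n; i++) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = configs[i];
        attr.disabled = 1; /* enable later through the leader */
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;
        attr.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID;

        /* pid = 0: current process; cpu = -1: any CPU */
        fds[i] = (int)syscall(__NR_perf_event_open, &attr, 0, -1, leader, 0UL);
        if (fds[i] == -1) {
            while (i > 0) {
                close(fds[--i]); /* release counters opened so far */
            }
            return -1;
        }
        ioctl(fds[i], PERF_EVENT_IOC_ID, &ids[i]);
        if (leader == -1) {
            leader = fds[i]; /* remember the first fd; never overwrite it */
        }
    }
    return leader;
}
```

The returned leader is the descriptor to pass to ioctl (with PERF_IOC_FLAG_GROUP for PERF_EVENT_IOC_RESET, PERF_EVENT_IOC_ENABLE, and PERF_EVENT_IOC_DISABLE) and to read, which then fills the whole 2 * n + 1 slot buffer in one call.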

chenzhongpu