I am reading code written by Lemire that benchmarks the number of CPU cycles and instructions.
The main logic is as follows:
#define N_CONFIG 2
int CONFIGS[N_CONFIG] = {PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS};
const int pid = 0; // current process
const int cpu = -1; // all cpus
const unsigned long flags = 0;
int group = -1; // no group
for (uint32_t i = 0; i < N_CONFIG; i++) {
    attr.config = CONFIGS[i];
    fd = (int)syscall(__NR_perf_event_open, &event.attr, pid, cpu, group, flags);
    ioctl(fd, PERF_EVENT_IOC_ID, &ids[i]);
    if (group == -1) {
        group = fd;
    }
}
Then how are the CPU cycle and instruction counts read from fd?
The end() method does this:
if (read(fd, temp_result_vec.data(), temp_result_vec.size() * 8) == -1) {
    report_error("read");
}
// our actual results are in slots 1,3,5, ... of this structure
How should I understand "our actual results are in slots 1,3,5, ... of this structure"? The structure here is a std::vector<uint64_t> whose size is 2 * N_CONFIG + 1.
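One possible reading of that comment, sketched under an assumption the snippet does not show: if the events were opened with attr.read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID, the perf_event_open man page describes a single read() on the group leader as returning a count nr followed by nr (value, id) pairs. Slot 0 would then hold nr, and the counter values would sit in slots 1, 3, 5, ... with each event's id in the slot right after its value. The helper name decode_group_read and the sample numbers are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: extract counter values from a buffer laid out as
//   [nr, value0, id0, value1, id1, ...]
// which is the group layout ASSUMED here (PERF_FORMAT_GROUP | PERF_FORMAT_ID).
std::vector<uint64_t> decode_group_read(const std::vector<uint64_t>& buf) {
    std::vector<uint64_t> values;
    if (buf.empty()) {
        return values;
    }
    const uint64_t nr = buf[0];  // slot 0: number of events in the group
    for (uint64_t i = 0; i < nr; i++) {
        const std::size_t slot = 1 + 2 * i;  // slots 1, 3, 5, ...
        if (slot >= buf.size()) {
            break;  // buffer is shorter than nr claims; stop rather than overrun
        }
        values.push_back(buf[slot]);
    }
    return values;
}
```

With N_CONFIG = 2 the buffer has 2 * 2 + 1 = 5 slots, so a made-up buffer {2, 1000, 11, 2500, 12} would decode to the two values 1000 and 2500. The ids stored through PERF_EVENT_IOC_ID can be compared against the even slots (11 and 12 in this made-up buffer) to tell which value belongs to which event.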
The perf_event_open man page provides an example that obtains the number of instructions. It is very easy to follow, and it uses only ONE config (i.e., PERF_COUNT_HW_INSTRUCTIONS):
long long count;
read(fd, &count, sizeof(count));
But when multiple configs are read at the same time, the code becomes much harder to follow.
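The difference between the two cases may come down to how many 64-bit words a single read() returns. As a rough sketch based on the read_format layouts documented in the perf_event_open man page, and ignoring the optional time and lost fields: a lone counter with read_format = 0 returns one word, while a group read returns one word for nr plus one or two words per event. The function name and the boolean parameters are hypothetical stand-ins for the real flag bits.

```cpp
#include <cstddef>

// Hypothetical helper: number of uint64_t words one read() returns,
// ignoring the optional time_enabled / time_running / lost fields.
//   grouped : stands in for PERF_FORMAT_GROUP
//   with_id : stands in for PERF_FORMAT_ID
std::size_t read_words(std::size_t n_events, bool grouped, bool with_id) {
    if (!grouped) {
        // Single event: the value, optionally followed by its id.
        return with_id ? 2 : 1;
    }
    // Group: nr first, then per event a value and optionally an id.
    const std::size_t per_event = with_id ? 2 : 1;
    return 1 + n_events * per_event;
}
```

Under these assumptions, read_words(1, false, false) gives 1, which matches the single long long count in the man page example, and read_words(2, true, true) gives 5, which matches the 2 * N_CONFIG + 1 vector size.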