I have been trying to log the memory accesses made by a program using perf and PEBS counters. My intention was to log all of the memory accesses made by a program (I chose programs from SPEC CPU2006). By tweaking certain parameters, I seem to record many more samples than there actually are for the program. I know, as has been said previously, that it is hard to record every memory access sample, but leaving that aside, I want to know how PEBS can record more samples than there actually are.

I followed these steps:

First, I modified the value of /proc/sys/kernel/perf_cpu_time_max_percent. It was initially 25%, and I changed it to 95% because I wanted to see whether I could record the maximum number of memory access samples. This should also let me use a much higher perf_event_max_sample_rate, which is usually capped at 100,000, without the kernel lowering it again.

I then set perf_event_max_sample_rate to 244,500, well above the usual maximum of 100,000.
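The two settings above can be inspected and changed roughly as follows. This is a minimal sketch, not the exact commands I ran: `read_knob` is a hypothetical helper name, and the `sysctl -w` lines are only printed because applying them needs root.

```shell
#!/bin/sh
# Sketch: inspect the two perf throttling knobs described above.
# read_knob is an illustrative helper, not part of perf or sysctl.

read_knob() {
    # Print "<name> = <value>", or "<name> = unavailable" if the file cannot be read.
    knob_path="/proc/sys/kernel/$1"
    if [ -r "$knob_path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$knob_path")"
    else
        printf '%s = unavailable\n' "$1"
    fi
}

read_knob perf_cpu_time_max_percent
read_knob perf_event_max_sample_rate

# Values taken from the steps above; run these as root to apply them.
echo "sysctl -w kernel.perf_cpu_time_max_percent=95"
echo "sysctl -w kernel.perf_event_max_sample_rate=244500"
```

Re-reading the knobs after a run shows whether the kernel has lowered the sample rate again.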

Next, I used perf stat to count the total number of memory-store events in the program. I got the following data:

./perf stat -e cpu/mem-stores/u ../../.././libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25

 Performance counter stats for '../../.././libquantum_base.arnab 100':

       158,115,509      cpu/mem-stores/u                                            

       0.591718162 seconds time elapsed

perf stat reports roughly 158 million events, which should be accurate, since the count comes directly from the hardware counter.

But when I run perf record -e and use PEBS to sample every memory-store event, I get the following:

./perf record -e cpu/mem-stores/upp -c 1 ../../.././libquantum_base.arnab 100
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.

Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.

Samples in kernel modules won't be resolved at all.

If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.

Couldn't record kernel reference relocation symbol
Symbol resolution may be skewed if relocation was used (e.g. kexec).
Check /proc/kallsyms permission or run as root.
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
[ perf record: Woken up 32 times to write data ]
[ perf record: Captured and wrote 7.827 MB perf.data (254125 samples) ]

I can see 254125 samples being recorded. This is far fewer than the count returned by perf stat. In both cases I am recording user-space accesses only (using the u modifier on the event).
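To put the two numbers side by side, here is a small sketch that uses only the figures from the perf stat and perf record output above. The variable names are illustrative.

```shell
#!/bin/sh
# Figures copied from the perf stat and perf record output above.
events=158115509   # cpu/mem-stores/u count reported by perf stat
samples=254125     # samples written by perf record with -c 1

# Integer division: how many counted events correspond to one recorded sample.
events_per_sample=$((events / samples))

# Share of events that ended up as samples, in percent (awk for floating point).
captured_pct=$(awk -v s="$samples" -v e="$events" 'BEGIN { printf "%.2f", 100 * s / e }')

echo "events per sample: $events_per_sample"
echo "captured: ${captured_pct}%"
```

With these figures it reports 622 events per sample and a capture rate of 0.16%.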

Why does this happen? Am I recording the memory-store events the wrong way? Or is there a problem with the CPU's behavior?

Arnabjyoti Kalita
  • _158,115,509 cpu/mem-stores/u_ isn't this 158 **million** samples? – Shahbaz Jun 05 '17 at 20:36
  • I think perf stat only gives me a precise count of the events. It is 158 million **events**, **not samples**, because it comes directly from the hardware counter. – Arnabjyoti Kalita Jun 05 '17 at 20:44
  • Kalita, can you test some simpler tests before running real Spec? STREAM/RandomAccess/memlat? Can you also test not only `-c 1` but also `-c 10`, `-c 100`, `-c 1000`? – osgx Jun 06 '17 at 00:50
  • Hi @osgx, I do not see how using different periods will help. I was only comparing the total number of memory access events (not samples). If I use -c > 1, I will get sampled events only. Anyway, I will give it a try. – Arnabjyoti Kalita Jun 06 '17 at 14:13
  • Kalita, any progress? -c N test may show different sample counts, some of N may have more events captured... – osgx Jun 08 '17 at 04:13
  • Osgx, yes, I will start on it tomorrow. That was my question: how is this possible? When I use -c 1 and a relatively high frequency, there are more memory access events than there should be. That seems like a bug. – Arnabjyoti Kalita Jun 08 '17 at 04:26
  • Osgx, with -c 1000 I get approximately the correct values; it does not work with -c 1, -c 10, or -c 100. But I want this to work with -c 1. Why does this happen? – Arnabjyoti Kalita Jun 08 '17 at 15:41
  • Kalita, they are not more: you have 158 million in the perf stat output (as pointed out in the first comment) and only 254k (0.254 million) sampled with -c 1. Try measuring simpler programs with known numbers first (memlat, as STREAM may have too good prefetch and request merging). Is the mem-stores event for all memory accesses or only for LLC misses? – osgx Jun 09 '17 at 19:44
  • Yeah you are right. (It is 158 million). The memory-stores event is for all memory accesses. – Arnabjyoti Kalita Jun 09 '17 at 20:10
  • Yes @osgx, I will do that. I raised another question actually. I just want you to see and if you have any ideas, can you suggest something. – Arnabjyoti Kalita Jun 10 '17 at 03:38
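
For reference, the period sweep discussed in the comments implies the ideal sample counts computed below. This is a sketch under two assumptions: no samples are dropped, and the workload produces the same 158,115,509 store events as in the perf stat run. `expected_samples` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: ideal sample counts for the periods suggested in the comments,
# assuming no dropped samples and the event count from the perf stat run.
events=158115509   # cpu/mem-stores/u count from perf stat

expected_samples() {
    # One sample is taken every $1 events, so the ideal count is events / period.
    echo $((events / $1))
}

for period in 1 10 100 1000; do
    printf 'period %4s -> %s expected samples\n' "$period" "$(expected_samples "$period")"
done
```

A measured count far below these values for a given period may point to samples being dropped or throttled rather than to a wrong event count.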
