
I am trying to establish the bottleneck in my code using perf and ocperf. When I do a 'detailed stat' run on my binary, two statistics are reported in red text, which I assume means they are too high:

L1-dcache-load-misses is in red at 28.60%

iTLB-load-misses is in red at 425.89%

# ~bram/src/pmu-tools/ocperf.py stat -d -d -d -d -d ./bench ray
perf stat -d -d -d -d -d ./bench ray
Loaded 455 primitives.
Testing ray against 455 primitives.

 Performance counter stats for './bench ray':

       9031.444612      task-clock (msec)         #    1.000 CPUs utilized          
                15      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               292      page-faults               #    0.032 K/sec                  
    28,786,063,163      cycles                    #    3.187 GHz                      (61.47%)
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
    55,742,952,563      instructions              #    1.94  insns per cycle          (69.18%)
     3,717,242,560      branches                  #  411.589 M/sec                    (69.18%)
        18,097,580      branch-misses             #    0.49% of all branches          (69.18%)
    10,230,376,136      L1-dcache-loads           # 1132.751 M/sec                    (69.17%)
     2,926,349,754      L1-dcache-load-misses     #   28.60% of all L1-dcache hits    (69.21%)
       145,843,523      LLC-loads                 #   16.148 M/sec                    (69.32%)
            49,512      LLC-load-misses           #    0.07% of all LL-cache hits     (69.33%)
   <not supported>      L1-icache-loads          
           260,144      L1-icache-load-misses     #    0.029 M/sec                    (69.34%)
    10,230,376,830      dTLB-loads                # 1132.751 M/sec                    (69.34%)
             1,197      dTLB-load-misses          #    0.00% of all dTLB cache hits   (61.59%)
             2,294      iTLB-loads                #    0.254 K/sec                    (61.55%)
             9,770      iTLB-load-misses          #  425.89% of all iTLB cache hits   (61.51%)
   <not supported>      L1-dcache-prefetches     
   <not supported>      L1-dcache-prefetch-misses

       9.032234014 seconds time elapsed

My questions:

  1. What would be a reasonable figure for L1 data cache misses?
  2. What would be a reasonable figure for iTLB-load-misses?
  3. Why can iTLB-load-misses exceed 100%? In other words, why does iTLB-load-misses exceed iTLB-loads? I've even seen it spike as high as 568%. (One idea I want to test is sketched just below this list.)
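
One guess: the percentages in parentheses at the end of each line (61.47%, 69.18%, ...) suggest the counters were multiplexed, i.e. each event was only counted for part of the run and then scaled, so a ratio of two independently scaled counts might come out inconsistent. A minimal sketch of how I could test that, counting only the TLB events so they don't have to share counter slots (using the same generic event names that appear in the output above):

# perf stat -e iTLB-loads,iTLB-load-misses,dTLB-loads,dTLB-load-misses ./bench ray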

Also, my machine has a Haswell CPU; shouldn't the stalled-cycles stats be supported there? I would have expected them to be included rather than reported as <not supported>.
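
In case it matters, requesting them explicitly would look like the line below (a sketch; given the <not supported> lines above I'd expect the same result, but perhaps the event names make a difference):

# perf stat -e stalled-cycles-frontend,stalled-cycles-backend ./bench ray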

Bram
    Possible duplicate of [how to interpret perf iTLB-loads,iTLB-load-misses](https://stackoverflow.com/questions/49933319/how-to-interpret-perf-itlb-loads-itlb-load-misses) – Peter Cordes May 29 '18 at 13:02

0 Answers