I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz
(Haswell
) processor. AFAIK, mem_load_uops_retired.l3_miss
, counts the number of DRAM demand
(i.e., non-prefetch
) data read accesses. offcore_response.demand_data_rd.l3_miss.local_dram
, as its name suggests, counts the number of demand
data reads targeted to DRAM. Therefore, these two events seem to be equivalent (or at least almost the same). But based on the following benchmarks the former event is much less frequent than the latter:
1) Initializing a 1000-Elment Global Array in a Loop in C
:
Performance counter stats for '/home/ahmad/Simple Progs/loop':
1,363 mem_load_uops_retired.l3_miss
1,543 offcore_response.demand_data_rd.l3_miss.local_dram
0.000749574 seconds time elapsed
0.000778000 seconds user
0.000000000 seconds sys
2) Opening a PDF Document in Evince:
Performance counter stats for '/opt/evince-3.28.4/bin/evince':
936,152 mem_load_uops_retired.l3_miss
1,853,998 offcore_response.demand_data_rd.l3_miss.local_dram
4.346408203 seconds time elapsed
1.644826000 seconds user
0.103411000 seconds sys
3) Running Wireshark for 5 seconds:
Performance counter stats for 'wireshark':
5,161,671 mem_load_uops_retired.l3_miss
8,126,526 offcore_response.demand_data_rd.l3_miss.local_dram
15.713828395 seconds time elapsed
0.904280000 seconds user
0.693906000 seconds sys
4) Running Blur Filter on an Image in Inkscape:
Performance counter stats for 'inkscape':
13,852,121 mem_load_uops_retired.l3_miss
23,475,970 offcore_response.demand_data_rd.l3_miss.local_dram
25.355643897 seconds time elapsed
7.244404000 seconds user
1.019895000 seconds sys
In all four benchmarks, offcore_response.demand_data_rd.l3_miss.local_dram
is nearly twice as frequent as mem_load_uops_retired.l3_miss
. Is this reasonable? Why? Please, tell me if the benchmarks are too complicated and coarse-grained!