I want to retrieve the number of DRAM accesses in my application. More precisely, I need to distinguish between data and code accesses. The processor is an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell). Based on the Intel Software Developer's Manual, Volume 3 and Perf, I found and categorized the following memory-access-related events:
(A)
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
=========================================================================
(B)
mem_load_uops_l3_miss_retired.local_dram
mem_load_uops_retired.l3_miss
=========================================================================
(C)
offcore_response.all_code_rd.l3_miss.any_response
offcore_response.all_code_rd.l3_miss.local_dram
offcore_response.all_data_rd.l3_miss.any_response
offcore_response.all_data_rd.l3_miss.local_dram
offcore_response.all_reads.l3_miss.any_response
offcore_response.all_reads.l3_miss.local_dram
offcore_response.all_requests.l3_miss.any_response
=========================================================================
(D)
offcore_response.all_rfo.l3_miss.any_response
offcore_response.all_rfo.l3_miss.local_dram
=========================================================================
(E)
offcore_response.demand_code_rd.l3_miss.any_response
offcore_response.demand_code_rd.l3_miss.local_dram
offcore_response.demand_data_rd.l3_miss.any_response
offcore_response.demand_data_rd.l3_miss.local_dram
offcore_response.demand_rfo.l3_miss.any_response
offcore_response.demand_rfo.l3_miss.local_dram
=========================================================================
(F)
offcore_response.pf_l2_code_rd.l3_miss.any_response
offcore_response.pf_l2_data_rd.l3_miss.any_response
offcore_response.pf_l2_rfo.l3_miss.any_response
offcore_response.pf_l3_code_rd.l3_miss.any_response
offcore_response.pf_l3_data_rd.l3_miss.any_response
offcore_response.pf_l3_rfo.l3_miss.any_response
My choices are as follows:
- It seems that the sum of `LLC-load-misses` and `LLC-store-misses` gives the total number of DRAM accesses (equivalently, I could use `LLC-misses` in Perf).
- For data-only accesses, I used `mem_load_uops_retired.l3_miss`. It does not include stores, but that seems acceptable (because stores seem to be much less frequent?!).
- Simplistically, `LLC-load-misses` - `mem_load_uops_retired.l3_miss` = DRAM accesses for code (as code is read-only).
Are these choices reasonable?
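To make the arithmetic behind these three choices concrete, here is a minimal Python sketch. The function name `estimate_dram_accesses` and the counter values are made-up placeholders for illustration; in practice the counts would come from a Perf run, and whether the subtraction is actually valid is exactly what I am asking.

```python
def estimate_dram_accesses(counts):
    """Apply the three choices above to a dict of raw event counts."""
    # Choice 1: total DRAM accesses as load misses plus store misses.
    total = counts["LLC-load-misses"] + counts["LLC-store-misses"]
    # Choice 2: data accesses taken from retired load uops that missed L3.
    data = counts["mem_load_uops_retired.l3_miss"]
    # Choice 3: code accesses as the load misses not attributed to data loads.
    code = counts["LLC-load-misses"] - data
    return {"total": total, "data": data, "code": code}


# Placeholder numbers for illustration only, not measurements.
sample = {
    "LLC-load-misses": 1000,
    "LLC-store-misses": 200,
    "mem_load_uops_retired.l3_miss": 700,
}
print(estimate_dram_accesses(sample))
```

If the two load events are not counted at the same granularity, the `code` value from this subtraction could even come out negative, which would be a sign that the assumption does not hold.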
My other questions: (The 2nd one is the most important)
- What are `local_dram` and `any_response`?
- At first, it seems that group (C) is a higher-resolution version of the load events of group (A). But my tests show that the events in the former group are much more frequent than those in the latter. For example, in a simple benchmark, the number of `offcore_response.all_reads.l3_miss.any_response` events was twice the number of `LLC-load-misses`.
- Group (E) pertains to demand reads (i.e., all non-prefetched reads). Does this mean that, e.g., `offcore_response.all_data_rd.l3_miss.any_response` - `offcore_response.demand_data_rd.l3_miss.any_response` = DRAM read accesses caused by prefetching?
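The subtraction in the last question can be written out as a small Python sketch. The helper name `prefetch_share` and the counts are hypothetical placeholders, and the sketch assumes (which is the open question) that the "all" event is simply the demand event plus prefetch-triggered reads.

```python
def prefetch_share(all_data_rd, demand_data_rd):
    """Return (reads attributed to prefetching, their fraction of all data reads)."""
    # Assumption under question: all_data_rd = demand reads + prefetch reads.
    prefetched = all_data_rd - demand_data_rd
    fraction = prefetched / all_data_rd if all_data_rd else 0.0
    return prefetched, fraction


# Placeholder counts for illustration only, not measurements.
prefetched, fraction = prefetch_share(all_data_rd=5000, demand_data_rd=3000)
print(prefetched, fraction)
```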
Group (D) includes DRAM access events caused by Read for Ownership operations (for cache coherency protocols). It seems irrelevant to my problem.
Group (F) counts DRAM reads caused by the L2-cache prefetcher, which is also irrelevant to my problem.