I am trying to use some of the uncore hardware counters, such as: skx_unc_imc0-5::UNC_M_WPQ_INSERTS
. It's supposed to count the number of allocations into the Write Pending Queue. The machine has 2 Intel Xeon Gold 5218 CPUs with cascade lake architecture, with 2 memory controllers per CPU. linux version is 5.4.0-3-amd64. I have the following simple loop and I am reading this counter for it. Array elements are 64 byte in size, equal to cache line.
for(int i=0; i < 1000000; i++){
array[i].value=2;
}
For this loop, when I map memory to DRAM NUMA node, the counter gives around 150,000 as a result, which maybe makes sense: There are 6 channels in total for 2 memory controllers in front of this NUMA node, which use DRAM DIMMs in interleaving mode. Then for each channel there is one separate WPQ I believe, so skx_unc_imc0 gets 1/6 from the entire stores. There are skx_unc_imc0-5
counters that I got with papi_native_avail
, supposedly each for different channels.
The unexpected result is when instead of mapping to DRAM NUMA node, I map the program to Non-Volatile Memory, which is presented as a separate NUMA node to the same socket. There are 6 NVM DIMMs per-socket, that create one Interleaved Region. So when writing to NVM, there should be similarly 6 different channels used and in front of each, there is same one WPQ, that should get again 1/6 write inserts.
But UNC_M_WPQ_INSERTS
returns only around up 1000 as a result on NV memory. I don't understand why; I expected it to give similarly around 150,000 writes in WPQ.
Am I interpreting/understanding something wrong? Or is there two different WPQs per channel depending wether write goes to DRAM or NVM? Or what else could be the explanation?