I have two applications, both single-threaded. One is a word reindexing application (named wrmem), which can be found here, and the other is a simple loop that iterates over an array.
I noticed that when I pin both apps to the same logical core while its sibling hyper-thread is idle, wrmem runs faster (about 7.68 s elapsed in the runs below). However, when I run the two apps on two sibling hyper-threads, wrmem takes longer to execute (about 8.53 s).
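For reference, here is a minimal Python sketch of how the two placements could be set up on Linux. It is not the method used for the measurements above; the helper names (`parse_cpu_list`, `sibling_threads`, `pin_current_process`) are hypothetical, and it assumes a Linux system that exposes the `thread_siblings_list` topology file in sysfs.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_threads(cpu):
    """Return the logical CPUs that share a physical core with `cpu` (Linux sysfs)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_current_process(cpu):
    """Restrict the calling process to a single logical CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})
```

With these helpers, "same logical core" means both processes call `pin_current_process` with the same CPU number, while "sibling hyper-threads" means they use two different entries from `sibling_threads(cpu)`.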
My question is: why does wrmem run faster when both applications are assigned to a single logical core? It would also be great if someone could explain whether there is a way to determine if two applications should be pinned to a single logical core or to sibling hyper-threads, perhaps by analysing performance counters.
I have attached perf's output of the events I thought might be useful to analyse.
wrmem located on a sibling hyper-thread:
4036814116 dTLB-loads (23.52%)
74228013 dTLB-load-misses # 1.84% of all dTLB cache hits (23.56%)
31356167246 cycles (23.61%)
2356371896 dtlb_load_misses.walk_active (23.65%)
18110002 cache-misses # 6.865 % of all cache refs (23.62%)
263804080 cache-references (23.57%)
70217803 branch-misses # 2.62% of all branches (23.53%)
2679691364 branches (23.49%)
211970964 bus-cycles (23.49%)
13 context-switches
175707 page-faults
3996104598 L1-dcache-loads (23.49%)
527053354 L1-dcache-load-misses # 13.19% of all L1-dcache hits (23.50%)
2437399504 dTLB-stores (23.50%)
34857064 dTLB-store-misses (23.50%)
0 mem-loads (23.50%)
77522441 dtlb_load_misses.miss_causes_a_walk (23.49%)
37317020 dtlb_store_misses.miss_causes_a_walk (23.49%)
3481625263 dtlb_store_misses.walk_active (23.49%)
8.534166029 seconds time elapsed
wrmem located on a single logical core with the other app:
4021938226 dTLB-loads (23.45%)
1339043 dTLB-load-misses # 0.03% of all dTLB cache hits (23.58%)
14092062606 cycles (23.69%)
87412240 dtlb_load_misses.walk_active (23.79%)
15980810 cache-misses # 32.547 % of all cache refs (23.84%)
49100039 cache-references (23.82%)
77863788 branch-misses # 2.86% of all branches (23.76%)
2725709999 branches (23.66%)
95364600 bus-cycles (23.55%)
246 context-switches
175706 page-faults
3989720332 L1-dcache-loads (23.44%)
453219493 L1-dcache-load-misses # 11.36% of all L1-dcache hits (23.30%)
2459754128 dTLB-stores (23.30%)
28088729 dTLB-store-misses (23.30%)
0 mem-loads (23.30%)
1996539 dtlb_load_misses.miss_causes_a_walk (23.40%)
37192560 dtlb_store_misses.miss_causes_a_walk (23.40%)
1694922205 dtlb_store_misses.walk_active (23.40%)
7.684306529 seconds time elapsed
The other app is just a loop over an array with a step of 4096 bytes.
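The loop's source is not reproduced here, so the following is only a hypothetical Python sketch of that access pattern, not the actual program (which would have far less interpreter overhead). It assumes a 4 KiB page size, under which a 4096-byte step lands on a new page at every access. The function names `strided_walk` and `touched_pages` are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (4 KiB pages)


def strided_walk(buf, stride=4096):
    """Read one byte every `stride` bytes and return the sum of the bytes read."""
    total = 0
    for offset in range(0, len(buf), stride):
        total += buf[offset]
    return total


def touched_pages(array_bytes, stride, page_size=PAGE_SIZE):
    """Count the distinct pages a strided walk over `array_bytes` bytes touches."""
    return len({offset // page_size for offset in range(0, array_bytes, stride)})
```

With a stride equal to the page size, the number of accesses equals the number of pages touched, so every access needs a different address translation. That is consistent with the very high dTLB-load-misses ratio reported for this app below.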
Other app when running on the same logical core:
1509514481 dTLB-loads (23.47%)
1345520064 dTLB-load-misses # 89.14% of all dTLB cache hits (23.50%)
52986473567 cycles (23.52%)
51187627462 dtlb_load_misses.walk_active (23.55%)
24803686 cache-misses # 0.771 % of all cache refs (23.56%)
3218128188 cache-references (23.56%)
624235 branch-misses # 0.04% of all branches (23.57%)
1483035278 branches (23.57%)
358401846 bus-cycles (23.58%)
251 context-switches
262239 page-faults
1630995048 L1-dcache-loads (23.58%)
2707508863 L1-dcache-load-misses # 166.00% of all L1-dcache hits (23.57%)
194944978 dTLB-stores (23.57%)
1773489 dTLB-store-misses (23.53%)
0 mem-loads (23.50%)
1344962228 dtlb_load_misses.miss_causes_a_walk (23.47%)
2721695 dtlb_store_misses.miss_causes_a_walk (23.45%)
78331814 dtlb_store_misses.walk_active (23.46%)
18.162205841 seconds time elapsed
Other app when running on a sibling hyper-thread:
1570720041 dTLB-loads (23.50%)
1342959305 dTLB-load-misses # 85.50% of all dTLB cache hits (23.50%)
59079247016 cycles (23.50%)
56895513621 dtlb_load_misses.walk_active (23.53%)
37209980 cache-misses # 1.115 % of all cache refs (23.54%)
3336817534 cache-references (23.54%)
626337 branch-misses # 0.04% of all branches (23.54%)
1457502744 branches (23.54%)
399413773 bus-cycles (23.54%)
10 context-switches
262239 page-faults
1523989098 L1-dcache-loads (23.54%)
2714388590 L1-dcache-load-misses # 178.11% of all L1-dcache hits (23.54%)
150322599 dTLB-stores (23.54%)
1832015 dTLB-store-misses (23.54%)
0 mem-loads (23.54%)
1341108173 dtlb_load_misses.miss_causes_a_walk (23.54%)
2718493 dtlb_store_misses.miss_causes_a_walk (23.53%)
78263090 dtlb_store_misses.walk_active (23.51%)
16.042126229 seconds time elapsed
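To make the two wrmem runs easier to compare, the raw counters can be reduced to a few ratios. The Python sketch below is only one possible way to do this: the choice of ratios and the names `WRMEM_SIBLING`, `WRMEM_SAME_CORE` and `summarize` are my own, and the numbers are copied from the perf output above. The percentages in parentheses in that output show that the events were multiplexed, so all counts are scaled estimates and the ratios are approximate.

```python
# Counter values copied from the two wrmem runs above.
WRMEM_SIBLING = {
    "cycles": 31356167246,
    "dTLB-loads": 4036814116,
    "dTLB-load-misses": 74228013,
    "dtlb_load_misses.walk_active": 2356371896,
    "dtlb_store_misses.walk_active": 3481625263,
}

WRMEM_SAME_CORE = {
    "cycles": 14092062606,
    "dTLB-loads": 4021938226,
    "dTLB-load-misses": 1339043,
    "dtlb_load_misses.walk_active": 87412240,
    "dtlb_store_misses.walk_active": 1694922205,
}


def summarize(counters):
    """Return derived ratios; walk shares are page-walk-active cycles per cycle."""
    cycles = counters["cycles"]
    return {
        "dtlb_load_miss_ratio": counters["dTLB-load-misses"] / counters["dTLB-loads"],
        "load_walk_cycle_share": counters["dtlb_load_misses.walk_active"] / cycles,
        "store_walk_cycle_share": counters["dtlb_store_misses.walk_active"] / cycles,
    }


if __name__ == "__main__":
    for name, run in (("sibling", WRMEM_SIBLING), ("same core", WRMEM_SAME_CORE)):
        print(name, {key: round(value, 4) for key, value in summarize(run).items()})
```

The load and store walk shares are kept separate on purpose, because cycles in which both kinds of page walk are active could otherwise be counted twice.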