I'm studying some Java code (the SOR algorithm and LU factorisation). The main goal is to study the impact of running these algorithms on a NUMA architecture. For the C versions of the same algorithms I already found tools such as numactl, plus affinity environment variables such as GOMP_CPU_AFFINITY (GCC) and KMP_AFFINITY (ICC), to pin threads to cores. However, I don't know what alternatives I have for studying NUMA in Java. For Java I only use numactl, and I see performance gains with the --interleave=all flag, but I don't really have control over what happens at the JVM level.
I found another tool called numastat, which is supposed to report the "NUMA counters" of a NUMA machine and show which allocations were hits (numa_hit) and misses (numa_miss) on each NUMA node. However, I'm not sure how to use it to measure these counters for my Java application. What kind of tests (and programming techniques) should I use to study the impact of NUMA on Java applications?
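To make the numastat part concrete, here is a minimal sketch of the kind of measurement I have in mind. It is not something I have validated. It assumes a Linux machine where the per-node counters are exposed as "name value" lines in /sys/devices/system/node/nodeN/numastat. The class name NumaCounters, its methods, and the runWorkload() placeholder are all hypothetical and only there for illustration; the real SOR or LU kernel would go in place of the placeholder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: snapshots the kernel's per-node NUMA counters
// before and after a workload and prints the difference.
public class NumaCounters {

    // Parses "name value" lines (e.g. "numa_hit 12345"); other lines are skipped.
    public static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                counters.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return counters;
    }

    // Per-counter difference (after - before); counters missing in "before" count as 0.
    public static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            diff.put(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L));
        }
        return diff;
    }

    // Reads the counters of every node directory; returns an empty map
    // if the sysfs path (an assumption about the platform) does not exist.
    public static Map<String, Map<String, Long>> snapshot() throws IOException {
        Map<String, Map<String, Long>> nodes = new TreeMap<>();
        Path root = Paths.get("/sys/devices/system/node");
        if (!Files.isDirectory(root)) {
            return nodes;
        }
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(root, "node[0-9]*")) {
            for (Path dir : dirs) {
                Path stat = dir.resolve("numastat");
                if (Files.isReadable(stat)) {
                    nodes.put(dir.getFileName().toString(), parse(Files.readAllLines(stat)));
                }
            }
        }
        return nodes;
    }

    // Placeholder workload: touches a large array so that pages get allocated.
    static void runWorkload() {
        double[] data = new double[1 << 24];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Map<String, Long>> before = snapshot();
        runWorkload();
        Map<String, Map<String, Long>> after = snapshot();
        for (Map.Entry<String, Map<String, Long>> node : after.entrySet()) {
            Map<String, Long> base = before.getOrDefault(node.getKey(), Map.of());
            System.out.println(node.getKey() + " " + delta(base, node.getValue()));
        }
    }
}
```

I would run this under different numactl policies (for example with and without --interleave=all) and compare the numa_hit and numa_miss deltas. As far as I understand, these counters are system-wide rather than per process, so the machine would have to be otherwise idle for the numbers to mean anything.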
Thanks for your help.