I'm working on a NUMA architecture where each compute node has 2 sockets with 4 cores per socket, for a total of 8 cores and 24 GB of RAM per node. I have to prove that setting processor affinity can have a significant impact on performance.
Can you suggest a program I could use as a benchmark to show the difference between running with and without processor affinity? I could also write a simple C test program using MPI, OpenMP, or pthreads, but which operation would be best for that test? It must take advantage of cache locality, but it must also trigger context switches (blocking operations) so that a process could potentially migrate to another core or, worse, to another socket. It must run on a multiple of 8 cores.
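To make the question concrete, here is a minimal sketch of the kind of test kernel I have in mind. It assumes Linux with glibc (`sched_setaffinity` is Linux-specific), and the names `pin_to_cpu` and `timed_kernel` are placeholders of my own, not from any library. The idea is to sum a buffer repeatedly and sleep between passes, so the scheduler gets a chance to migrate the thread unless it has been pinned. Only the compute passes are timed, not the sleeps.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and sched_setaffinity */
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Pin the calling thread to one logical CPU (Linux-specific).
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = caller */
}

/* Sum `buf` (n doubles) `rounds` times. Between passes the thread blocks
 * briefly in nanosleep(), which gives the scheduler an opportunity to
 * move it to another core when no affinity is set.
 * Returns the accumulated compute time in seconds (sleep time excluded)
 * and stores the total sum in *sum_out. */
static double timed_kernel(const double *buf, size_t n, int rounds,
                           double *sum_out)
{
    const struct timespec pause = { 0, 100000 };  /* 100 microseconds */
    struct timespec t0, t1;
    double sum = 0.0;
    double elapsed = 0.0;

    for (int r = 0; r < rounds; r++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t i = 0; i < n; i++)
            sum += buf[i];                 /* cache-friendly linear pass */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        elapsed += (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;

        nanosleep(&pause, NULL);           /* blocking call */
    }

    *sum_out = sum;
    return elapsed;
}
```

A driver would run `timed_kernel` in each of the 8 threads or ranks, once without pinning and once after calling `pin_to_cpu` with a distinct CPU number per thread, and then compare the times. I would size the buffer to fit in one core's cache, on the assumption that a migration then forces the data to be reloaded. Whether this would show a measurable difference is exactly what I'm unsure about.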