Efficient use of NUMA architecture

Question

I'm writing a multithreaded Java program that is both CPU- and memory-intensive. Its goal is to run an algorithm on a graph. The program runs on a NUMA machine under Linux, and I'd like to get the best possible performance from it.

To this end I create copies of the graph, one per NUMA node, so that each thread can read the graph from local memory.

The local memory allocation is already done: I set the thread affinity before allocating each new copy of the graph. This is done with JNA, so I would prefer to stay with that library and not add JNI code, if possible.

My question is: how can I check which core a worker thread is running on, so that it reads from the local copy?

I understand that the thread-to-core binding can change during execution. However, the kernel tries to keep a thread on the same NUMA node across time slices, so checking which core the thread is running on only once, at the beginning, should work in most cases.

This is quite difficult to do even in C/C++ due to the poor quality/support of NUMA libraries. — Mysticial, May 21 '14 at 23:45
I don't need specifically NUMA libraries. Just to know on which core the thread is running. From core-id I can know the NUMA-node without problems. — jutky, May 22 '14 at 04:42
I highly doubt you can do that with Java. The only real option is C++ (and I am a Java developer myself). But for NUMA architectures and parallel processing on them C++ and MPI are the only choices. — Alexandros, May 22 '14 at 07:04
Alexandros, you might be surprised how many more options are out there. C, Python and Go are just a couple of other languages that will work fine to take advantage of NUMA systems. Each one of them has its own options for synchronization and IPC ranging from mmap to pipes to sockets to SHMEM and other libraries to language-specific features. If you're concerned that some of these options don't contain standardized features to figure out your NUMA topology though, that's probably true, although `man 7 numa` is some help on Linux. — Aaron Altman, Jun 12 '14 at 16:14

score 3 · Accepted Answer · answered May 26 '14 at 11:08

It turns out that there is a function callable through JNA that returns the desired information: sched_getcpu. The full code snippet looks like this:

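import com.sun.jna.LastErrorException;
import com.sun.jna.Library;
import com.sun.jna.Native;

public interface CLibrary extends Library {
    // Binds to the standard C library (libc).
    CLibrary INSTANCE = (CLibrary) Native.loadLibrary("c", CLibrary.class);
    // Returns the id of the CPU the calling thread is currently running on.
    int sched_getcpu() throws LastErrorException;
}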

Now, when you call

CLibrary.INSTANCE.sched_getcpu();

you get the id of the core on which the current thread is running.
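One way the value from sched_getcpu could be put to use, as asked in the question, is to pick the node-local graph copy once when a worker starts. The sketch below is an illustration only: the class LocalCopySelector, the cpuToNode table and the copiesByNode list are hypothetical names, and the CPU id is passed in as an IntSupplier so that the selection logic does not depend on JNA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Picks the graph copy that lives on the NUMA node of the calling thread (hypothetical helper). */
final class LocalCopySelector<T> {
    private final int[] cpuToNode;         // cpuToNode[cpu] = NUMA node id (assumed to be known)
    private final List<T> copiesByNode;    // copiesByNode.get(node) = copy allocated on that node
    private final IntSupplier currentCpu;  // source of the current CPU id

    LocalCopySelector(int[] cpuToNode, List<T> copiesByNode, IntSupplier currentCpu) {
        this.cpuToNode = cpuToNode.clone();
        this.copiesByNode = new ArrayList<>(copiesByNode);
        this.currentCpu = currentCpu;
    }

    /** Intended to be called once at worker start; the result is a hint, not a guarantee. */
    T localCopy() {
        int cpu = currentCpu.getAsInt();
        // Fall back to node 0 when the CPU id is negative or outside the table.
        int node = (cpu >= 0 && cpu < cpuToNode.length) ? cpuToNode[cpu] : 0;
        return copiesByNode.get(node);
    }
}
```

With the interface above, the supplier would presumably be CLibrary.INSTANCE::sched_getcpu. Because the scheduler may migrate the thread afterwards, the selected copy is only likely, not certain, to stay local.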

score 1 · Answer 2 · answered May 24 '14 at 20:56

I found that the easiest way to accomplish the above task is to run a shell command and parse the output.

Given that I know the process id and the thread id of the current thread (both obtainable with JNA), I run the following command:

ps -p <pid> -L -o tid,psr | grep <tid>

The result is a line with two numbers: the first is the thread id and the second is the id of the core on which that thread is executing.

To verify this, I set the thread affinity to different cores in a loop and checked the output of the above command each time. The output was always correct.
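A minimal Java sketch of this approach might look as follows. The class PsCpuLookup and its method names are hypothetical, and the pid and tid are assumed to be obtained elsewhere. The parsing step compares the TID column exactly, which avoids the substring matches that a plain grep could produce (for example, tid 123 matching a line for tid 1234).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Runs "ps -p <pid> -L -o tid,psr" and extracts the PSR value for one thread (hypothetical helper). */
final class PsCpuLookup {

    /** Scans "TID PSR" lines for an exact tid match; returns -1 if the tid is not listed. */
    static int parsePsr(String psOutput, long tid) {
        String wanted = Long.toString(tid);
        for (String line : psOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            // The header line ("TID PSR") never equals a numeric tid, so it is skipped.
            if (cols.length == 2 && cols[0].equals(wanted)) {
                return Integer.parseInt(cols[1]);
            }
        }
        return -1;
    }

    /** Spawns ps for the given process and returns the core id reported for tid. */
    static int cpuOf(long pid, long tid) throws IOException, InterruptedException {
        Process ps = new ProcessBuilder("ps", "-p", Long.toString(pid), "-L", "-o", "tid,psr")
                .redirectErrorStream(true)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(ps.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        ps.waitFor();
        return parsePsr(out.toString(), tid);
    }
}
```

Spawning a process per lookup is expensive, so this is probably only reasonable for a one-off check at thread startup.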

score 0 · Answer 3 · edited May 23 '17 at 12:08

What you are talking about is thread affinity. Maybe this will help.

Java thread affinity

If that doesn't cover it, the other thing you need to do is use native code to figure out which core belongs to which NUMA node.
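On Linux, the core-to-node mapping can also be read without native code, because the kernel exposes one cpulist file per node under /sys/devices/system/node. The sketch below uses that sysfs route instead of the native code suggested above; the class NumaTopology and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Builds a cpu -> NUMA node table from Linux sysfs (hypothetical helper). */
final class NumaTopology {

    /** Expands a kernel CPU list such as "0-3,8-11" into individual CPU ids. */
    static List<Integer> parseCpuList(String cpuList) {
        List<Integer> cpus = new ArrayList<>();
        String trimmed = cpuList.trim();
        if (trimmed.isEmpty()) {
            return cpus; // a node may have memory but no CPUs
        }
        for (String part : trimmed.split(",")) {
            String[] range = part.trim().split("-");
            int first = Integer.parseInt(range[0]);
            int last = Integer.parseInt(range[range.length - 1]);
            for (int cpu = first; cpu <= last; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    /** Reads /sys/devices/system/node/node<N>/cpulist for every node directory. */
    static Map<Integer, Integer> cpuToNode() throws IOException {
        Map<Integer, Integer> map = new HashMap<>();
        Path root = Paths.get("/sys/devices/system/node");
        try (DirectoryStream<Path> nodes = Files.newDirectoryStream(root, "node[0-9]*")) {
            for (Path nodeDir : nodes) {
                // Directory names look like "node0", "node1", ...
                int node = Integer.parseInt(nodeDir.getFileName().toString().substring(4));
                String list = new String(
                        Files.readAllBytes(nodeDir.resolve("cpulist")), StandardCharsets.UTF_8);
                for (int cpu : parseCpuList(list)) {
                    map.put(cpu, node);
                }
            }
        }
        return map;
    }
}
```

The table only needs to be built once at startup, since the topology does not normally change while the program runs.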

answered May 22 '14 at 01:23 by johnnycrash

I already use thread affinity to make local copies of the graph on each NUMA node (as stated in the question). At the startup stage it is fine for me to run with thread affinity, but I don't want to bind the whole parallel algorithm to specific cores; I rely on the operating system to do better load balancing than I would. As for the "native code" part of your answer, that is what the question was about. – jutky May 22 '14 at 04:35
