
I want to bind the threads in my code to each physical core. With GCC I have done this successfully using sched_setaffinity, so I no longer have to set export OMP_PROC_BIND=true. I want to do the same thing in Windows with MSVC. Windows and Linux use different thread topologies: Linux scatters the threads while Windows uses a compact form. In other words, in Linux with four cores and eight hyper-threads I only need to bind the threads to the first four processing units, whereas in Windows I bind them to every other processing unit.

I have successfully done this using SetProcessAffinityMask. I can see from Windows Task Manager, when I right-click on the process and click "Set Affinity", that every other CPU is set (0, 2, 4, 6 on my eight hyper-thread system). The problem is that the efficiency of my code is unstable when I run it. Sometimes it's nearly constant, but most of the time it fluctuates widely. I changed the priority to high but it makes no difference. In Linux the efficiency is stable. Maybe Windows is still migrating the threads? Is there something else I need to do to bind the threads in Windows?

Here is the code I'm using

    #ifdef _WIN32
    HANDLE process;
    DWORD_PTR processAffinityMask = 0;
    //Windows uses a compact thread topology.  Set mask to every other thread
    //Shift a DWORD_PTR, not an int, so bits above 31 are not lost
    for(int i=0; i<ncores; i++) processAffinityMask |= (DWORD_PTR)1<<(2*i);
    //processAffinityMask = 0x55;
    process = GetCurrentProcess();
    SetProcessAffinityMask(process, processAffinityMask);
    #else
    cpu_set_t  mask;
    CPU_ZERO(&mask);
    for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    #endif
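One way to test the migration hypothesis is to sample which logical CPU a thread is running on and count how often it changes. The following is only a minimal sketch: it assumes `GetCurrentProcessorNumber()` on Windows and glibc's `sched_getcpu()` on Linux, and the helper names `current_cpu` and `count_migrations` are made up for illustration.

```c
#ifdef _WIN32
#include <windows.h>
#else
#define _GNU_SOURCE   /* must precede every #include to expose sched_getcpu() */
#include <sched.h>
#endif

/* Index of the logical CPU the calling thread is running on right now. */
int current_cpu(void)
{
#ifdef _WIN32
    return (int)GetCurrentProcessorNumber();
#else
    return sched_getcpu();
#endif
}

/* Poll the current CPU `samples` times and count how often it changes.
   (Hypothetical helper; a busy loop like this is only a rough probe.) */
int count_migrations(int samples)
{
    int migrations = 0;
    int last = current_cpu();
    for (int i = 0; i < samples; i++) {
        int now = current_cpu();
        if (now != last) migrations++;
        last = now;
    }
    return migrations;
}
```

Calling `count_migrations` from every thread inside a `#pragma omp parallel` region should report zero for a thread that is bound to a single logical CPU, while a thread that is only restricted by the process mask may still move between the allowed CPUs.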

Edit: here is the code I use now, which seems to be stable on Linux and Windows

    #ifdef _WIN32   
    HANDLE process;
    DWORD_PTR processAffinityMask = 0;  //must start at zero before OR-ing bits in
    //Windows uses a compact thread topology.  Set mask to every other thread
    for(int i=0; i<ncores; i++) processAffinityMask |= (DWORD_PTR)1<<(2*i);
    process = GetCurrentProcess();
    SetProcessAffinityMask(process, processAffinityMask);
    #pragma omp parallel 
    {
        HANDLE thread = GetCurrentThread();
        DWORD_PTR threadAffinityMask = (DWORD_PTR)1<<(2*omp_get_thread_num());
        SetThreadAffinityMask(thread, threadAffinityMask);
    }
    #else
    cpu_set_t  mask;
    CPU_ZERO(&mask);
    for(int i=0; i<ncores; i++) CPU_SET(i, &mask);
    sched_setaffinity(0, sizeof(mask), &mask);
    #pragma omp parallel 
    {
       cpu_set_t  mask;
       CPU_ZERO(&mask);
       CPU_SET(omp_get_thread_num(),&mask);
       pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask); 
    }
    #endif
Z boson
  • On both platforms your code sets the process affinity mask and not the affinity mask for each individual thread, therefore the scheduler is still free to move the threads among the CPUs allowed by the process affinity mask. – Hristo Iliev Jul 21 '14 at 10:17
  • @HristoIliev, I understand what you mean. Do you know how to get the thread handle for each thread created by OpenMP? – Z boson Jul 21 '14 at 11:00
  • Inside the parallel region use [GetCurrentThread()](http://msdn.microsoft.com/en-us/library/windows/desktop/ms683182%28v=vs.85%29.aspx) to obtain the handle of the current thread and assign it an affinity mask with a single bit set based on the result from `omp_get_thread_num()`. – Hristo Iliev Jul 21 '14 at 11:36
  • @HristoIliev, you mean inside a parallel section? I tried that and it returns the same handle for each thread. I'll keep trying... – Z boson Jul 21 '14 at 11:48
  • I mean to call it from inside a parallel region so that all OpenMP threads make the call. It should not return the same thread handle in that case unless you are assigning to a shared variable. – Hristo Iliev Jul 21 '14 at 11:50
  • @HristoIliev, I added some code to my question showing what I'm trying to do. The handle is private for each thread. It's always -2. – Z boson Jul 21 '14 at 12:00
  • I should read more carefully the pages that I link to :) `GetCurrentThread` returns a constant pseudo-handle which appears to be `(HANDLE)(-2)`. `SetThreadAffinityMask(GetCurrentThread(), ...)` should work as expected. – Hristo Iliev Jul 21 '14 at 13:17
  • @HristoIliev, okay, I think I got it now. It's stable currently but I need to test it a few more times. With Linux I guess I use `pthread_setaffinity_np`? That seems to work so far (and `pthread_self()` does return a different value for each thread in the parallel region). – Z boson Jul 21 '14 at 13:25
  • I am trying to do a similar thing, but I am having difficulty compiling it on Linux. What should I include in the header? I have #define _GNU_SOURCE #include – Shervin Nov 02 '16 at 15:57
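
On the header question in the last comment: as far as I know, the Linux calls used in the question need `_GNU_SOURCE` defined before any `#include`, plus `<sched.h>` for `cpu_set_t` and the `CPU_*` macros and `<pthread.h>` for `pthread_setaffinity_np`. The sketch below is not the asker's exact code. The function names `pin_current_thread` and `pin_omp_team` are hypothetical, and it keeps the question's assumption that logical CPUs 0..n-1 sit on distinct physical cores.

```c
#define _GNU_SOURCE      /* before every #include: exposes CPU_SET and pthread_setaffinity_np */
#include <pthread.h>
#include <sched.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Pin the calling thread to one logical CPU.
   Returns 0 on success, an errno-style value otherwise. */
int pin_current_thread(int cpu)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

/* Pin every OpenMP thread to the CPU whose index equals its thread number.
   Returns the number of threads whose binding failed. */
int pin_omp_team(void)
{
    int failures = 0;
    #pragma omp parallel reduction(+:failures)
    {
#ifdef _OPENMP
        int tid = omp_get_thread_num();
#else
        int tid = 0;     /* built without OpenMP: only one thread exists */
#endif
        if (pin_current_thread(tid) != 0) failures++;
    }
    return failures;
}
```

With GCC this would typically be built with `-fopenmp`; on older glibc versions `-pthread` may also be needed for `pthread_setaffinity_np` to link.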

1 Answer


You should use the SetThreadAffinityMask function (see MSDN reference). You are setting the process's mask.

You can obtain a thread ID in OpenMP with this code:

int tid = omp_get_thread_num();

However, the code above provides OpenMP's internal thread ID, not the system thread ID. This article explains more on the subject:

http://msdn.microsoft.com/en-us/magazine/cc163717.aspx

If you need to work with those threads explicitly, use the explicit affinity type as explained in this Intel documentation:

https://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/optaps/common/optaps_openmp_thread_affinity.htm

YePhIcK
  • Do you have some example code? I mean one which sets the mask for each thread? – Z boson Jul 21 '14 at 10:49
  • It is not much different from what you have -just use `thread IDs` instead of `process ID` when calling the function. You get the `thread ID` when you create a thread – YePhIcK Jul 21 '14 at 10:51
  • I understand that, but how do I loop through the threads? I mean how do I get the handle for each thread? I only know `GetCurrentThread()` – Z boson Jul 21 '14 at 10:52
  • The very last parameter to Windows's `CreateThread()` is a long pointer to a `thread ID` - that is the way to get the needed info for you. MSDN reference: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682453(v=vs.85).aspx – YePhIcK Jul 21 '14 at 10:54
  • A cursory look on Google provides this link to OpenMP doc (https://computing.llnl.gov/tutorials/openMP/) which lists this, under *Run-time Library Routines:* section: "Querying a thread's unique identifier (thread ID), a thread's ancestor's identifier, the thread team size" – YePhIcK Jul 21 '14 at 11:00
  • From the link above - this is how you get the `thread ID`: `tid = omp_get_thread_num();` – YePhIcK Jul 21 '14 at 11:02
  • I'm not using the Intel compiler otherwise I would use KMP_AFFINITY. So I still don't know how to get the Windows thread ID. – Z boson Jul 21 '14 at 11:10
  • There is probably a way to get the threads associated with a process and loop over them. – Z boson Jul 21 '14 at 11:15
  • http://msdn.microsoft.com/en-us/library/windows/desktop/ms686852%28v=vs.85%29.aspx – Z boson Jul 21 '14 at 11:22
  • Microsoft's OpenMP implementation uses a system-managed thread pool. Enumerating and binding all process threads would affect some that do not belong to the thread pool (e.g. the manager thread, possibly hidden window message loop threads, etc.). I would instead perform the binding inside the parallel region. – Hristo Iliev Jul 21 '14 at 13:21
  • BTW, there are lots of ways of creating bad affinity masks that cause problems. If you are going to set thread affinity manually, you should take a look at the [Core Detection](http://code.msdn.microsoft.com/Core-Detection-Sample-7afd28f0) sample. – Chuck Walbourn Jul 21 '14 at 19:04
  • @ChuckWalbourn, thanks for the link! I agree with you. I'm hard coding the topology in right now. The topology is different for AMD and Intel and for Linux and Windows, so there are at least four permutations to handle. Additionally, the topology could change with different versions of the OS and maybe even hardware versions. So I need a better way to do this at some point. But at least I know how to bind the threads now in my code :-) – Z boson Jul 22 '14 at 06:57
  • There are some intensely bad assumptions people make when creating these affinity masks, so definitely look at the sample. – Chuck Walbourn Jul 22 '14 at 07:13
  • Found a mirror of the core detection sample [here](https://github.com/walbourn/directx-sdk-samples/tree/main/CoreDetection). – Andreas Haferburg Jul 12 '22 at 15:32
  • Oh. Guess that's your repo, @ChuckWalbourn :) – Andreas Haferburg Jul 12 '22 at 15:32
  • Yes, sorry the MSDN Code Gallery is offline these days... I posted it to GitHub. – Chuck Walbourn Jul 12 '22 at 18:29
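
The two suggestions in these comments (bind inside the parallel region, and detect the core layout instead of hard coding it) can be combined. The following is only a rough sketch and not the Core Detection sample itself. It assumes `GetLogicalProcessorInformation`, which only reports the current processor group (at most 64 logical processors), and the helper names `first_logical_cpu`, `query_core_bits` and `bind_omp_threads_to_cores` are invented for illustration.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#include <omp.h>
#endif

/* Isolate the lowest set bit of a core's mask: one logical CPU per physical core. */
uint64_t first_logical_cpu(uint64_t core_mask)
{
    return core_mask & (~core_mask + 1);
}

#ifdef _WIN32
/* Fill core_bits with one single-bit mask per physical core.
   Returns the number of cores found, or 0 on failure. */
int query_core_bits(uint64_t *core_bits, int max_cores)
{
    DWORD len = 0;
    GetLogicalProcessorInformation(NULL, &len);   /* first call only reports the size needed */
    if (GetLastError() != ERROR_INSUFFICIENT_BUFFER) return 0;
    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info =
        (SYSTEM_LOGICAL_PROCESSOR_INFORMATION *)malloc(len);
    if (info == NULL) return 0;
    int n = 0;
    if (GetLogicalProcessorInformation(info, &len)) {
        DWORD count = len / sizeof(*info);
        for (DWORD i = 0; i < count && n < max_cores; i++)
            if (info[i].Relationship == RelationProcessorCore)
                core_bits[n++] = first_logical_cpu((uint64_t)info[i].ProcessorMask);
    }
    free(info);
    return n;
}

/* Bind each OpenMP thread to its own physical core from inside the parallel
   region, so only the OpenMP worker threads are affected. */
void bind_omp_threads_to_cores(void)
{
    uint64_t core_bits[64];
    int ncores = query_core_bits(core_bits, 64);
    if (ncores == 0) return;
    #pragma omp parallel
    {
        uint64_t bit = core_bits[omp_get_thread_num() % ncores];
        SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)bit);
    }
}
#endif
```

If there are more OpenMP threads than physical cores, the modulo makes threads share cores; whether that is acceptable depends on the workload.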