Linux SCHED_FIFO not respecting thread priorities

Question

Scenario

I have created three threads, pinned to a single core, with the following priorities under SCHED_FIFO:

main: sched_priority = 99
thread_1: sched_priority = 97
thread_2: sched_priority = 98

The worker threads (thread_1, thread_2) count the primes below 50,000,000 (about 10 s). They do not block or make system calls until the end, when they print their output.

The main thread sleeps for one second, and then checks the futures of the worker threads to see whether they are done.

Expected Behavior

The main thread has the highest priority. According to sched(7):

A SCHED_FIFO thread runs until either it is blocked by an I/O request, it is preempted by a higher priority thread, or it calls sched_yield(2).

Main should therefore print (Checking ...) at one-second intervals. It has the highest priority, so it should preempt anything that is running. When it sleeps it is blocked, so the other threads should run.

thread_2: Finishes first, as it has the higher priority (98) when main is not busy.
thread_1: Finishes last, and only starts after thread_2 is completely done.

Actual Behavior

The threads finish in the opposite order from what I expected:

Thread 1 summed 3001134 primes at priority level: 97
Thread 2 summed 3001134 primes at priority level: 98
Main: Checking ...
Main: Task 1 has finished!
Main: Task 2 has finished!
Main: Exiting at priority level: 99

Reversing the priority orders so that main has the lowest yields the exact same result.

Reproduce

Compile program with g++ -o <exec_name> <file_name>.cpp -pthread
Run with: sudo taskset --cpu-list 1 ./<exec_name>

My kernel is 5.4.0-42-generic, and my distribution (if it matters): Ubuntu 18.04.5 LTS. I do not have the preempt-rt patch installed.

Example Code

#include <thread>
#include <mutex>
#include <iostream>
#include <chrono>
#include <cstring>
#include <future>
#include <pthread.h>
#include <math.h>

// IO Access mutex
std::mutex g_mutex_io;

// Computation function (busy work)
static bool isPrime (unsigned int value)
{
    unsigned int i, root;
    if (value == 1)       return false;
    if (value == 2)       return true;
    if ((value % 2) == 0) return false;
    root = (int)(1.0 + sqrt(value));
    for (i = 3; (i < root) && (value % i != 0); i += 2);
    return (i < root ? false : true);
}

// Thread function
void foo (unsigned int id, unsigned int count)
{
    sched_param sch;
    int policy, sum = 0;

    // Get information about thread
    pthread_getschedparam(pthread_self(), &policy, &sch);

    // Compute primes
    for (unsigned int i = 1; i < count; ++i) {
        sum += (isPrime(i) ? 1 : 0);
    }

    // Print
    {
        std::lock_guard<std::mutex> lock(g_mutex_io);
        std::cout << "Thread " << id << " summed " << sum << " primes"
                  << " at priority level: " << sch.sched_priority << std::endl; 
    }

}

int main ()
{
    sched_param sch;
    int policy;

    // Declare and init task objects
    std::packaged_task<void(unsigned int, unsigned int)> task_1(foo);
    std::packaged_task<void(unsigned int, unsigned int)> task_2(foo);

    // Get the futures
    auto task_fut_1 = task_1.get_future();
    auto task_fut_2 = task_2.get_future();

    // Declare and init thread objects
    std::thread thread_1(std::move(task_1), 1, 50000000);
    std::thread thread_2(std::move(task_2), 2, 50000000);

    // Set first thread policy
    pthread_getschedparam(thread_1.native_handle(), &policy, &sch);
    sch.sched_priority = 97;
    if (int rc = pthread_setschedparam(thread_1.native_handle(), SCHED_FIFO, &sch)) {
        // pthread functions return the error number; they do not set errno
        std::cerr << "pthread_setschedparam: " << std::strerror(rc)
                  << std::endl;
        return -1;
    }

    // Set second thread policy
    pthread_getschedparam(thread_2.native_handle(), &policy, &sch);
    sch.sched_priority = 98;
    if (int rc = pthread_setschedparam(thread_2.native_handle(), SCHED_FIFO, &sch)) {
        // pthread functions return the error number; they do not set errno
        std::cerr << "pthread_setschedparam: " << std::strerror(rc)
                  << std::endl;
        return -1;
    }

    // Set main process thread priority
    pthread_getschedparam(pthread_self(), &policy, &sch);
    sch.sched_priority = 99;
    if (int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sch)) {
        // pthread functions return the error number; they do not set errno
        std::cerr << "pthread_setschedparam: " << std::strerror(rc)
                  << std::endl;
        return -1;
    }

    // Detach these threads
    thread_1.detach(); thread_2.detach();

    // Check their status with a timeout
    for (int finished = 0; finished < 2; ) {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        {
            std::lock_guard<std::mutex> lock(g_mutex_io);
            std::cout << "Main: Checking ..." << std::endl;
        }
        if (task_fut_1.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            {
                std::lock_guard<std::mutex> lock(g_mutex_io);
                std::cout << "Main: Task 1 has finished!" << std::endl;
            }
            finished++;
        }
        if (task_fut_2.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            {
                std::lock_guard<std::mutex> lock(g_mutex_io);
                std::cout << "Main: Task 2 has finished!" << std::endl;
            }
            finished++;
        }
    }
    pthread_getschedparam(pthread_self(), &policy, &sch);
    std::cout << "Main: Exiting at priority level: " << sch.sched_priority << std::endl;
    return 0;
}

Experiments

Running this program on two cores (sudo taskset --cpu-list 1,2) results in the following bizarre output:

Thread 2 computed 3001134 primes at priority level: 98
Thread 1 computed 3001134 primes at priority level: 0
Main: Checking ...
Main: Task 1 has finished!
Main: Task 2 has finished!
Main: Exiting at priority level: 99

The priority of thread_1 is zero.

If I expand this to three cores (sudo taskset --cpu-list 1,2,3), then I get the behavior I expected on a single core:

Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Main: Checking ...
Thread 2 computed 3001134 primes at priority level: 98
Thread 1 computed 3001134 primes at priority level: 0
Main: Checking ...
Main: Task 1 has finished!
Main: Task 2 has finished!
Main: Exiting at priority level: 99

Rearranging the order in which the priorities are configured, so that the main thread's priority is set first, does not change the output in the original scenario.

You should be seeing identical behavior whether you do core pinning or not. Please update the question to indicate whether that's the case. Thank you! — Kuba hasn't forgotten Monica, Aug 31 '20 at 16:31
You create the work threads with high priority _before_ increasing the priority of the main thread. From reading the section of the linked page on `SCHED_FIFO`, it seems plausible that this _always_ preempts the main thread until the worker threads complete. Do you see the same behavior if you set the main thread's priority first? — bnaecker, Aug 31 '20 at 16:33
@ReinstateMonica I do pin to core, it should be right under the title in the first sentence that I mention that. I'm using `taskset --cpu-list` to run the program (see under reproduce section) to enforce this! — Micrified, Aug 31 '20 at 16:33
@bnaecker Yes, but I do not *launch* them (with `.detach()`) until I have set the priority of the main thread. I can try rearranging the order of the calls of course. — Micrified, Aug 31 '20 at 16:34
@Micrified The other commenter is asking what happens if you _do not_ pin threads to a core using taskset. So just running the executable as usual. — bnaecker, Aug 31 '20 at 16:36
`detach()` doesn't "launch" threads. Their work function is started as soon as you construct them. `detach` just allows them to operate independently, meaning you can no longer control them using their handle. — bnaecker, Aug 31 '20 at 16:37
@bnaecker Hello, I have edited my question to include the experiment asked for by Reinstate, and attempted the suggestion you made. Setting the main thread priority first had no effect on the output. — Micrified, Aug 31 '20 at 16:41
"I do pin to core" I'm saying that it shouldn't matter that you do. In fact, you must verify this assertion, and then remove the whole "pinning to the core" thing since it's immaterial - it's but a distraction, it doesn't matter. If it does (again: in the particular example you present), then you've found yourself a kernel bug, and suddenly things are very interesting. So, before we even begin to figure out the real problem, let's rule out the unlikely kernel bug, OK? :) — Kuba hasn't forgotten Monica, Aug 31 '20 at 16:59
@ReinstateMonica I ran the program with two cores, and then with three cores. I posted the output at the bottom of the post under ### Experiments section. I expect that if I add more cores, then the work gets distributed (indeed it does, it runs much faster). I pin to core originally because I want to be sure that the main thread can preempt the other threads. It does not. That is why it is important to pin to one core. Otherwise I cannot know as they all execute concurrently across the CPU cores without any reason to clash. — Micrified, Aug 31 '20 at 17:04
@Micrified : I have the same problem but with much much smaller MCVE. Could you have a look at [Linux not respecting SCHED_FIFO priority,normal or GDB execution](https://stackoverflow.com/q/64061391/3972710) and tell me what you think about , taken into account the experience you gathered from your question. — NGI, Sep 26 '20 at 17:02

score 0 · Accepted Answer · answered Aug 31 '20 at 17:46

When you start the two threads

// Declare and init thread objects
std::thread thread_1(std::move(task_1), 1, 50000000);
std::thread thread_2(std::move(task_2), 2, 50000000);

they may (!) immediately run and fetch the scheduling parameters

// Get information about thread
pthread_getschedparam(pthread_self(), &policy, &sch);

even before you set them to another value with pthread_setschedparam(). The output might even show 0 and 0, if both threads happen to run before the main thread has set either priority.

The child threads may (!) both be scheduled after the main thread has set the priority. Then you would get the expected output. But any result is possible.

When you move pthread_getschedparam() to the end of the thread just before the output, you are more likely to get the expected output of 97 and 98. But even then both threads may run until the end, even before the main thread is scheduled to set the priority.

Olaf Dietsche

So you are saying that a thread with a "lower" priority (say, 97) runs while my others do not, simply because the code to assign the priorities to the second thread and main thread did not run since that thread preempted them immediately? Have I no real ability to assign priorities prior to creating the threads then? Must they assign their own priorities within their functions? – Micrified Aug 31 '20 at 18:47
*Until* the priority is set, all threads are equal. Depending on the schedule, any order may happen. I moved the `getschedparam()` to the end and saw the expected 97, 98, 99 priorities. – Olaf Dietsche Aug 31 '20 at 19:30
Yeah, I did a test where I moved the code setting the priority of `main` to the top, before I create the threads, and lo and behold, it worked as I wanted! So the lessons I have learned from you and others here are that (1) initializing a thread object already creates and starts the thread, and (2) assigning the priority needs to be done carefully. – Micrified Aug 31 '20 at 19:50