84

In the Linux real-time process priority range 1 to 99, it's unclear to me which is the highest priority, 1 or 99.

Section 7.2.2 of "Understanding the Linux Kernel" (O'Reilly) says 1 is the highest priority, which makes sense considering that normal processes have static priorities from 100 to 139, with 100 being the highest priority:

"Every real-time process is associated with a real-time priority, which is a value ranging from 1 (highest priority) to 99 (lowest priority)."

On the other hand, the sched_setscheduler man page (RHEL 6.1) claims that 99 is the highest:

"Processes scheduled under one of the real-time policies (SCHED_FIFO, SCHED_RR) have a sched_priority value in the range 1 (low) to 99 (high)."

Which is the highest real-time priority?

David Steinhauer

7 Answers

100

I did an experiment to nail this down, as follows:

  • process1: RT priority = 40, CPU affinity = CPU 0. This process "spins" for 10 seconds so it won't let any lower-priority process run on CPU 0.

  • process2: RT priority = 39, CPU affinity = CPU 0. This process prints a message to stdout every 0.5 seconds, sleeping in between. It prints the elapsed time with each message.

I'm running a 2.6.33 kernel with the PREEMPT_RT patch.

To run the experiment, I run process2 in one window (as root) and then start process1 (as root) in another window. The result is process1 appears to preempt process2, not allowing it to run for a full 10 seconds.

In a second experiment, I change process2's RT priority to 41. In this case, process2 is not preempted by process1.

This experiment shows that a larger RT priority value in sched_setscheduler() has a higher priority. This appears to contradict what Michael Foukarakis pointed out from sched.h, but actually it does not. In sched.c in the kernel source, we have:

static void
__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
{
        BUG_ON(p->se.on_rq);

        p->policy = policy;
        p->rt_priority = prio;
        p->normal_prio = normal_prio(p);
        /* we are holding p->pi_lock already */
        p->prio = rt_mutex_getprio(p);
        if (rt_prio(p->prio))
                p->sched_class = &rt_sched_class;
        else
                p->sched_class = &fair_sched_class;
        set_load_weight(p);
}

rt_mutex_getprio(p) does the following:

return task->normal_prio;

While normal_prio() happens to do the following:

prio = MAX_RT_PRIO-1 - p->rt_priority;  /* <===== notice! */
...
return prio;

In other words, we have (my own interpretation):

p->prio = p->normal_prio = MAX_RT_PRIO - 1 - p->rt_priority

Wow! That is confusing! To summarize:

  • With p->prio, a smaller value preempts a larger value.

  • With p->rt_priority, a larger value preempts a smaller value. This is the real-time priority set using sched_setscheduler().
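The relationship above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code: the helper name kernel_prio is made up for the example, and MAX_RT_PRIO = 100 is an assumption about the kernel constant.

```python
# Illustrative sketch only; the kernel does this in C inside normal_prio().
MAX_RT_PRIO = 100  # assumption: value of the kernel constant


def kernel_prio(rt_priority: int) -> int:
    """Map a user-space sched_priority (1..99) to the internal p->prio."""
    if not 1 <= rt_priority <= 99:
        raise ValueError("real-time priority must be in 1..99")
    return MAX_RT_PRIO - 1 - rt_priority


# The two experiments described above: process1 always has rt_priority 40.
for p2 in (39, 41):
    p1_prio, p2_prio = kernel_prio(40), kernel_prio(p2)
    # A smaller p->prio preempts a larger one.
    winner = "process1" if p1_prio < p2_prio else "process2"
    print(f"process2 rt_priority={p2}: p->prio {p1_prio} vs {p2_prio}, {winner} wins")
```

With these assumptions, the larger rt_priority always maps to the smaller p->prio, which matches what the experiment showed.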

Gabriel Staples
David Steinhauer
48

Short Answer

99 is the highest real-time priority.

PR is the priority level displayed by top (range -100 to 39). The lower the PR, the higher the priority of the process.

PR is calculated as follows:

  • for normal processes: PR = 20 + NI (NI is nice and ranges from -20 to 19)
  • for real time processes: PR = - 1 - real_time_priority (real_time_priority ranges from 1 to 99)

Long Answer

There are 2 types of processes: the normal ones and the real-time ones. For the normal ones (and only for those), nice is applied as follows:

Nice

The "niceness" scale goes from -20 to 19, where -20 is the highest priority and 19 the lowest. The priority level is calculated as follows:

PR = 20 + NI

Where NI is the nice level and PR is the priority level. So as we can see, the -20 actually maps to 0, while the 19 maps to 39.

By default, a program's nice value is 0, but it is possible for a root user to launch programs with a specified nice value by using the following command:

nice -n <nice_value> ./myProgram 

Real Time

We can go further. The nice value applies only to normal user programs. The overall UNIX/Linux priority scale has 140 values, and the nice value maps a process onto the last part of that scale (from 100 to 139). The equation PR = 20 + NI therefore cannot reach the values from 0 to 99, which correspond to negative PR levels (from -100 to -1). To use those values, the process must be scheduled as "real time".

There are 5 scheduling policies in a Linux environment. They can be listed, along with the minimum and maximum valid priority of each, with the following command:

chrt -m 

The policies are:

1. SCHED_OTHER   the standard round-robin time-sharing policy
2. SCHED_BATCH   for "batch" style execution of processes
3. SCHED_IDLE    for running very low priority background jobs.
4. SCHED_FIFO    a first-in, first-out policy
5. SCHED_RR      a round-robin policy

The scheduling policies can be divided into 2 groups: the normal scheduling policies (1 to 3) and the real-time scheduling policies (4 and 5). Real-time processes always have priority over normal processes. A real-time process can be started with the following command (this example selects the SCHED_RR policy):

chrt --rr <priority between 1-99> ./myProgram

To obtain the PR value for a real time process the following equation is applied:

PR = -1 - rt_prior

Where rt_prior is the real-time priority between 1 and 99. For that reason, the process with the highest priority is the one started with the value 99, since it gets the lowest PR (-100).
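As a quick check of the two formulas, here is a small Python sketch. The function names are made up for this example; it only reproduces the arithmetic behind the PR column and does not query the scheduler.

```python
def pr_normal(nice: int) -> int:
    """PR for a normal process: PR = 20 + NI."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return 20 + nice


def pr_realtime(rt_prior: int) -> int:
    """PR for a real-time process: PR = -1 - rt_prior."""
    if not 1 <= rt_prior <= 99:
        raise ValueError("rt_prior must be in 1..99")
    return -1 - rt_prior


# Ends of the normal range, plus the default nice value of 0.
print(pr_normal(-20), pr_normal(0), pr_normal(19))
# Ends of the real-time range.
print(pr_realtime(1), pr_realtime(99))
```

Every real-time PR is negative and every normal PR is non-negative, so under these formulas any real-time process outranks any normal one, and rt_prior = 99 gives the lowest PR of all.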

It is important to note that for real time processes, the nice value is not used.

To see the current "niceness" and PR value of a process the following command can be executed:

top

Which shows the following output:

[Image: top output showing the PR and NI columns]

The figure shows the PR and NI values. Note the process with a PR value of -51, which corresponds to a real-time priority of 50 (PR = -1 - 50). There are also some processes whose PR value is shown as "rt". This value actually corresponds to a PR value of -100.

J Agustin Barrachina
  • For people looking for quick answer on chrt on real-time scheduling. For chrt, 1 has the lowest priority and 99 highest. I'm surprised that it's not mentioned in man page of chrt. – B.Z. Oct 06 '20 at 20:31
  • 2
    I'm unable to suggest further edits, but I believe there are minor contradictory details in this answer. The answer starts off by saying "PR = 20 - NI" and that PR values range from -100 to 40. But later on, it says "PR = 20 + NI" and that PR values range from -100 to 39. I am relatively confident that the latter (ie, PR = 20 +NI and goes from -100 to 39) is correct! – Ken Lin Apr 22 '21 at 22:50
  • @KenLin yes, there was a mistake but it's actually the opposite. PR = 20 + NI. It has been corrected, thank you for pointing it out. – J Agustin Barrachina Apr 23 '21 at 10:22
  • 1
    I still see "PR is the priority level (range -100 to 40)". Shouldn't 40 be 39 instead? – Ken Lin Apr 23 '21 at 15:00
11

This comment in sched.h is pretty definitive:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
 * values are inverted: lower p->prio value means higher priority.
 *
 * The MAX_USER_RT_PRIO value allows the actual maximum
 * RT priority to be separate from the value exported to
 * user-space.  This allows kernel threads to set their
 * priority to a value higher than any user task. Note:
 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
 */

Note this part:

Priority values are inverted: lower p->prio value means higher priority.

Michael Foukarakis
  • 1
    But are those the same priority values that are exposed to userspace? (I don't believe so). – davmac Jan 20 '16 at 09:07
  • 1
    @davmac: No, they are completely different. These are internal to the kernel. – Michael Foukarakis Jan 20 '16 at 09:08
  • 1
    Right. It seems like the book is talking about these values, since I don't think a single equivalent priority value is accessible to userspace. In which case, the book has only one detail wrong (assuming that MAX_PRIO=139 and MAX_RT_PRIO=99) : the highest priority value is 0, not 1. (Upvoted: your answer is to the point). – davmac Jan 20 '16 at 09:35
6

To determine the highest realtime priority you can set programmatically, make use of the sched_get_priority_max function.

On Linux 2.6.32 a call to sched_get_priority_max(SCHED_FIFO) returns 99.

See http://linux.die.net/man/2/sched_get_priority_max
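The same query can be made without writing C. Below is a minimal sketch using Python's os module, which wraps these calls on platforms that provide them. The values printed depend on the system, so no output is shown here.

```python
import os

# Ask the OS for the valid sched_priority range of each policy.
for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
    policy = getattr(os, name)
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    print(f"{name}: min={lo} max={hi}")
```

Querying the range at run time rather than hard-coding 99 keeps a program portable across systems that expose a different range.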

Pawel
  • 3
    More pertinent to OP's question, from that same man page: "Processes with numerically higher priority values are scheduled before processes with numerically lower priority values". – davmac Jan 20 '16 at 09:06
2

The Linux kernel implements two separate priority ranges:

  1. Nice value: -20 to +19; larger nice values correspond to lower priority.

  2. Real-time priority: 0 to 99; higher real-time priority values correspond to a greater priority.

Rohit B
  • Thanks for helping, but you could improve the answer by explaining the relationship between these two priorities. For example, add details how the real-time priority range and the non-real-time range overlap, and what the "nice" priority refers in this context. – Olli May 15 '20 at 08:06
1

Your assumption that normal processes have static priorities from 100 to 139 is volatile at best and invalid at worst. What I mean is that sched_setscheduler only allows sched_priority to be 0 (which indicates the dynamic priority scheduler) with SCHED_OTHER, SCHED_BATCH and SCHED_IDLE (true as of 2.6.16).

Programmatically, static priorities are 1-99 only for SCHED_RR and SCHED_FIFO.

Now, you may see priorities from 100-139 being used internally by a dynamic scheduler. However, what the kernel does internally to manage dynamic priorities (including flipping the meaning of high vs. low priority to make comparison or sorting easier) should be opaque to user space.

Remember in SCHED_OTHER you are mostly stuffing the processes in the same priority queue.

The idea is to make the kernel easier to debug and to avoid goofy out-of-bounds mistakes.

So the rationale for switching the meaning could be that a kernel developer doesn't want to use math like 139-idx (just in case idx > 139) ... it is better to do math with idx-100 and reverse the concept of low vs. high because idx < 100 is well understood.

Also a side effect is that niceness becomes easier to deal with. 100 - 100 <=> nice == 0; 101-100 <=> nice == 1; etc. is easier. It collapses to negative numbers nicely as well (NOTHING to do with static priorities) 99 - 100 <=> nice == -1 ...

Ahmed Masud
  • Okay, I see that the sched_priority is different than the static priority, and that all non-real-time processes have a sched_priority of 0. – David Steinhauer Jan 17 '12 at 15:35
  • 1
    So the static priority only affects the time quantum of real-time processes. The sched_priority I believe is what the O'Reilly book refers to as "real-time priority." If so, the O'Reilly book has it backward. So, back to my original question: is a sched_priority of 99 the highest priority, or is 1 the highest priority? – David Steinhauer Jan 17 '12 at 15:57
0
  1. Absolutely, the real-time priority applies to the RT policies FIFO and RR, and it ranges from 0 to 99.
  2. The 40 non-real-time priority levels used by the BATCH and OTHER policies are displayed as 0 to 39, not 100 to 139. You can observe this by looking at any non-real-time process on the system: by default it has a PR of 20 and a niceness of 0. If you decrease the niceness of a process (the lower or more negative the number, the less nice and the more CPU-hungry the process), say from 0 to -1, you'll see the PR drop from 20 to 19. In other words, decreasing the niceness value of a PID to give it more attention also decreases its PR number: the lower the PR number, the higher the priority.

    Example:
    
    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2079 admin     10 -10  280m  31m 4032 S  9.6  0.0  21183:05 mgmtd
    [admin@abc.com ~]# renice -n -11 2079
    2079: old priority -10, new priority -11
    [admin@abc.com ~]# top -b | grep mgmtd
    2079 admin      9 -11  280m  31m 4032 S  0.0  0.0  21183:05 mgmtd
    ^C
    

I hope this practical example clears up the doubts and helps correct the wording in any incorrect source.

Sturdyone