Why is kernel preemption safe only when preempt_count == 0?

Question

Linux kernel 2.6 introduced a new per-thread field, preempt_count, which is incremented whenever a lock is acquired and decremented whenever a lock is released. This field is used to allow kernel preemption: "If need_resched is set and preempt_count is zero, then a more important task is runnable and it is safe to preempt."

According to the "Linux Kernel Development" book by Robert Love: "So when is it safe to reschedule? The kernel is capable of preempting a task running in the kernel so long as it does not hold a lock."

My question is: why isn't it safe to preempt a task running in the kernel while this task holds a lock?

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends), so we wouldn't get the two threads simultaneously inside the same critical section. Can anyone please outline a problematic scenario in case we do preempt a task that holds a lock in kernel-mode?

Thanks!

At first I thought that another stackoverflow question ([https://stackoverflow.com/questions/5215958/kernel-preemption-while-holding-spinlock]) offers a possible solution: switching tasks while holding locks may lead to deadlock if, for example, two tasks acquire locks in a different order. But I don't think it's the answer, as the kernel should enforce the same locking order anyway, to prevent deadlocks on multi-processor systems. And Robert Love also says: "Because the kernel is SMP-safe, if a lock is not held, the current code is reentrant and capable of being preempted." — Simple.guy, Nov 27 '17 at 18:30
But the answer to the referenced question is exactly the answer to yours. You talk about locking order, but there is no order when only a **single lock** is involved. Again: disabling preemption while holding a lock is the kernel's way of preventing deadlock, even with a single lock. — Tsyvarev, Nov 27 '17 at 19:29
@Tsyvarev: but *how* can we get a deadlock? Let's imagine that thread #1 holds a lock and gives up the CPU. Thread #2 is then scheduled, tries to grab the lock, and blocks. Eventually, thread #2 finishes its time slice, so the thread #1 is scheduled again, releases the lock, and everything is fine. Am I missing something? — Simple.guy, Nov 28 '17 at 07:50
`Eventually, thread #2 finishes its time slice` - 1. The whole time slice of thread #2 would be a **waste of time**. That is too high a cost for a lock. 2. Scheduling properties of the threads (such as priority) may delay switching back to thread #1 even more. And an infinite delay means a deadlock. — Tsyvarev, Nov 28 '17 at 08:19
@Tsyvarev: Robert Love is talking about "safety", which IMO is more about correctness than performance, so I'm not sure a "waste of time" is the answer. — Simple.guy, Nov 28 '17 at 08:34
"Waste of time" is the lesser problem with allowing preemption while a thread holds a lock. As I said above, the scheduling properties of the threads may prevent switching to thread #1 while thread #2 is active. E.g., let the priority of thread #2 be higher than the priority of thread #1. Then, while thread #2 *actively* waits on the lock, the CPU cannot switch to thread #1 to release the lock. As a result, thread #2 is unable to make progress. **Deadlock**. — Tsyvarev, Nov 28 '17 at 08:48
@Tsyvarev: I think I understand. If thread #2 is a **real-time** process of a higher priority than thread #1, the CPU will never switch to thread #1. I implicitly assumed that thread #2 is a conventional process, in which case it will eventually be preempted. — Simple.guy, Nov 28 '17 at 09:23

score 3 · Accepted Answer · answered Sep 12 '20 at 06:02

While this is an old question, the accepted answer isn't correct.

First of all the title is asking:

Why kernel preemption is safe only when preempt_count > 0?

This isn't correct, it's the opposite. Kernel preemption is disabled when preempt_count > 0, and enabled when preempt_count == 0.
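
To make the rule concrete, here is a minimal userspace sketch of the check. It is not kernel code: `struct toy_task`, `toy_lock`, `toy_unlock` and `toy_should_preempt` are hypothetical names invented for illustration, and the real kernel tracks more than lock nesting in preempt_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; these names are hypothetical, not kernel APIs. */
struct toy_task {
    int  preempt_count;  /* > 0 means preemption is disabled */
    bool need_resched;   /* a more important task is runnable */
};

/* Taking a lock bumps the counter, as the question describes. */
void toy_lock(struct toy_task *t)
{
    t->preempt_count++;
}

/* Releasing a lock drops the counter again. */
void toy_unlock(struct toy_task *t)
{
    assert(t->preempt_count > 0);  /* an unbalanced unlock would be a bug */
    t->preempt_count--;
}

/* Preempt only if a reschedule is wanted and no lock is held. */
bool toy_should_preempt(const struct toy_task *t)
{
    return t->need_resched && t->preempt_count == 0;
}
```

Because the counter nests, releasing an inner lock while an outer one is still held keeps preemption disabled; it only becomes possible again once the count returns to zero.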

Furthermore, the claim:

If another task is scheduled and tries to grab the lock, it will block (or spin until its time slice ends),

Is not always true.

Say you acquire a spin lock while preemption remains enabled. A process switch happens, and in the context of the new process some softirq runs. Preemption is disabled while running softirqs. If one of those softirqs attempts to acquire your lock, it will never stop spinning: with preemption disabled, your task can never be scheduled back onto that CPU to release the lock. Thus you have a deadlock.

You have no control over whether the process that preempts yours will run softirqs or not. The preempt_count field where you disable softirqs is process-specific. Softirqs have to run with preemption disabled to preserve the per-cpu serialization of softirqs.
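
The scenario can be sketched as a small single-CPU simulation. This is a deliberately simplified model, not kernel code: `struct toy_cpu` and `toy_spinner_gets_lock` are hypothetical names, and the model assumes that a preemptible spinner is switched out on the next tick and that the lock owner then releases the lock immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-CPU model: the lock owner has been preempted and a
 * spinner now occupies the CPU. */
struct toy_cpu {
    bool lock_held;            /* spinlock still owned by the preempted task */
    bool spinner_preemptible;  /* false for a softirq: preemption is disabled */
};

/* Returns true if the spinner ever gets the lock within max_ticks. */
bool toy_spinner_gets_lock(struct toy_cpu cpu, int max_ticks)
{
    assert(max_ticks >= 0);

    for (int tick = 0; tick < max_ticks; tick++) {
        if (!cpu.lock_held)
            return true;        /* the spin succeeded */
        if (cpu.spinner_preemptible) {
            /* Assumed: the scheduler switches back to the owner,
             * which then releases the lock. */
            cpu.lock_held = false;
        }
        /* Otherwise nothing else can run on this CPU; keep spinning. */
    }
    return false;               /* never acquired within the tick budget */
}
```

With `spinner_preemptible` set to false, which models the softirq case, the function returns false no matter how large `max_ticks` is, because nothing in the loop can ever clear `lock_held`.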

Thanks for your answer! It seems more correct than my answer so I'll accept it. (I also fixed the question, thanks for catching my mistake.) — Simple.guy, Sep 13 '20 at 08:19
Some references: UTLK3 pages 173-174 confirm that softirqs disable kernel preemption. — Simple.guy, Sep 13 '20 at 08:20

score 1 · Answer 2 · answered Nov 28 '17 at 11:51

With the help of @Tsyvarev, I think I can now answer my own question and depict a problematic scenario in which we do preempt a task that holds a lock in kernel-mode.

Thread #1 holds a spin-lock and gets preempted.
Thread #2 is then scheduled, and spins to grab the spin-lock.

Now, if thread #2 is a conventional process, it will eventually finish its time slice. In that case, thread #1 will be scheduled again, release the lock, and all is well. But if thread #2 is a real-time process of a higher priority, thread #1 will never get to run again and we have a deadlock.
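
The difference between the two cases can be sketched with a toy tick counter. This is only an illustration under stated assumptions: `toy_owner_ticks` is a hypothetical function, there is a single CPU, and a real-time spinner is assumed to run under a strict-priority policy in which it never gives up the CPU to a lower-priority thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns how many of `ticks` timer ticks the lock owner (thread #1)
 * receives while thread #2 spins on the lock. Hypothetical model. */
int toy_owner_ticks(bool spinner_is_realtime, int slice, int ticks)
{
    assert(slice > 0);

    int owner_ticks = 0;
    int used = 0;                /* ticks the spinner has used of its slice */
    bool spinner_on_cpu = true;  /* thread #2 has just preempted thread #1 */

    for (int t = 0; t < ticks; t++) {
        if (spinner_on_cpu) {
            used++;
            /* Only a conventional spinner is descheduled at slice expiry. */
            if (!spinner_is_realtime && used == slice)
                spinner_on_cpu = false;
        } else {
            owner_ticks++;       /* owner runs and could release the lock */
        }
    }
    return owner_ticks;
}
```

For example, `toy_owner_ticks(false, 3, 10)` gives the owner 7 ticks, whereas `toy_owner_ticks(true, 3, 10)` gives it 0: the owner never runs, so the lock is never released.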

This answer is corroborated by another stackoverflow thread which cites the FreeBSD documentation:

While locks can protect most data in the case of a preemption, not all of the kernel is preemption safe. For example, if a thread holding a spin mutex preempted and the new thread attempts to grab the same spin mutex, the new thread may spin forever as the interrupted thread may never get a chance to execute.

although the above quote doesn't explicitly explain why the "interrupted thread may never get a chance to execute" again.
