54

I am reading the following article by Robert Love

http://www.linuxjournal.com/article/6916

that says

"...Let's discuss the fact that work queues run in process context. This is in contrast to the other bottom-half mechanisms, which all run in interrupt context. Code running in interrupt context is unable to sleep, or block, because interrupt context does not have a backing process with which to reschedule. Therefore, because interrupt handlers are not associated with a process, there is nothing for the scheduler to put to sleep and, more importantly, nothing for the scheduler to wake up..."

I don't get it. AFAIK, the scheduler in the kernel is O(1), implemented through a bitmap. So what stops the scheduler from putting the interrupt context to sleep, picking the next schedulable process, and passing control to it?

Methos
  • Note that the linux-rt patchset actually does make interrupt handlers threaded, so they can sleep. This improves latency, but performance takes a significant hit. – bdonlan Jul 30 '09 at 17:53
  • This question is worth a great deal more attention, IMO; every low-level programmer has thought about this question during his career. See also this mail exchange: http://hi.baidu.com/rwen2012/item/2040e7cace3c6a0dac092f3c – Shmil The Cat Apr 05 '14 at 10:45

11 Answers

45

So what stops the scheduler from putting the interrupt context to sleep, picking the next schedulable process, and passing control to it?

The problem is that the interrupt context is not a process, and therefore cannot be put to sleep.

When an interrupt occurs, the processor saves the registers onto the stack and jumps to the start of the interrupt service routine. This means that when the interrupt handler is running, it is running in the context of the process that was executing when the interrupt occurred. The interrupt is executing on that process's stack, and when the interrupt handler completes, that process will resume executing.

If you tried to sleep or block inside an interrupt handler, you would wind up not only stopping the interrupt handler, but also the process it interrupted. This could be dangerous, as the interrupt handler has no way of knowing what the interrupted process was doing, or even if it is safe for that process to be suspended.

A simple scenario where things could go wrong would be a deadlock between the interrupt handler and the process it interrupts.

  1. Process1 enters kernel mode.
  2. Process1 acquires LockA.
  3. Interrupt occurs.
  4. ISR starts executing using Process1's stack.
  5. ISR tries to acquire LockA.
  6. ISR calls sleep to wait for LockA to be released.

At this point, you have a deadlock. Process1 can't resume execution until the ISR is done with its stack. But the ISR is blocked waiting for Process1 to release LockA.
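
To make the scenario concrete, here is a minimal sketch (not from the answer; lock_a, shared_counter and demo_isr are illustrative names) of how this deadlock is normally avoided in Linux: process context takes a lock shared with an interrupt handler using spin_lock_irqsave(), so the handler can never interrupt a lock holder on the same CPU, and the handler itself only ever spins briefly and never sleeps.

/* Hedged sketch: a counter shared between process context and an ISR. */
#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(lock_a);            /* "LockA" from the scenario above */
static int shared_counter;

/* Process context: disable local IRQs while holding a lock the ISR also takes. */
void process_context_update(void)
{
    unsigned long flags;

    spin_lock_irqsave(&lock_a, flags);     /* IRQs off: the ISR cannot preempt us here */
    shared_counter++;
    spin_unlock_irqrestore(&lock_a, flags);
}

/* Interrupt handler: may spin briefly, but must never sleep waiting for the
 * lock, because the interrupted task cannot run to release it. */
static irqreturn_t demo_isr(int irq, void *dev_id)
{
    spin_lock(&lock_a);                    /* safe only because process context disabled IRQs */
    shared_counter++;
    spin_unlock(&lock_a);
    return IRQ_HANDLED;
}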

Keith Smith
  • Also, interrupts usually require very fast servicing, or you can easily get into all sorts of trouble. – nos Jun 27 '09 at 22:43
  • OK... there are two arguments in your claim: 1. "If you tried to sleep or block .... or even if it is safe for that process to be suspended." I totally don't buy this argument. First of all, the kernel does not care what a user-space process is doing or whether it is safe to suspend it. Furthermore, with a preemptive kernel, it's even possible to block a kernel thread and start another. 2. "In the worst case, blocking from an interrupt handler could cause deadlocks" This is a locking issue. What if my ISR releases all locks before calling sleep? – Methos Jun 29 '09 at 20:33
  • @Methos - Re 1. The problem is when you interrupt a process in kernel mode, not one that is in user mode. If you interrupt a kernel thread and let the handler block, it wouldn't be the same as a normal thread preemption, because you would be simultaneously preempting two unrelated contexts, the kernel thread, and the ISR. If there are dependencies between them, you're dead. Hence my example about the kernel thread holding a resource that the ISR needs. The kernel thread won't be able to continue executing until the ISR completes. But the ISR is waiting for the kernel thread. Deadlock. – Keith Smith Jun 30 '09 at 03:11
  • @Methos - Re 2. In my example, it's a kernel-mode process that holds the lock. I'll edit my answer to provide a clearer explanation. Note that the ISR can't release locks before calling sleep, because the ISR can't acquire locks in the first place. If you try to acquire a lock, you might block, which is just as bad as directly calling sleep. – Keith Smith Jun 30 '09 at 03:13
  • Keith, I still don't agree with you. Locks have got nothing to do with this issue (although the example that you gave shows a classic deadlock situation). I have commented in arsane's answer about what I think. – Methos Jul 01 '09 at 05:52
  • @Keith, could you also add a point to the above comment about how Process1 gets control back? In the sense that the ISR doesn't really 'return' to the Process1 function that was being executed on the stack; in other words, where does the return address in the activation record of the ISR point to? – suppie Feb 24 '14 at 01:20
  • I don't think the example explains why we can't sleep in the interrupt handler. The deadlock can be prevented by using spin_lock_irqsave, and it has nothing to do with why we can't sleep in the interrupt handler. – pierrotlefou Aug 13 '14 at 08:39
38

I think it's a design idea.

Sure, you could design a system in which you can sleep in an interrupt, but apart from making the system hard to comprehend and complicated (there are many, many situations you would have to take into account), it does not help anything. So from a design point of view, declaring that interrupt handlers cannot sleep is very clear and easy to implement.


From Robert Love (a kernel hacker): http://permalink.gmane.org/gmane.linux.kernel.kernelnewbies/1791

You cannot sleep in an interrupt handler because interrupts do not have a backing process context, and thus there is nothing to reschedule back into. In other words, interrupt handlers are not associated with a task, so there is nothing to "put to sleep" and (more importantly) "nothing to wake up". They must run atomically.

This is not unlike other operating systems. In most operating systems, interrupts are not threaded. Bottom halves often are, however.

The reason the page fault handler can sleep is that it is invoked only by code that is running in process context. Because the kernel's own memory is not pagable, only user-space memory accesses can result in a page fault. Thus, only a few certain places (such as calls to copy_{to,from}_user()) can cause a page fault within the kernel. Those places must all be made by code that can sleep (i.e., process context, no locks, et cetera).
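
As an illustration of that last point, here is a hedged sketch (the device buffer and function names are hypothetical) of a driver read() method: it runs in the context of the calling process, so copy_to_user() is allowed to fault and sleep there, which would be illegal in an interrupt handler.

/* Hedged sketch: copying data to user space from process context. */
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/uaccess.h>

static char device_buf[64];                /* filled elsewhere, e.g. by a bottom half */

static ssize_t mydev_read(struct file *file, char __user *buf,
                          size_t count, loff_t *ppos)
{
    size_t n = min(count, sizeof(device_buf));

    /* Runs in the calling process's context: if the user page is not
     * present, the page fault handler may sleep while it is brought in. */
    if (copy_to_user(buf, device_buf, n))
        return -EFAULT;
    return n;
}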

Henridv
Sam Liao
  • I kind of came to a similar conclusion. But I am not sure how to back this claim. I just wanted to find out whether there is any "mathematical" reason not to do so. – Methos Jun 29 '09 at 20:34
  • I don't know how you would prove, in the "mathematical" sense, that it's impossible to build a system that allows an ISR to sleep. But I've programmed inside a number of OSs, and none of them allowed this. In practice, the closest I've ever seen to allowing an interrupt handler to do things like sleep is to have an explicit process that does the work of handling interrupts. But the systems I've seen that do this (e.g., Solaris) still have a minimal ISR that isn't allowed to do things like sleep. All it does is wake up the interrupt thread and let it do the real work. – Keith Smith Jun 30 '09 at 03:25
  • @Keith, this problem seems to have no authoritative answer, though I think it is possible to design a system in which an ISR can sleep. Here I attached Robert Love's answer to this question, but from my view, I think it's a design idea. – Sam Liao Jun 30 '09 at 03:38
  • @arsane - What do you mean when you say, "It's a design idea?" – Keith Smith Jun 30 '09 at 04:19
  • @Keith, in my opinion, it is a system design strategy: forbidding ISRs to sleep keeps the system design simple and clear. – Sam Liao Jun 30 '09 at 05:57
  • @arsane I accept your answer that it's a design idea, and thanks for pointing out that mailing thread (it was actually nice to see someone had exactly the same query). However, there is a missing piece of information that both your and Robert's explanations have not mentioned. Let me fill it in: ISRs cannot sleep because they run in interrupt context and there is no backing process context. There is nothing to reschedule back to, because only *process contexts* are schedulable in Linux, and *this* is a design choice. – Methos Jul 01 '09 at 05:50
  • Just to mention, for anyone who refers to this in the future: a page fault (PF) is an exception. – Methos Jul 01 '09 at 06:08
  • By calling it a design choice you miss why an ISR has no backing process context. It does you no good to give it one. When an O/S takes an interrupt, the state of the O/S is undefined because it can occur at nearly any point. Undefined state means that while in an ISR the O/S is unusable for things like scheduling threads. Resources shared by an ISR and the O/S are protected by spin locks so the state of those resources will be consistent while in an ISR. See linuxjournal.com/article/5833 about spinlocks. If you used normal locks, ones that didn't disable interrupts, you'd deadlock instead of crash. – Tony Lee Jul 02 '09 at 03:52
  • It's true that we need to use spinlocks, and more than that, to use spin_lock_irqsave. Because if ISR1 acquires spinlock 1 and is again interrupted by ISR2, which tries to acquire spinlock 1, the processor will deadlock (I actually tried something similar; surprisingly, the kernel detects a soft lockup after 11s). I do not understand what you mean by "When an O/S takes ... is unusable to do things like schedule threads". – Methos Jul 07 '09 at 19:20
  • There is nothing hard about interrupts that can sleep. You just model them as threads. The IRQ handler is then just a very tiny stub that takes, for instance IRQ #13 and quickly dispatches a real-time priority IRQ thread #13. Execution then basically continues on that thread, which has all the trappings to be able to sleep in a queue and wake up. It makes a lot of sense to design a RTOS this way, because you can sort out the interrupts. Interrupts which are not mapped to proper threads are a mess (i.e. other way around). – Kaz Mar 29 '12 at 04:43
7

Because the thread switching infrastructure is unusable at that point. When servicing an interrupt, only stuff of higher priority can execute - see the Intel Software Developer's Manual on interrupt, task and processor priority. If you did allow another thread to execute (which your question implies would be easy to do), you wouldn't be able to let it do anything - if it caused a page fault, you'd have to use services in the kernel that are unusable while the interrupt is being serviced (see below for why).

Typically, your only goal in an interrupt routine is to get the device to stop interrupting and queue something at a lower interrupt level (in Unix this is typically a non-interrupt level, but for Windows it's dispatch, APC or passive level) to do the heavy lifting, where you have access to more features of the kernel/OS. See: Implementing a handler.
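
Here is a hedged sketch of that split on Linux, using a workqueue as the "lower level" (heavy_work, heavy_work_fn and wq_demo_isr are illustrative names): the handler only quiets the device and defers the heavy lifting to process context, where sleeping is allowed.

/* Hedged sketch: top half defers work to a workqueue (process context). */
#include <linux/interrupt.h>
#include <linux/workqueue.h>

static struct work_struct heavy_work;

static void heavy_work_fn(struct work_struct *work)
{
    /* Process context: may sleep, take mutexes, allocate with GFP_KERNEL, ... */
}

static irqreturn_t wq_demo_isr(int irq, void *dev_id)
{
    /* Top half: acknowledge/quiet the device, then defer the rest. */
    schedule_work(&heavy_work);
    return IRQ_HANDLED;
}

/* In driver init:  INIT_WORK(&heavy_work, heavy_work_fn); */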

It's a property of how O/S's have to work, not something inherent in Linux. An interrupt routine can execute at any point so the state of what you interrupted is inconsistent. If you interrupted the thread scheduling code, its state is inconsistent so you can't be sure you can "sleep" and switch threads. Even if you protect the thread switching code from being interrupted, thread switching is a very high level feature of the O/S and if you protected everything it relies on, an interrupt becomes more of a suggestion than the imperative implied by its name.

Tony Lee
  • What do you mean by "the thread switching infrastructure is shut down"? Is this just theoretical knowledge, or can you give me a reference to actual code in the kernel that does that, to support your claim? – Methos Jun 29 '09 at 20:28
  • It's a theoretical view of what an interrupt is - logically, if you sleep, that only benefits you if you can switch threads, but you can't. And if you did, the whole system would lock up anyway because you can't get very far without hitting some service that can't function. If you can interrupt the paging system at any point, why would you think it could be reentered and still function? It would be nearly impossible to code an O/S if that were required. – Tony Lee Jun 29 '09 at 22:58
  • By shutdown, I mean can't function - the execution quantum can never expire and a new thread will never be scheduled as long as you're servicing the interrupt. – Tony Lee Jun 29 '09 at 23:02
  • +1 - Excellent points. You say "It's a property of how O/S's have to work." I would go further and say "It's the way the hardware works." The operating system has to live with it. – Keith Smith Jul 01 '09 at 02:28
  • "Even if you protect the thread...implied by its name." Not agreeing. You are saying that whenever there is an interrupt, an OS has to leave everything else and attend to it first, and that it's not a good idea to mask interrupts because then an interrupt is not an interrupt but a suggestion. AFAIK an interrupt is always a suggestion/indication that someone needs attention. It's up to the OS when to handle it. The reason it's called an interrupt is that it can happen at any time, that is, in between processor clock cycles. – Methos Jul 01 '09 at 06:01
  • An alternative to interrupts would be polling, which happens only when the processor wants to poll, that is, on clear-cut clock cycle boundaries. (Also, polling requires additional and explicit cycles from the processor.) – Methos Jul 01 '09 at 06:02
  • I'm not following your comments completely. It's an interrupt because the processor will stop what it's doing - even roll back the current instruction, and switch execution context. It's the switching part that makes it an interrupt. Yes you can mask, but that makes for a poor O/S that has to do real-time processing (e.g., skipping audio). The article I reference for handling interrupts makes it clear using CLI is deprecated (and doesn't work so well with multiple processors, since the other processor is still running). – Tony Lee Jul 01 '09 at 15:34
  • What you said makes sense. Sorry that my earlier comment was confusing. – Methos Jul 01 '09 at 22:44
  • APIC documentation link is dead. – sevo Feb 25 '18 at 19:50
4

Disallowing an interrupt handler to block is a design choice. When some data arrives at the device, the interrupt handler interrupts the current process, prepares the transfer of the data, and re-enables the interrupt; until the handler re-enables the interrupt, the device has to wait. We want to keep our I/O busy and our system responsive, so we had better not block the interrupt handler.

I don't think the "unstable states" are an essential reason. Processes, no matter whether they are in user mode or kernel mode, should be aware that they may be interrupted by interrupts. If some kernel-mode data structure will be accessed by both the interrupt handler and the current process, and a race condition exists, then the current process should disable local interrupts; moreover, on multi-processor architectures, spinlocks should be used during the critical sections.

I also don't think that if the interrupt handler were blocked, it could not be woken up. When we say "block", we basically mean that the blocked process is waiting for some event/resource, so it links itself into some wait queue for that event/resource. Whenever the resource is released, the releasing process is responsible for waking up the waiting process(es).
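
As a hedged illustration of that mechanism, here is a minimal Linux wait-queue sketch (my_wq, data_ready and the two functions are illustrative names): the waiter links itself onto the queue and sleeps until the condition becomes true, and the releasing side sets the condition and wakes it up. The sleeping side must be process context.

/* Hedged sketch: blocking on an event with a wait queue. */
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(my_wq);
static int data_ready;

/* Process context: sleep until someone reports that data is available. */
int wait_for_data(void)
{
    return wait_event_interruptible(my_wq, data_ready != 0);
}

/* The "releasing" side (could even be an interrupt handler): mark the
 * event and wake up whoever is sleeping on the queue. */
void report_data(void)
{
    data_ready = 1;
    wake_up(&my_wq);
}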

However, the really annoying thing is that the blocked process can do nothing during the blocking time; it did nothing to deserve this punishment, which is unfair. And nobody can reliably predict the blocking time, so the innocent process has to wait for an unclear reason and for an unbounded time.

Infinite
  • This should be the accepted answer. There is nothing that can fundamentally prevent blocking inside an interrupt handler. However, blocking, as noted by the OP, is unfair to the process that was running when the interrupt occurred, since it could have nothing to do with the reason for the interrupt. As for the people mentioning "undefined state", that doesn't make any sense. See next comment... – Karim Manaouil Jan 14 '22 at 21:41
  • The interrupted process will always be in a very well-defined state that can be resumed later. Processes that access data that could also be accessed in interrupt context should disable interrupts in the first place before accessing such data, to protect against corruption. – Karim Manaouil Jan 14 '22 at 21:43
  • In general it's a design choice to avoid penalising the interrupted process and to encourage handling the bulk of the work in a well defined process context that is related to the source of the interrupt (for example, the disk IO thread). This leads to the bottom-half and top-half design used in Linux, where a quick minimal interrupt acknowledgement is done in the interrupt handler and then the rest of the handling is scheduled to be done in process context. – Karim Manaouil Jan 14 '22 at 21:49
  • Also, very often, due to re-entrancy requirements, interrupt handlers run with interrupts disabled, and thus spending a long time in an interrupt handler will hurt latency and system responsiveness, since the chance of missing interrupts is higher. Thus, it is preferred to have very quick interrupt handlers so that interrupts can be re-enabled quickly. – Karim Manaouil Jan 14 '22 at 21:54
3

So what stops the scheduler from putting the interrupt context to sleep, picking the next schedulable process, and passing control to it?

Scheduling happens on timer interrupts. The basic rule is that only one interrupt can be open at a time, so if you go to sleep in the "got data from device X" interrupt, the timer interrupt cannot run to schedule it out.

Interrupts also happen many times and overlap. If you put the "got data" interrupt to sleep, and then get more data, what happens? It's confusing (and fragile) enough that the catch-all rule is: no sleeping in interrupts. You will do it wrong.

Andres Jaan Tack
  • I don't agree that "the basic rule is only one interrupt". Interrupts can be nested. Please refer to Bovet & Cesati, Chapter 4.3, "Nested Execution of Exception and Interrupt Handlers". – Methos Jun 29 '09 at 20:26
  • You have a good point, but note the next paragraph (they can overlap, what you call nested). It's a "basic" rule because if you do anything otherwise, you'd better know what's going on. – Andres Jaan Tack Jun 29 '09 at 21:58
2

Even if you could put an ISR to sleep, you wouldn't want to do it. You want your ISRs to be as fast as possible to reduce the risk of missing subsequent interrupts.

Ryan Fox
1

In essence, the question is whether in an interrupt handler you can get a valid "current" (a pointer to the current process's task_struct). If yes, it is possible to modify its contents accordingly to put it into a "sleep" state, and the scheduler can bring it back later if the state gets changed somehow. The answer may be hardware-dependent.

But on ARM it's impossible, since 'current' is unrelated to the interrupted process in interrupt mode. See the code below:

/* linux/arch/arm/include/asm/thread_info.h */
static inline struct thread_info *current_thread_info(void)
{
    register unsigned long sp asm ("sp");
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

sp in USER mode and SVC mode are the "same" ("same" here does not mean they are equal: user mode's sp points to the user-space stack, while SVC mode's sp, r13_svc, points to the kernel stack, where the user process's task_struct was updated at the previous task switch. When a system call occurs, the process enters kernel space again while sp (sp_svc) is still unchanged; the two sps are associated with each other, and in this sense they are the 'same'). So in SVC mode, kernel code can get a valid 'current'. But in other privileged modes, say interrupt mode, sp is 'different': it points to a dedicated address defined in cpu_init(). The 'current' calculated in these modes is unrelated to the interrupted process, and accessing it will result in unexpected behavior. That's why it is always said that a system call can sleep but an interrupt handler can't: a system call works in process context, but an interrupt does not.

1

The Linux kernel has two ways to allocate the interrupt stack. One is on the kernel stack of the interrupted process; the other is a dedicated per-CPU interrupt stack. If the interrupt context is saved on the dedicated per-CPU interrupt stack, then the interrupt context is indeed not associated with any process. The "current" macro will produce an invalid pointer to the currently running process, since on some architectures the "current" macro is computed from the stack pointer, and the stack pointer in interrupt context may point to the dedicated interrupt stack rather than to the kernel stack of some process.

  • This makes sense, and I thought about the "current" pointer in the case of the Linux kernel. But in which situations does the kernel use the per-CPU stacks instead of the process kernel stack? – Karim Manaouil Jan 14 '22 at 21:58
0

High-level interrupt handlers mask the operations of all lower-priority interrupts, including those of the system timer interrupt. Consequently, the interrupt handler must avoid involving itself in an activity that might cause it to sleep. If the handler sleeps, then the system may hang because the timer is masked and incapable of scheduling the sleeping thread. Does this make sense?

0

If a higher-level interrupt routine gets to the point where the next thing it must do has to happen after a period of time, then it needs to put a request into the timer queue, asking that another interrupt routine be run (at lower priority level) some time later.

When that interrupt routine runs, it would then raise the priority level back to the level of the original interrupt routine and continue execution. This has the same effect as a sleep.
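
On Linux, one hedged way to express "ask for another routine to run some time later" from an interrupt handler is a kernel timer (the names below are illustrative): the timer callback runs later in softirq context, still not allowed to sleep, but decoupled in time from the original interrupt.

/* Hedged sketch: deferring further handling by ~10 ms with a kernel timer. */
#include <linux/timer.h>
#include <linux/jiffies.h>
#include <linux/interrupt.h>

static struct timer_list retry_timer;

static void retry_timer_fn(struct timer_list *t)
{
    /* Continue the work that had to wait for some time to pass. */
}

static irqreturn_t timer_demo_isr(int irq, void *dev_id)
{
    /* Acknowledge the device quickly, then ask to be called back later. */
    mod_timer(&retry_timer, jiffies + msecs_to_jiffies(10));
    return IRQ_HANDLED;
}

/* In driver init:  timer_setup(&retry_timer, retry_timer_fn, 0); */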

John Saunders
0

It is just a design/implementation choice in the Linux OS. The advantage of this design is simplicity, but it may not be good for real-time OS requirements.

Other OSes have other designs/implementations.

For example, in Solaris, interrupts can have different priorities, which allows most device interrupts to be invoked in interrupt threads. Interrupt threads are allowed to sleep because each interrupt thread has a separate stack in the context of that thread. The interrupt-thread design is also good for real-time threads, which should have higher priorities than interrupts.
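
For comparison, Linux offers a similar facility through request_threaded_irq(): a minimal hard handler runs in interrupt context and a thread_fn runs in a dedicated kernel thread that is allowed to sleep. A rough sketch (the handler names, IRQ number and device pointer are illustrative):

/* Hedged sketch: a threaded interrupt handler on Linux. */
#include <linux/interrupt.h>

static irqreturn_t quick_check_isr(int irq, void *dev_id)
{
    /* Hard IRQ context: just check/acknowledge the device; no sleeping. */
    return IRQ_WAKE_THREAD;                /* hand off to the handler thread */
}

static irqreturn_t threaded_isr_fn(int irq, void *dev_id)
{
    /* Runs in a kernel thread: sleeping, mutexes and I/O waits are fine. */
    return IRQ_HANDLED;
}

/* In probe():
 *     request_threaded_irq(irq, quick_check_isr, threaded_isr_fn,
 *                          IRQF_ONESHOT, "mydev", dev);
 */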