35

I have an issue with some processes stuck in a D state on Ubuntu 10.04.3 LTS.

They have been in this state since November 5th (today being December 6th). I understand these are uninterruptible sleep states, often related to waiting for data from hardware such as a hard disk. This is a production server, so rebooting is a very last resort. Is anyone able to shed any light on what these processes might be?

This is the output for the D state items from ps -aux

www-data 22851  0.0  0.0      0     0 ?        D    Nov05   0:00 [2637.64]
www-data 26306  0.0  0.0   4008    12 ?        D    Nov05   0:00 ./2.6.37
www-data 26373  0.0  0.0   4008    12 ?        D    Nov05   0:00 ./n2
www-data 26378  0.0  0.0   4008    12 ?        D    Nov05   0:00 ./n2

This is output of ps axl | awk '$10 ~ /D/' for a little more info.

0    33 22851     1  20   0      0     0 econet D    ?          0:00 [2637.64]
1    33 26306     1  20   0   4008    12 ec_dev D    ?          0:00 ./2.6.37
1    33 26373     1  20   0   4008    12 ec_dev D    ?          0:00 ./n2
1    33 26378     1  20   0   4008    12 ec_dev D    ?          0:00 ./n2

Is there a way to kill these? Does having processes in this state when rebooting cause any issues?

sebastian-c
Dr Craig
  • 1
    IMHO it wasn't right to put this question on hold. It should have been transferred to Stack Overflow. – peterh Dec 07 '13 at 12:59

2 Answers

25

This is the dreaded uninterruptible (TASK_UNINTERRUPTIBLE) state of a process. In this state the process does not react to signals until whatever it started waiting for completes. Unfortunately it is a necessary evil. See also "What is an uninterruptable process?".

My answer is to reboot the system.
Does rebooting cause any issues?
Hard to tell; it may or it may not. The process in the D state may have crucial updates pending, which it will never complete if you reboot.
If you really can't afford to reboot, try to find the disk on which the process is waiting and check whether the disk is working properly by opening, closing, reading from, and writing to it.
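One way to look for the device a stuck process is waiting on is to read its entries under /proc. The following is only a minimal sketch, not a definitive diagnostic: the function name `describe_blocked_process` and the `proc_root` parameter are made up for illustration, and it assumes a Linux /proc layout and enough privileges (typically root) to read another user's `fd` directory.

```python
import os

def describe_blocked_process(pid, proc_root="/proc"):
    """Collect hints about what a process is waiting on (hypothetical helper)."""
    base = os.path.join(proc_root, str(pid))
    info = {"pid": pid, "wchan": None, "open_files": []}

    # wchan names the kernel function the task is sleeping in.
    # It can be "0" or empty when the task is not sleeping.
    try:
        with open(os.path.join(base, "wchan")) as fh:
            info["wchan"] = fh.read().strip() or None
    except OSError:
        pass  # process is gone or permission was denied

    # Each entry in fd/ is a symlink to the file, device, or socket held open.
    fd_dir = os.path.join(base, "fd")
    try:
        names = [n for n in os.listdir(fd_dir) if n.isdigit()]
    except OSError:
        names = []  # no access to the descriptor table
    for name in sorted(names, key=int):
        try:
            info["open_files"].append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed in the meantime
    return info
```

Calling `describe_blocked_process(26306)` with one of the PIDs from the question would return the wait channel and the paths the process holds open. Paths that point at a block device or a mount point are the candidates to test by hand.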

Community
Deepthought
7

No, you cannot kill them, period. kill -9 does not work either. And it is not a kernel bug; it is by design. All signals are blocked until those processes leave the D state. Either they leave the D state or the system gets rebooted. No, rebooting does not have any problem with these guys.

The usual culprits for this kind of problem are removable media devices like a CD-ROM drive. The device may be defective, or somebody found a way to do something stupid.
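Since signals will not dislodge these processes, about all that can be done from userspace is to check whether they have left the D state on their own. Here is a rough sketch of such a check, assuming the Linux /proc/<pid>/stat format in which the state letter follows the parenthesised command name. The function names and the `proc_root` parameter are hypothetical, chosen only for this example.

```python
import os

def parse_stat_state(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was not in the expected form
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Running `uninterruptible_processes()` periodically (from cron, for instance) shows whether the list shrinks without a reboot.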

jim mcnamara
  • 6
    No. If a process cannot be pulled out of the "D" state, even though doing so is possible (just as in Windows), that is clearly not normal. If it is by design, then the design is faulty. I haven't run into such problems for years, but before that they caused me a lot of trouble. I think the kernel developers have solved this problem in most cases. And yes, a reboot helps, at least until the next "D"-state problem. – peterh Dec 06 '13 at 16:52
  • 1
    Maybe I do not understand what you mean, but all signals including SIGKILL are blocked, by the kernel, on purpose, from any state "D" process in modern Linuxes. Which is a good thing. Because the process is waiting on a driver, which is waiting on hardware. – jim mcnamara Dec 06 '13 at 17:54
  • 21
    And thus a simple, solvable hardware or locking problem escalates into a major problem that needs a reboot. And the kernel people don't even understand that this is a… a… suboptimal way of handling things. – peterh Dec 06 '13 at 18:06
  • 3
    @jimmcnamara Pretty stupid if the HW no longer exists by that time. I would have given this a -1 if it were the kernel author posting this answer. – yo' May 23 '17 at 22:29
  • 1
    It's 2019 and that "necessary evil" is still around. That's embarrassing. – some-non-descript-user Mar 22 '19 at 09:25
  • Most hardware drivers should have some kind of timeout for the hardware and will return some kind of I/O error to the caller. However, some protocols hang by design. For example, NFS has no timeout by design, and the expected behavior is to stall processes for, say, 2 weeks while waiting for the NFS server's RAID controller to be replaced. That was considered a nice feature a long, long time ago when it was designed. Nowadays it's possible to declare a timeout for NFS mounts, but that needs to be done before it hangs. – Mikko Rantalainen Apr 21 '20 at 10:08