23

Most Docker commands never finish, and I have to interrupt them manually with CTRL+C. Even simple commands like docker ps or docker info do not respond.

However, docker help and docker version still work.

I think there is something like a deadlock with a particular container, so commands related to containers won't complete.

How can I handle such a situation?


My Docker version is 1.12.3. I don't use Swarm mode. The docker logs command doesn't work either. Using dmesg I can see a lot of I/O errors, but I don't know if they are related to my problem:

[12898.121287] loop: Write error at byte offset 8882749440, length 4096.
[12898.122837] loop: Write error at byte offset 8883666944, length 4096.
[12898.124685] loop: Write error at byte offset 8882814976, length 4096.
[12898.126459] loop: Write error at byte offset 8883404800, length 4096.
[12898.128201] loop: Write error at byte offset 8883470336, length 4096.
[12898.129921] loop: Write error at byte offset 8883535872, length 4096.
[12898.131774] loop: Write error at byte offset 8883601408, length 4096.
[12898.133594] loop: Write error at byte offset 8883732480, length 4096.
[12917.269786] loop: Write error at byte offset 8883798016, length 4096.
[12917.270331] quiet_error: 632 callbacks suppressed
[12917.270334] Buffer I/O error on device dm-6, logical block 1313320
[12917.270540] lost page write due to I/O error on dm-6
[12917.270543] Buffer I/O error on device dm-6, logical block 1313321
[12917.270740] lost page write due to I/O error on dm-6
[12917.270742] Buffer I/O error on device dm-6, logical block 1313322
[12917.270957] lost page write due to I/O error on dm-6
[12917.270959] Buffer I/O error on device dm-6, logical block 1313323
[12917.271177] lost page write due to I/O error on dm-6
[12917.271179] Buffer I/O error on device dm-6, logical block 1313324
[12917.271377] lost page write due to I/O error on dm-6
[12917.271379] Buffer I/O error on device dm-6, logical block 1313325
[12917.271573] lost page write due to I/O error on dm-6
[12917.301759] loop: Write error at byte offset 8883863552, length 4096.
[12917.312038] loop: Write error at byte offset 8883929088, length 4096.
[12917.312396] Buffer I/O error on device dm-6, logical block 1313328
[12917.312635] lost page write due to I/O error on dm-6
[12917.312638] Buffer I/O error on device dm-6, logical block 1313329
[12917.312867] lost page write due to I/O error on dm-6
[12917.312869] Buffer I/O error on device dm-6, logical block 1313330
[12917.313121] lost page write due to I/O error on dm-6
[12917.313123] Buffer I/O error on device dm-6, logical block 1313331
[12917.313346] lost page write due to I/O error on dm-6
[13090.853726] INFO: task kworker/u8:0:17212 blocked for more than 120 seconds.
[13090.854055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Using the command sudo systemctl status -l docker, the following messages are printed, but I cannot tell if they are related:

dockerd[1344]: time="2016-11-24T17:49:01.184874648+01:00" level=warning msg="libcontainerd: container c9f35af1836bf856001ca6156663f713c1217a697e8d2451927c67797fb5a770 restart canceled"
dockerd[1344]: time="2016-11-24T17:49:02.627116016+01:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]"
dockerd[1344]: time="2016-11-24T17:49:02.627152661+01:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
dockerd[1344]: time="2016-11-24T18:19:51.472701647+01:00" level=warning msg="libcontainerd: container c9f35af1836bf856001ca6156663f713c1217a697e8d2451927c67797fb5a770 restart canceled"
dockerd[1344]: time="2016-11-24T18:19:56.712126199+01:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]"
dockerd[1344]: time="2016-11-24T18:19:56.712159759+01:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
dockerd[1344]: time="2016-11-24T18:34:24.301786606+01:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers : [nameserver 8.8.8.8 nameserver 8.8.4.4]"
dockerd[1344]: time="2016-11-24T18:34:24.302208751+01:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers : [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
RotS
  • We don't know your containers so we can't help you. – Nov 25 '16 at 11:01
  • Can you provide more details on how you set up your Docker daemon? For instance, are you running Swarm mode with 1.12.3? How many managers are you running? If only one locally, what do the logs say? – abronan Nov 25 '16 at 12:22
  • @abronan I edited to add further information. I hope it will help. – RotS Nov 25 '16 at 14:03
  • This is a legit general situation when the Docker Daemon has crashed. It should have a specific answer about how to restart/kill the process. – xer0x Mar 10 '17 at 00:06

6 Answers

12

In my case, Docker commands started hanging after I deleted a container.

The daemon dockerd was in an abnormal state: it couldn't be started (sudo service docker start) after having been stopped (service docker stop).

# sudo service docker start
Redirecting to /bin/systemctl start docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.

# journalctl -xe
kernel: device-mapper: ioctl: unable to remove open device docker-253:0-19468577-d6f74dd67f106d6bfa483df4ee534dd9545dc8ca
...
systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Docker Application Container Engine.
systemd[1]: Unit docker.service entered failed state.
systemd[1]: docker.service failed.
polkitd[896]: Unregistered Authentication Agent for unix-process:22551:34177094 (system bus name :1.290, object path /org
kernel: dev_remove: 41 callbacks suppressed
kernel: device-mapper: ioctl: unable to remove open device docker-253:0-19468577-fc63401af903e22d05a4518e02504527f0d7883f9d997d7d97fdfe72ba789863
...
dockerd[22566]: time="2016-11-28T10:18:09.840268573+01:00" level=fatal msg="Error starting daemon: timeout"
systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
systemd[1]: Failed to start Docker Application Container Engine.
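
The journal output above mixes kernel, systemd, and dockerd lines. As a rough sketch (the function name dockerd_problems is made up for illustration), a small filter can keep only the dockerd messages logged at warning level or worse:

```shell
# Keep only log lines that dockerd emitted at warning, error, or fatal level.
# Reads log text on stdin; the function name is illustrative, not a Docker tool.
dockerd_problems() {
    grep -E 'level=(warning|error|fatal)'
}
```

It could be fed with journalctl -u docker.service --no-pager | dockerd_problems. The --no-pager flag also keeps pager control characters out of the captured output.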

Moreover, many zombie Docker processes could be observed using ps -eax | grep docker (they show a "Z" in the "STAT" column), for example docker-proxy processes.
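
A slightly more targeted check than grepping the full ps listing is sketched below. It assumes a Linux procps-style ps, and the function name list_docker_zombies is hypothetical. It prints the PID, parent PID, and command name of each zombie whose name contains "docker". The parent PID matters because a zombie cannot be killed directly; it only goes away once its parent reaps it or exits.

```shell
# Print "pid ppid command" for zombie processes whose name contains "docker".
# Expects rows of "pid ppid stat comm" on stdin, as produced by ps.
# The function name is illustrative.
list_docker_zombies() {
    awk '$3 ~ /^Z/ && $4 ~ /docker/ { print $1, $2, $4 }'
}
```

On a live system it could be fed with ps -eo pid=,ppid=,stat=,comm= | list_docker_zombies.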

After rebooting the server and restarting Docker, the zombie processes disappeared and Docker commands were working again.

RotS
5

I just had a similar issue. Rebooting the server did not work for me. The problem appeared right after I installed a new container that had some kind of error; after that, most Docker commands did not respond. I fixed it by executing the following command:

docker system prune -a

This removes all stopped containers, along with unused networks, unused images, and build cache. In my case that included the container I had just added. More information:

https://docs.docker.com/engine/reference/commandline/system_prune/

Waxyen Flax
0

I had the same problem (commands not responding) and I fixed it by increasing the resources allocated to Docker.

Docker Desktop -> Preferences -> Advanced

In my case, I increased:

  • Memory from 2GB to 8GB
  • Swap from 1GB to 2GB

Try different values depending on your machine.

Bruno Carneiro
0

The symptoms you describe look like something I struggled with as well. Here is what I did; I hope it helps!

I first confirmed that the service was not responding, using:

systemctl status docker.service

I then used the following command to get it working:

sudo dockerd --debug
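
To check whether the daemon answers without waiting for a hung command and pressing CTRL+C, one option is to wrap the command in a time limit. This is a minimal sketch that assumes GNU coreutils timeout, which exits with status 124 when the limit is hit; the function name probe is made up for illustration.

```shell
# Run a command with a time limit and report how it ended.
# $1 = limit in seconds, remaining arguments = the command to run.
# Assumes GNU coreutils timeout (exit status 124 means the limit was reached).
probe() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0)   echo "responsive" ;;
        124) echo "hung" ;;
        *)   echo "failed" ;;
    esac
}
```

For example, probe 10 docker ps would print "hung" if docker ps is still blocked after ten seconds.
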
xalves
0

I uninstalled Docker and reinstalled it, and everything seems to be working again.

Liku45
-3

Restarting my PC worked for me.