0

I have been running an NVIDIA Docker container for 13 days, and it used to restart without any problems using the `docker start -i <containerid>` command. Today, while I was downloading PyTorch inside the container, the download got stuck at 5% and stopped responding.

I couldn't exit the container with either Ctrl+D or Ctrl+C, so I closed the terminal and ran `docker start -i <containerid>` again in a new one. Ever since, this container has not responded to any command: start, restart, exec, commit, nothing. Any command that uses this container's ID or name just hangs, and I can only get out of it with Ctrl+C.

I cannot restart the Docker service, since that would kill all running containers. I cannot even stop the container with `docker container stop <containerid>`.

Please help.

Suvidha
  • I'd typically expect `docker stop container_id` to work; if it doesn't, you may be down to more extreme measures like rebooting the host or restarting Docker Desktop. (IME it's unusual to use `docker start` at all; once you've stopped the container, `docker rm` it and `docker run` a new one.) – David Maze Sep 07 '22 at 10:37
  • I have installed many things on that image, which is why I cannot remove it and reinstall everything again. – Suvidha Sep 07 '22 at 11:07
  • I'm not clear what you mean. Is your image's `Dockerfile` checked into source control? You can always `docker build` it again when you need to. – David Maze Sep 07 '22 at 11:19
  • I used Docker for the first time and tried many things until this container for NVIDIA TensorFlow 1, nvcr.io/nvidia/tensorflow:22.01-tf1-py3, worked for me. I started with simple commands like `docker pull` and `docker run`, and then used `docker start` to restart. I don't have any Dockerfile associated with it. – Suvidha Sep 07 '22 at 11:32
  • If you've `docker run` a container and are installing software in that running container, you've done the equivalent of booting a system to only a RAM disk, and anything you've done will be lost when the container exits. You _are_ going to lose data with this setup. I'd highly recommend building a reproducible `Dockerfile` while the memory of what you've done is still fresh in your mind. – David Maze Sep 07 '22 at 11:42
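
A minimal sketch of what such a `Dockerfile` could look like, written out by a small shell function. The base image is the one named in the comments; the `pip install torch` line is only an assumption standing in for whatever was installed by hand, and the function name `write_dockerfile` is hypothetical.

```shell
#!/bin/sh
# Write a starter Dockerfile into the given directory (default: current directory)
# and print its path. The package list is an assumption, not the asker's real setup.
write_dockerfile() {
    target_dir="${1:-.}"
    mkdir -p "$target_dir"
    cat > "$target_dir/Dockerfile" <<'EOF'
# Base image named in the question's comments.
FROM nvcr.io/nvidia/tensorflow:22.01-tf1-py3

# Assumption: stand-in for the packages installed by hand in the old container.
RUN pip install --no-cache-dir torch
EOF
    printf '%s\n' "$target_dir/Dockerfile"
}
```

After `write_dockerfile ./tf1-torch`, something like `docker build -t suvidha/test_image ./tf1-torch` should rebuild the same environment on demand, so losing a container no longer means losing the installs.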

3 Answers

1

I had to restart the Docker process to revive my container; there was nothing else I could do to solve it. I used `sudo service docker restart` and then revived my container using `docker run`. I will try to build a Dockerfile from it in order to avoid future mishaps.
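
The sequence can be sketched roughly as below. This is an illustration, not the exact commands that were run: the answer mentions `docker run`, but since the question re-entered the existing container with `docker start -i <containerid>`, the sketch assumes that step. The `run` helper, the `DRY_RUN` switch, and the function name are hypothetical.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Restart the daemon, check the container's state, then reattach to it.
revive_container() {
    container="$1"
    # Per the question, restarting the service also stops other running containers.
    run sudo service docker restart &&
    run docker inspect --format '{{.State.Status}}' "$container" &&
    run docker start -i "$container"
}
```

Running `DRY_RUN=1; revive_container <containerid>` first prints the three commands for review before anything is actually restarted.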

Suvidha
0

You can make use of Docker's restart policy: `docker update --restart=always <container>`, while being mindful of the caveats for the Docker version you are running.

Or explore an answer by @Yale Huang to a similar question: How to add a restart policy to a container that was already created
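
As a rough sketch, setting the policy and then reading it back could look like this. It only helps once the daemon responds to commands for that container again. The `run` helper, `DRY_RUN`, and `set_restart_policy` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Apply a restart policy (default: always) and show what the daemon recorded.
set_restart_policy() {
    container="$1"
    policy="${2:-always}"
    run docker update --restart="$policy" "$container" &&
    run docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$container"
}
```

For example, `set_restart_policy <container> unless-stopped` would use the `unless-stopped` policy instead of `always`.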

katxeus
  • I tried this command, but it did not help. As I said, any command with this container ID in its arguments gets stuck. Is there any way I can kill that particular Docker image but still be able to restart it again? – Suvidha Sep 07 '22 at 09:50
  • Like I have linked in the answer, there is an option by @Yale Huang to `docker commit` your container as a new image, stop & rm the current container, and start a new container with the image. – katxeus Sep 07 '22 at 10:25
  • I tried with `docker commit`, but it still did not help. The command I used: `docker commit suvidha/test_image` – Suvidha Sep 07 '22 at 11:01
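
For reference, `docker commit` takes the container first and the new image name second, so a command with a single argument treats that argument as the container. A hedged sketch of the commit, stop, remove, and run workflow described in the comments follows; it assumes the daemon is responding again, and the `run` helper, `DRY_RUN`, and `snapshot_and_replace` are hypothetical names.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Snapshot a container as an image, remove the container, start a fresh one.
snapshot_and_replace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: snapshot_and_replace CONTAINER IMAGE" >&2
        return 1
    fi
    container="$1"
    image="$2"
    # Each step runs only if the previous one succeeded, so a failed commit
    # never leads to the container being removed.
    run docker commit "$container" "$image" &&
    run docker stop "$container" &&
    run docker rm "$container" &&
    run docker run -it "$image"
}
```

A call such as `snapshot_and_replace <containerid> suvidha/test_image` keeps the installs in the new image; the final `docker run` would also need whatever flags the original container was started with (for example `--gpus all`).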
0

Have a look at this...

As mentioned in my comment: I see significant performance differences between Hyper-V and WSL. Hyper-V seems to have much faster IO access than WSL (both WSL vers. 1 and WSL vers. 2) which speeds up builds among others.

Jess N