
I would like to know how Docker containers are managed by the operating system. I found some explanations here: https://stackoverflow.com/a/47784145/11377751, but I cannot understand them well.

I know the basics of operating systems (notions such as the PCB, RAM, CPU, kernel, syscalls, etc.).

But I do not understand how containers are managed by the OS:

  • Are containers processes, or are they "emulated" by the Docker Engine (which would then be the only process, with its threads)? How is a container represented in RAM?
  • If, in my container, I launch a C application that calls fork(), who makes the call to the kernel: the container or the Docker Engine? Who duplicates the PCB? And what about a malloc(...) call?
  • What is the concept of a namespace in the kernel, and in RAM? Are these tables that define access rights, or something else? Why does Wikipedia say that this notion is essential for containers, given that diagrams show the Docker Engine sitting between the containers and the kernel?

Image (not reproduced), found here: https://stackoverflow.com/a/42111368/11377751


Thank you very much in advance.

tgogos
babz

1 Answer


Preface: this was tested on Arch Linux.

  1. Containers are processes. There is only one dockerd process and one containerd process. Then, for each running container, there are docker and containerd-shim processes. Note that containerd-shim is the parent of the container's process. Its purpose is as follows:

    • First, it allows the runtimes, i.e. runc, to exit after they start the container. This way, we don't need long-running runtime processes for containers. When you start mysql, you should only see the mysql process and the shim.

    • Second, it keeps the STDIO and other fds open for the container in case containerd and/or docker both die. If the shim were not running, the parent side of the pipes or the TTY master would be closed and the container would exit.

    • Finally, it allows the container's exit status to be reported back to a higher-level tool like docker without that tool having to be the actual parent of the container's process and do a wait4.

( taken from https://groups.google.com/forum/#!topic/docker-dev/zaZFlvIx1_k )

  2. Please read two good answers, one about system calls ( https://stackoverflow.com/a/32842491/5247040 ) and one about the roles of Docker's components ( https://stackoverflow.com/a/46650343/5247040 ). Another good read: https://medium.com/devopslinks/docker-containerd-standalone-runtimes-heres-what-you-should-know-b834ef155426

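EDIT: Quick answer: the C application calls fork and malloc => Linux kernel. I was wrong: the Docker Engine doesn't participate in syscalls; all control happens via namespaces / seccomp ( https://stackoverflow.com/a/34871045/5247040 ). Strictly speaking, malloc is a C library function rather than a syscall; when it needs more memory, it asks the kernel via brk or mmap.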

  3. Namespaces are represented quite differently in RAM depending on their type; you can check the source code for

Why does wikipedia say that this notion is essential for containers

Because "Various container software use Linux namespaces in combination with cgroups to isolate their processes, including Docker[8] and LXC" ( https://en.wikipedia.org/wiki/Linux_namespaces )

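Basically, namespaces are the instrument with which the Docker Engine isolates containers: they limit what a container's processes can see (process IDs, mount points, network interfaces, hostname, etc.), while cgroups limit how much CPU and memory they can use.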

Roman Zaitsev
  • `C application calls fork and malloc => Docker Engine => Linux kernel` - could you elaborate on this more thoroughly to prove this statement? How does it work in an ordinary setup, and how does it work when the Docker Engine is stopped (live-restore mode)? – Danila Kiver Apr 18 '19 at 12:14