We are using Docker in a swarm environment. Everything is fine, except for a strange process named "exe" that appeared a few days ago:
14126 root 20 0 446836 33648 184 R 49.0 0.2 0:05.98 exe
1 root 20 0 52356 532 332 S 34.3 0.0 2750:22 systemd
13789 root 20 0 5424660 49784 0 S 5.6 0.3 2381:57 dockerd
It took up to 100% of the CPU.
We tried to understand where it came from, but it was very volatile, and its pid changed every 3-4s.
You can guess that such a behaviour triggered a few alarms.
Eventually, we set up some monitoring (using auditd) to take a snapshot of it, and saw this:
Syscall event curl /usr/bin/curl 24242 24234
Syscall event 4 / 24240 24234
Syscall event exe /usr/bin/runc 24240 24234
Syscall event runc /usr/bin/runc 24234 10444
The parent process of the "main" runc is:
root 10444 2621 0 Nov13 ? 00:07:07 containerd-shim
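For a process that lives long enough to be inspected, the same parent chain can be checked directly from /proc. The following is only a sketch: the function names ppid_of and ancestry are made up for this example, it assumes a Linux /proc filesystem, and it will simply stop if the process exits while it is being inspected (which is likely for something as short-lived as "exe").

```shell
# Print the parent PID of a process, read from /proc/<pid>/status (Linux only).
# ppid_of is a hypothetical helper name, not an existing tool.
ppid_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print "pid comm" for a process and for each of its ancestors.
# Stops at PID 1 (whose parent is 0) or as soon as a process has vanished.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] 2>/dev/null; do
        [ -r "/proc/$pid/comm" ] || break
        printf '%s %s\n' "$pid" "$(cat "/proc/$pid/comm")"
        pid=$(ppid_of "$pid")
    done
}
```

Calling ancestry 24234 while that runc is still alive should list it first, followed by its containerd-shim parent and so on up the tree.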
I read a few things (including this one and that other one, and many more) about containerd-shim and runc. As I understand it, runc is used to launch daemonless containers, and containerd-shim then takes over as the parent of the container process.
So I understand why runc briefly appears as a child process of containerd-shim every time a container is started.
But a few things still escape me:
- why are there several levels of runc (one runc calling another)?
- why is it called "exe" rather than "runc", which makes it look very suspicious (when it seems to be legitimate)? Is it the main process of the container (or another one)?
- what is this strange process called "4", whose executable path is "/"? Is it part of the processes in the container (or the main one)?
- I guess the curl is the healthcheck performed in the container (it's an apache container with a healthcheck targeting localhost). Am I right?
- if the main process of the container is not the "4" one, should I be able to see it, and how could I see it in a similar way?
In the meantime, the process has stopped using the whole CPU. It appears briefly (which seems legitimate) every time a container is started, but does not take more than a few percent. So I think its excessive CPU usage was related to some problem in our container. Anyway, solving the CPU problem was not my point here.
Edit 1:
About dockerfiles
There are a lot of containers running on the VM, and I can't provide all the Dockerfiles. The one I suspect of triggering the curl through its healthcheck is an Apache httpd (CentOS-based) image. It is very close to the CentOS one, mainly with some labeling, cleanup (unused modules), and an additional healthcheck:
HEALTHCHECK --interval=5s --timeout=3s CMD curl --noproxy '*' --fail http://localhost:80/ || exit 1
About monitoring
We are using rsyslog with a basic configuration targeting a remote server, and then run auditctl to log every process execution:
service rsyslog restart
service auditd start
auditctl -a always,exit -F arch=b64 -S execve -F key=procmon
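For reference, the execve records captured by that rule can be reduced to the "comm, exe, pid, ppid" columns shown in the snapshot above with something like the sketch below. It is not the exact tooling we used: procmon_summary is a made-up name, and it assumes the usual key=value layout of raw audit SYSCALL records, with comm and exe as plain quoted strings (auditd hex-encodes values containing spaces, which this does not handle).

```shell
# Reduce raw audit records (read on stdin) to "comm exe pid ppid" lines.
# procmon_summary is a hypothetical helper name; only SYSCALL records
# tagged with the "procmon" key are kept, everything else is ignored.
procmon_summary() {
    awk '
    /type=SYSCALL/ && /key="?procmon"?/ {
        comm = exe = pid = ppid = ""
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")
            if (n == 0) continue          # not a key=value token
            k = substr($i, 1, n - 1)
            v = substr($i, n + 1)
            gsub(/"/, "", v)              # strip surrounding quotes
            if (k == "comm") comm = v
            else if (k == "exe") exe = v
            else if (k == "pid") pid = v
            else if (k == "ppid") ppid = v
        }
        print comm, exe, pid, ppid
    }'
}
```

On a host that keeps a local audit log, it could be fed with procmon_summary < /var/log/audit/audit.log (the default location on CentOS; in our case the records are forwarded to a remote server, so the input would come from there instead).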