We are using Docker Swarm to deploy a stack of several services. We name the stack 'gw'. Our script runs
docker stack deploy --resolve-image changed --with-registry-auth --prune -c docker-compose.yml -c docker-compose.override.yml gw
The docker-compose.yml defines our base services (3 or 4), and the docker-compose.override.yml can contain many more: we have an app concept where each app gets installed as one or more Docker containers. The whole setup works fine in 90% of the deployments: the stack gets deployed and the respective services get started.
However, in some rare cases (the remaining 10%), e.g. if the image has an issue, the stack gets deployed successfully BUT the service fails to start.
Then, we can look at the service log to see what happened.
docker service logs gw_myservice
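Since the stack deploy itself succeeds in these cases, a small post-deploy check can flag the services that did not come up, so we know which ones to look at. This is a minimal sketch, assuming a POSIX shell and that it runs some time after the deploy (a service that is still starting also shows fewer running than desired replicas); `list_unhealthy_services` is a hypothetical helper name, not a Docker command:

```shell
# Hypothetical post-deploy check: print every service of a stack whose running
# replica count differs from the desired one (e.g. "gw_myservice 0/1").
list_unhealthy_services() {
  local stack="$1" name replicas running desired
  docker stack services "$stack" --format '{{.Name}} {{.Replicas}}' |
    while read -r name replicas; do
      # .Replicas looks like "running/desired", sometimes followed by extra
      # text such as "(max 1 per node)"
      running="${replicas%%/*}"
      desired="${replicas#*/}"
      desired="${desired%% *}"
      if [ "$running" != "$desired" ]; then
        echo "$name $replicas"
      fi
    done
}
```

Usage: `list_unhealthy_services gw`, then run `docker service logs` on each name it prints.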
In even rarer cases (let's say 1%), however, the service crashes BEFORE logging anything.
We can have a closer look at the service with
docker inspect gw_myservice
and
docker service ps gw_myservice --no-trunc
I haven't tried the latter yet, but I suppose it shows something similar to what another component gave us: we are using the Cockpit UI on the Linux host where the Docker stack gets deployed, and in one case its "Docker containers" view showed "exit code 128". According to the Docker docs and this question, Docker follows the standard chroot exit codes defined here, but the description given for 128 ("Invalid argument to exit") does not help much further.
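Swarm keeps a short history of a service's tasks, so the failed ones and the error recorded for each can be queried directly. A minimal sketch, assuming a POSIX shell; `show_failed_tasks` and `task_exit_info` are hypothetical helper names, while the `--format` placeholders and the task fields (`.Status.Err`, `.Status.ContainerStatus.ExitCode`) are the ones exposed by `docker service ps` and by the task objects that `docker inspect --type task` returns:

```shell
# Hypothetical helpers (names are illustrative, not Docker commands).

# List the tasks Swarm has already shut down for a service (failed or replaced
# ones), with the node they ran on and the error Swarm recorded for them.
show_failed_tasks() {
  docker service ps "$1" --no-trunc \
    --filter desired-state=shutdown \
    --format '{{.ID}} | {{.Name}} | {{.Node}} | {{.CurrentState}} | {{.Error}}'
}

# For one task ID (first column above), print the container exit code and the
# error string stored in the task status. The {{with}} guard avoids a template
# error when the task never got as far as creating a container.
task_exit_info() {
  docker inspect --type task \
    --format 'exit={{with .Status.ContainerStatus}}{{.ExitCode}}{{end}} err={{.Status.Err}}' \
    "$1"
}
```

Usage: `show_failed_tasks gw_myservice`, then `task_exit_info <task-id>` with an ID from the first column.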
There are a few similar questions (e.g. here, here and there), but my question is more generic: what can we do when all tools fail to provide any more information? This other question points to the feature request for docker stack logs, which is still open. Answers there suggest aggregating the output of 'docker service logs', but if a service does NOT log anything, that will not help.
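The aggregation suggested in those answers can be sketched as a small shell function. It assumes a POSIX shell, and `stack_logs` is a hypothetical name; as noted above, it cannot show anything for a service that never logged:

```shell
# Hypothetical helper: print the logs of every service in a stack, each line
# prefixed with the service name.
stack_logs() {
  local stack="$1" svc
  for svc in $(docker stack services "$stack" --format '{{.Name}}'); do
    # --timestamps makes it easier to correlate the outputs afterwards;
    # 2>&1 keeps what the containers wrote to stderr as well
    docker service logs --timestamps --no-trunc "$svc" 2>&1 | sed "s/^/[$svc] /"
  done
}
```

Usage: `stack_logs gw > gw-logs.txt`.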
The only option I still see is to remove the service from the stack and start it manually with docker run, passing all its options (name, network, volumes, environment variables, etc.): quite painful, but doable. It seems that docker run gives more hints than a service started from the stack: why? Or how could we get the Docker services to be as verbose?
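To make that manual reconstruction less painful, the relevant parts of the service definition can be read back from Swarm instead of being merged by hand from the two compose files. A minimal sketch, assuming a POSIX shell, a hypothetical helper name `service_run_spec`, and a Docker version that places the attached networks under `.Spec.TaskTemplate.Networks` (older versions may use `.Spec.Networks`); the JSON still has to be translated into `docker run` flags by hand:

```shell
# Hypothetical helper: dump, as JSON, the parts of a service spec that would
# have to be turned into `docker run` options.
service_run_spec() {
  # Image, command/args, env, mounts, ... of the container
  docker service inspect --format '{{json .Spec.TaskTemplate.ContainerSpec}}' "$1"
  # Networks the service's tasks are attached to
  docker service inspect --format '{{json .Spec.TaskTemplate.Networks}}' "$1"
}
```

Usage: `service_run_spec gw_myservice`.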
Any further ideas are welcome!