Docker swarm not mapping ports correctly - connectivity issues

Question

I'm following the Docker tutorial and I'm on Part 4: Swarms, which involves setting up a swarm across two VMs. I think swarm is not mapping my container ports correctly. When I launch the app with swarm, I cannot connect to it at the IP address of my VM. However, if I manually launch the app while SSH'd into the VM, I can access it.

The reason I think swarm isn't mapping the ports correctly is that when I deployed the app via swarm and tried to view a container's port mappings with `docker port CONTAINER_NAME`, nothing was shown. When I deploy manually and run that command for a given container, I see something like `80/tcp -> 0.0.0.0:80`.

For example, here's what happens when I try to use curl

curl 192.168.99.100:80
curl: (7) Failed to connect to 192.168.99.100 port 80: Connection refused

I made this post a while back that goes into more detail, but that went unanswered and I think the new information I have might be more helpful.

What doesn't work

When I deploy with swarm (`docker stack deploy -c docker-compose.yml getstartedlab`), I cannot connect to the app via a web browser or curl. I believe everything was deployed correctly because I can run `docker stack ps getstartedlab` and see all of the replicas running and distributed between my two VMs.

What does work

I took down the stack with `docker stack rm getstartedlab`, then SSH'd into the two VMs I created and manually launched the app, specifying the port mapping like so:

docker run -p 80:80 -td myusername/get-started:part2

Then browsing to 192.168.99.100 I was able to see the app and no longer had the connection issues. I'm pretty sure this means it's a Docker issue, not a VM issue.

Other info - maybe useful

Here's my docker-compose.yml file. I tried the ports setting with both "4000:80" and "80:80".

version: "3"
services:
  web:
    image: myusername/get-started:part2
    deploy:
      replicas: 5
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - "4000:80"
    networks:
      - webnet
networks:
  webnet:
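
A small sketch of how the short `ports` syntax above reads, assuming the usual `HOST:CONTAINER` order of Compose port mappings: with "4000:80" the app would be published on port 4000 of the node, and 80 is only the port inside the container. The helper names `published_port` and `target_port` are made up for illustration.

```shell
# Hypothetical helpers: split a Compose short-syntax mapping such as "4000:80".
# Assumes the plain HOST:CONTAINER form (no IP prefix, no /protocol suffix).
published_port() { printf '%s\n' "${1%%:*}"; }   # part before the colon (host side)
target_port()    { printf '%s\n' "${1##*:}"; }   # part after the colon (container side)

mapping="4000:80"
echo "curl against port $(published_port "$mapping"), app listens on $(target_port "$mapping") inside the container"
```

With the file as shown, the request to try would therefore be `curl http://192.168.99.100:4000/` rather than port 80; with "80:80" it would be port 80.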

To sum up: I cannot access the app when I launch it with swarm. If I launch it manually from within the VM and specify the port mapping (not using swarm), I can access the app at the VM's IP.

Updates

I removed the running stack, restarted the VMs, and relaunched with `docker stack deploy -c docker-compose.yml getstartedlab`:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                         PORTS
4kn46vue9ka3        getstartedlab_web   replicated          5/5                 myusername/get-started:part2   *:4000->80/tcp

$ docker service ps 4kn46vue9ka3
ID                  NAME                  IMAGE                         NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
a34u4jijc5d6        getstartedlab_web.1   myusername/get-started:part2   myvm1               Running             Running 20 minutes ago                       
8fvukpo95ko4        getstartedlab_web.2   myusername/get-started:part2   myvm2               Running             Running 20 minutes ago                       
lywca2zwtjfa        getstartedlab_web.3   myusername/get-started:part2   myvm2               Running             Running 20 minutes ago                       
u3cw40tjmujb        getstartedlab_web.4   myusername/get-started:part2   myvm1               Running             Running 20 minutes ago                       
c0tiyxu5o5x5        getstartedlab_web.5   myusername/get-started:part2   myvm2               Running             Running 20 minutes ago
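
One way to read the output above: the `PORTS` column of `docker service ls` (`*:4000->80/tcp`) says that port 4000 is published on every node and forwarded to port 80 in the tasks. The empty per-task `PORTS` column is typical for ports published through swarm's ingress routing mesh, which is probably also why `docker port` shows nothing for an individual container. Below is a minimal sketch that pulls the published port out of that listing. `list_published` is a hypothetical helper and assumes a single port mapping per service.

```shell
# Hypothetical helper: read `docker service ls` output on stdin and print
# "<service> <published-port> <target-port>" for each service that publishes a port.
# Assumes the PORTS column is the last field and holds one mapping like "*:4000->80/tcp".
list_published() {
  awk 'NR > 1 && $NF ~ /->/ {
    split($NF, side, "->")       # side[1] = "*:4000", side[2] = "80/tcp"
    sub(/^.*:/, "", side[1])     # keep only the published port
    sub(/\/.*$/, "", side[2])    # drop the protocol suffix
    print $2, side[1], side[2]
  }'
}

# Usage (on a manager node):
#   docker service ls | list_published
```

The same information is available at the service level with `docker service inspect --format '{{json .Endpoint.Ports}}' getstartedlab_web`.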

Here are the logs

$ docker service logs 4kn46vue9ka3
getstartedlab_web.4.u3cw40tjmujb@myvm1    |  * Serving Flask app "app" (lazy loading)
getstartedlab_web.4.u3cw40tjmujb@myvm1    |  * Environment: production
getstartedlab_web.4.u3cw40tjmujb@myvm1    |    WARNING: Do not use the development server in a production environment.
getstartedlab_web.4.u3cw40tjmujb@myvm1    |    Use a production WSGI server instead.
getstartedlab_web.4.u3cw40tjmujb@myvm1    |  * Debug mode: off
getstartedlab_web.4.u3cw40tjmujb@myvm1    |  * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
getstartedlab_web.1.a34u4jijc5d6@myvm1    |  * Serving Flask app "app" (lazy loading)
getstartedlab_web.1.a34u4jijc5d6@myvm1    |  * Environment: production
getstartedlab_web.1.a34u4jijc5d6@myvm1    |    WARNING: Do not use the development server in a production environment.
getstartedlab_web.1.a34u4jijc5d6@myvm1    |    Use a production WSGI server instead.
getstartedlab_web.1.a34u4jijc5d6@myvm1    |  * Debug mode: off
getstartedlab_web.1.a34u4jijc5d6@myvm1    |  * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
getstartedlab_web.3.lywca2zwtjfa@myvm2    |  * Serving Flask app "app" (lazy loading)
getstartedlab_web.3.lywca2zwtjfa@myvm2    |  * Environment: production
getstartedlab_web.3.lywca2zwtjfa@myvm2    |    WARNING: Do not use the development server in a production environment.
getstartedlab_web.3.lywca2zwtjfa@myvm2    |    Use a production WSGI server instead.
getstartedlab_web.3.lywca2zwtjfa@myvm2    |  * Debug mode: off
getstartedlab_web.3.lywca2zwtjfa@myvm2    |  * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
getstartedlab_web.2.8fvukpo95ko4@myvm2    |  * Serving Flask app "app" (lazy loading)
getstartedlab_web.2.8fvukpo95ko4@myvm2    |  * Environment: production
getstartedlab_web.2.8fvukpo95ko4@myvm2    |    WARNING: Do not use the development server in a production environment.
getstartedlab_web.2.8fvukpo95ko4@myvm2    |    Use a production WSGI server instead.
getstartedlab_web.2.8fvukpo95ko4@myvm2    |  * Debug mode: off
getstartedlab_web.2.8fvukpo95ko4@myvm2    |  * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |  * Serving Flask app "app" (lazy loading)
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |  * Environment: production
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |    WARNING: Do not use the development server in a production environment.
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |    Use a production WSGI server instead.
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |  * Debug mode: off
getstartedlab_web.5.c0tiyxu5o5x5@myvm2    |  * Running on http://0.0.0.0:80/ (Press CTRL+C to quit)

I ran this from both my local machine and within the VM; the results were the same.

$ docker container exec getstartedlab_web.1.a34u4jijc5d69m7rqry1jpfo9 curl http://127.0.0.1
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"curl\": executable file not found in $PATH": unknown
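
The error above only says that the image has no `curl` binary; it says nothing about the app itself. Since the logs show a Flask app, a Python interpreter should be present in the container, so the same check can be sketched with Python's `socket` module instead. Assumptions: the interpreter is on `PATH` as `python`, and `probe_in_container` is a hypothetical helper name.

```shell
# Hypothetical helper: test from inside a container whether something accepts
# TCP connections on 127.0.0.1:<port> (default 80), without needing curl.
# Assumes a `python` executable exists in the image.
probe_in_container() {
  container=$1
  port=${2:-80}
  docker exec "$container" python -c '
import socket, sys
# create_connection raises an exception (non-zero exit) if nothing is listening
socket.create_connection(("127.0.0.1", int(sys.argv[1])), 2).close()
print("port %s is open" % sys.argv[1])
' "$port"
}

# Usage, with the full container name taken from `docker container ls` on the node:
#   probe_in_container <container-name> 80
```

This has to run with the Docker client pointed at the node that hosts the container, for example after `docker-machine ssh myvm1`.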

I also ran the swarm with only one node, but still had no luck connecting to 192.168.99.100/

$ docker-machine ssh myvm2
docker@myvm2:~$ docker swarm leave

Then back on my local machine

$ docker-machine stop myvm2
$ docker stack deploy -c docker-compose.yml getstartedlab
$ docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
gjkde7aw2rznl9y68zti3lqrj *   myvm1               Ready               Active              Leader              18.09.0
qu6qymppl1msxatll9m0sh7tn     myvm2               Down                Active                                  18.09.0
$ docker stack ps getstartedlab
ID                  NAME                  IMAGE                         NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
vr0w3riw1cnz        getstartedlab_web.1   myusername/get-started:part2   myvm1               Running             Running 22 seconds ago                       
pvr93rqi64dw        getstartedlab_web.2   myusername/get-started:part2   myvm1               Running             Running 22 seconds ago                       
gucfa7asiwvx        getstartedlab_web.3   myusername/get-started:part2   myvm1               Running             Running 22 seconds ago                       
qzarr6jc5hzk        getstartedlab_web.4   myusername/get-started:part2   myvm1               Running             Running 22 seconds ago                       
p0hszupsl8wj        getstartedlab_web.5   myusername/get-started:part2   myvm1               Running             Running 22 seconds ago

I also checked the logs and there was no sign of myvm2 anywhere, as expected.

More Updates

I removed all images, containers, VMs, etc. and started the tutorial over from the beginning. To ensure that no old code or configuration was being used, I named the VM myvm3 and the image is now get-started-again. This time I launched only one VM as a single-node cluster. Still no luck: I'm getting the same "connection refused" error.

$ docker-machine ls
NAME    ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER     ERRORS
myvm3   *        virtualbox   Running   tcp://192.168.99.102:2376           v18.09.0   


$ docker stack ps getstartedlab
ID                  NAME                  IMAGE                               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
hnbqdo9k3val        getstartedlab_web.1   myusername/get-started-again:part2   myvm3               Running             Running 7 minutes ago                       
bwc0ga954poo        getstartedlab_web.2   myusername/get-started-again:part2   myvm3               Running             Running 7 minutes ago                       
aleioewe4ivx        getstartedlab_web.3   myusername/get-started-again:part2   myvm3               Running             Running 7 minutes ago                       
fzys3tihrf0t        getstartedlab_web.4   myusername/get-started-again:part2   myvm3               Running             Running 7 minutes ago                       
4jyzzao11z96        getstartedlab_web.5   myusername/get-started-again:part2   myvm3               Running             Running 7 minutes ago
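
Note that the new VM is listed at 192.168.99.102, not 192.168.99.100, so a request has to go to that address and to whichever port the stack publishes. A sketch of building the URL to test, assuming the stack still publishes port 4000 as in the compose file shown earlier; `node_url` is a hypothetical helper.

```shell
# Hypothetical helper: build the URL for a published port on a given node IP.
# The default port 4000 is an assumption taken from the "4000:80" mapping.
node_url() {
  ip=$1
  port=${2:-4000}
  printf 'http://%s:%s/\n' "$ip" "$port"
}

# Usage: `docker-machine ip myvm3` prints the VM's current IP address.
#   curl -sS --max-time 5 "$(node_url "$(docker-machine ip myvm3)" 4000)"
```

Looking the address up with `docker-machine ip` each time avoids relying on an IP that belonged to a VM that has since been removed.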

There are a lot of possibilities here, but in general Swarm is surprisingly mature and reliable. I would try confirming that all your containers are running correctly and that older containers have stopped correctly. Swarm is slightly different in that there are all kinds of errors it won't force you to consider when running deploy. You have to look for them using `docker service ps` and `docker service logs`. Try shrinking your swarm to a single node. Try running a single replica. Try running `docker exec getstartedlab_web..... curl http://127.0.0.1/` to confirm your app is not the issue. — Ryan, Dec 30 '18 at 23:44
@Ryan I've updated my post with the service and logs you requested. I'm not sure if I used the correct options for the exec command, but it's included in the edits. I'm going to try running a single node now — user10194756, Dec 31 '18 at 01:54
@Ryan I tried running swarm with only one node and it still didn't work. I updated my post with the steps I took. — user10194756, Dec 31 '18 at 02:33
Do you still have your VirtualBox NAT modifications? I wonder if that is causing you issues. — Ryan, Dec 31 '18 at 04:41
Looks like that image doesn't have curl installed, but everything else in the logs looks OK. I'd suggest starting with a fresh VM, and deploying the stack on a single-node cluster and confirming that works how you want. Then adding the second node to the cluster. I think that will isolate your issue. — Ryan, Dec 31 '18 at 04:45
Can you please clarify which of these machines is the Docker swarm master? Also, is `webnet` a user-defined overlay network with any additional configs? — Mani, Dec 31 '18 at 06:56
@Mani myvm1 is the swarm master. Sorry, I'm not sure what you mean by `webnet` being a user-defined overlay network. I haven't changed any configurations, if that's what you're asking. I've followed the tutorial 100%; I haven't changed anything. — user10194756, Dec 31 '18 at 23:20
@Ryan I removed all images, containers, vm's, etc... and started the tutorial completely over from the start. I have one newly installed VM running a single node cluster and it's still giving me the same connection issues. I haven't changed any configurations or code in the tutorial. — user10194756, Jan 01 '19 at 00:21
It seems the issue is with the network. You are using `webnet` as the network. How did you create it? Did you run `docker network create -d overlay webnet`? — Mani, Jan 01 '19 at 14:52
@Mani I followed the official tutorial that I linked at the top. It very briefly touches on webnet and says that its defined in the docker-compose.yml file. [Here's the section of the tutorial that covers webnet](https://docs.docker.com/get-started/part3/#docker-composeyml). I did exactly what the tutorial said to do. — user10194756, Jan 01 '19 at 19:18
Did you get those connection issues with a single node cluster? What does `docker node ls` say? — Ryan, Jan 04 '19 at 02:27
@Ryan Yes, I had the connection issues with a single node cluster. Every single docker command is reporting that it's operating as intended. Whether I run `docker container ls`, `docker-machine ls`, or `docker node ls`, they all report what appears to be a healthy swarm. — user10194756, Jan 11 '19 at 15:52
This post is very old but I ran into a similar issue when the docker-swarm ports are not open between the nodes. You can `sudo firewall-cmd --add-service=docker-swarm --permanent` on each node then `sudo firewall-cmd --reload` and you should be working better. — duct_tape_coder, Aug 31 '23 at 22:18

0 Answers