
I've run into a strange issue. I have the following setup: one Docker host running Traefik as a load balancer serving multiple sites. The sites are mostly PHP/Apache. HTTPS is managed by Traefik. Each site is started using a docker-compose YAML file containing the following:

version: '2.3'
services:
  redis:
    image: redis:alpine
    container_name: ${PROJECT}-redis
    networks:
      - internal
  php:
    image: registry.gitlab.com/OUR_NAMESPACE/docker/php:${PHP_IMAGE_TAG}
    environment:
      - APACHE_DOCUMENT_ROOT=${APACHE_DOCUMENT_ROOT}
    container_name: ${PROJECT}-php-fpm
    volumes:
       - ${PROJECT_PATH}:/var/www/html:cached
       - .docker/php/php-ini-overrides.ini:/usr/local/etc/php/conf.d/99-overrides.ini
    ports:
      - 80
    networks:
      - proxy
      - internal
    labels:
      - traefik.enable=true
      - traefik.port=80
      - traefik.frontend.headers.SSLRedirect=false
      - traefik.frontend.rule=Host:${PROJECT}
      - "traefik.docker.network=proxy"

networks:
  proxy:
    external:
      name: proxy
  internal:

(For PHP we use image tags such as 5.6.33-apache-jessie or 7.1.12-apache.)

In addition to the above, some sites get the following labels:

traefik.docker.network=proxy
traefik.enable=true
traefik.frontend.headers.SSLRedirect=true
traefik.frontend.rule=Host:example.com, www.example.com
traefik.port=80
traefik.protocol=http

What we get is that some requests end in 502 Bad Gateway. The Traefik debug output shows:

time="2018-03-21T12:20:21Z" level=debug msg="vulcand/oxy/forward/http: Round trip: http://172.18.0.8:80, code: 502, Length: 11, duration: 2.516057159s"

Can someone help with that? It's completely random when it happens. Our traefik.toml:

debug = true
checkNewVersion = true
logLevel = "DEBUG"

defaultEntryPoints = ["https", "http"]
[accessLog]

[web]
address = ":8080"

[web.auth.digest]
users = ["admin:traefik:some-encoded-pass"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
#    [entryPoints.http.redirect] # had to disable this because HTTPS must be enabled manually (not my decision)
#      entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]


[retry]

[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "example.com"
watch = true
exposedbydefault = false


[acme]
email = "info@example.com"
storage = "acme.json"
entryPoint = "https"
onHostRule = true

[acme.httpChallenge]
entryPoint = "http"

Could the issue be related to using the same docker-compose.yml?

x4k3p
  • Based on the number of views to this page (13k in 18 months) with the current number of upvotes (1) I'd suggest updating the question to something a little slimmer. – vhs Oct 02 '19 at 13:39
  • Big questions need love too. – Clintm Jan 27 '21 at 18:34

10 Answers


Another reason can be that you are accidentally pointing Traefik at the port published on the VM (host) instead of the container port.

I made a change to the port mapping in my docker-compose file and forgot to update the port label, so Traefik was trying to reach a port that had no process listening on it.

Wrong way:

ports:
  - "8080:8081"
labels:
  - "traefik.http.services.front-web.loadbalancer.server.port=8080"

Right way:

ports:
  - "8080:8081"
labels:
  - "traefik.http.services.front-web.loadbalancer.server.port=8081"

Also, in general don't do this: instead of publishing ports, use Docker networks; they are much better and cleaner. I wrote my configuration documentation about a year ago and this was more of a beginner mistake on my side, but it might help someone :)
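
A minimal sketch of the "networks instead of published ports" advice, assuming Traefik 2.x labels as in the snippets above. The image name, the router name `front-web`, and the host `example.com` are placeholders, and `proxy` is assumed to be an existing external network that the Traefik container is also attached to:

```yaml
version: '2.3'
services:
  front-web:
    image: example/front-web:latest   # hypothetical image
    # No "ports:" section: nothing is published on the host.
    # Traefik reaches the container over the shared "proxy" network.
    networks:
      - proxy
    labels:
      - "traefik.enable=true"
      - "traefik.docker.network=proxy"
      - "traefik.http.routers.front-web.rule=Host(`example.com`)"
      # Container port the application listens on (not a host port)
      - "traefik.http.services.front-web.loadbalancer.server.port=8081"

networks:
  proxy:
    external: true
```

With this layout there is no host-side port to confuse with the container port, so the label can only refer to the port the process listens on inside the container.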

Mehdi Amenein

For anyone getting the same issue:

After recreating the network (proxy) and restarting every site/container, it seems to work now. I still don't know what caused the issue.

x4k3p
  • The only truth for me is: only removing and re-adding the network does the job. After a lot of attempts, such as upgrading Traefik from 1.5 to 1.6.6, comparing configs among other services, and trying different configs, the only fix for me was to remove and recreate the Docker network. I don't know why this problem occurred, but it is a big problem to deal with in production environments. – Marco Blos Aug 23 '18 at 03:06
  • Hi @MarcoBlos, how did you remove the network? When I try to remove it I get the error `network is in use by service...`. I can only `rm` and `deploy` my stack again, and the error persists. – gcstr Dec 02 '18 at 02:14
  • Hi @gcstr... To remove the network you need to remove all services linked to the network... Then create the network again and deploy your stack again... I know, it's not cool. – Marco Blos Dec 12 '18 at 13:33
  • @MarcoBlos can you clarify how this should be done? Are these docker commands or docker-compose commands? And how do you re-create the networks and deploy? – Julien Apr 07 '19 at 00:59
  • Hi @Julien. This job is done manually, but you can automate the process if you want. First you need to remove all Docker services attached to the network (you can list all services with `docker service ls`, then pick the names of the services you want to remove and run `docker service rm my-service-name`). After that, remove the network with `docker network rm my-network-name`. Once that is done, create your network again with `docker network create --driver overlay my-network` and finally deploy your services again. – Marco Blos Apr 10 '19 at 18:03
  • Oh OK, that way. Thanks, I was very new to Docker when I saw your comment. – Julien Apr 11 '19 at 19:04
  • This is a great example of Percussive Maintenance. Not an advisable debugging strategy. – vhs Oct 02 '19 at 13:41
  • I can confirm this still happens in Traefik 2.2... after 1 hour of trying everything, it worked after recreating the network – Dredok Apr 21 '20 at 22:31
  • In my case, I did nothing. I just waited for two or three minutes, reloaded the page in my web browser and it worked! (Traefik 2.6.1, btw) – Pathros Mar 19 '22 at 19:30
  • Seems to be a valid solution – x4k3p Mar 21 '22 at 21:18

If you see Bad Gateway with Traefik, chances are you have a Docker networking issue. First have a look at this issue and consider this solution. Then take a look at providers.docker.network (Traefik 2.0) or, in your case, the docker.network setting (Traefik 1.7).

You could add a default network here:

[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "example.com"
watch = true
exposedbydefault = false
network = "proxy"

Or define/override it for a given service using the traefik.docker.network label.
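
For Traefik 2.x, the same default network can be set through `providers.docker.network` in the static configuration. A minimal sketch, assuming a YAML static config file (for example `traefik.yml`) and the same `proxy` network as in the question; the other keys simply mirror the TOML above:

```yaml
providers:
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    # Default network Traefik uses to reach containers
    # when a container is attached to more than one network
    network: proxy
```

A per-container `traefik.docker.network` label would still take precedence over this default.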

vhs

I got the same problem and none of the above-mentioned answers solved it for me. In my case, a load balancer port label with the wrong port had been added. Removing the label or changing it to the correct port did the trick.

 - "traefik.http.services.XXX.loadbalancer.server.port=XXX"
mhellmeier

In your example you don't have traefik enabled:

traefik.enable=false

Make sure to enable it first and then test your containers.

BMitch
  • You're right, I've updated my question. Only some sites are disabled. But that isn't the issue: if a site is not enabled, one gets a 404. Please see my answer if you're interested. It seems to be a network issue. – x4k3p Mar 21 '18 at 23:41

The error "bad gateway" is returned when the web server in the container doesn't accept traffic from Traefik, e.g. because of a wrong interface binding such as localhost instead of 0.0.0.0.

Take Ruby on Rails for example. Its web server puma is configured by default like this (see config/puma.rb):

port        ENV.fetch("PORT") { 3000 }

But in order to allow access from traefik puma must bind to 0.0.0.0 like so:

bind "tcp://0.0.0.0:#{ ENV.fetch("PORT") { 3000 } }"

This solved the problem for me.

thorstenhirsch

Another cause can be exposing a container at a port that Traefik already uses.

lewislbr

I forgot to expose the port in my Dockerfile, which is why Traefik did not find a port to route to. So expose the port BEFORE you start the application, for example with node:

#other stuff before...

EXPOSE 3000
CMD ["node", "dist/main" ]

Or if you have multiple ports open you have to specify which port traefik should route the domain to with:

- "traefik.http.services.myservice.loadbalancer.server.port=3000"

Or see docs

kenny

I faced an issue very close to this one. My problem was not related to network settings or config. After some time we figured out that the port exposed by the backend container did not match the port we were mapping to for access from outside: the service port was 5000 and we had mapped 9000:9000. The solution was to fix the port mapping to 9000:5000.
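
In docker-compose terms a port mapping reads HOST:CONTAINER, so the fix described here could look roughly like this. A sketch only; the service name `backend` and the image are placeholders:

```yaml
services:
  backend:
    image: example/backend:latest   # hypothetical image
    ports:
      # HOST:CONTAINER, the application inside the container listens on 5000
      - "9000:5000"
```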


Expose port 80 for traefik

services:
  php:
    expose:
      - "80"
Jekis