32

I'm trying to set up an HTTPS load balancer for GKE using the HTTPS L7 load balancer, but for some reason it is not working. Even the plain HTTP load balancer from the HTTP Load Balancing walkthrough fails. The forwarding rule's IP address is created and I'm able to ping and telnet to port 80, but a request via curl gives me an error:

<title>502 Server Error</title> </head> <body text=#000000 
bgcolor=#ffffff> <h1>Error: Server Error</h1> <h2>The server 
encountered a temporary error and could not complete your request. 
<p>Please try again in 30 seconds.</h2> <h2></h2> </body></html>

All the steps went fine, and I created a firewall rule (without any tags) for ${NODE_PORT}, but it still didn't work.

Has anyone encountered this problem?

Jesse Scherer
lucas.coelho

8 Answers

29

I had the same problem with my application: we did not have an endpoint returning a success status, so the health checks were always failing.

It seems that the HTTP/HTTPS load balancer will not send requests to the cluster nodes if the health checks are not passing, so my solution was to create an endpoint that always returns 200 OK. As soon as the health checks were passing, the LB started working.
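For illustration, if your pods already serve traffic through nginx, such an endpoint can be a one-line location block. This is just a sketch: the /healthz path is an assumption, and by default the L7 health check probes "/", so point the backend service's health check at whatever path you pick.

location = /healthz {
    # Sketch: unconditionally answer 200 so the health check can pass.
    return 200 'ok';
}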

andresk
  • I'm having the same issue. Does it mean I have to create an empty container in each node returning 200 on /healthz? – Paweł Szczur Jan 04 '17 at 21:32
  • I think you could just add a route in an existing container that returns 200, but if you don't want to make these changes to your existing ones, then yes – andresk Jan 05 '17 at 12:34
  • Can someone explain how to do that? – Wasif Khalil May 18 '17 at 06:57
  • Any tips if I already have this endpoint and my instances are shown as healthy on the load balance page? – Nigini Dec 23 '17 at 06:14
  • Why would you do this? The health checks are there for a reason, and the LB uses them to determine whether your backend(s) can accept traffic or not. By sending a "fake" 200 always you are fooling the LB into thinking your cluster nodes are always healthy, even when they might not be. This boils down to your clients getting errors when they are unhealthy (out of resources, some other issues etc). Ideally, the health check URL is application specific and it should indicate if the node is "healthy" - which again is something that you have to determine in the context of your app. – talonx Apr 02 '20 at 12:43
11

I just walked through the example and (prior to opening up a firewall for $NODE_PORT) saw the same 502 error.

If you look in the cloud console at

https://console.developers.google.com/project/<project>/loadbalancing/http/backendServices/details/web-map-backend-service

you should see that the backend shows 0 out of ${num_nodes_in_cluster} as healthy.

For your firewall definition, make sure that you set the source filter to 130.211.0.0/22 to allow traffic from the load balancing service, and set the allowed protocols and ports to tcp:$NODE_PORT.
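If you prefer the command line, a matching rule can be created with gcloud. A sketch, assuming the default network and a hypothetical rule name (substitute your cluster's actual node port):

gcloud compute firewall-rules create allow-lb-to-nodeport \
    --network=default \
    --source-ranges=130.211.0.0/22 \
    --allow=tcp:$NODE_PORT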

Robert Bailey
  • hmm... clearly Google forgot this in the documentation on HTTP proxies for Kubernetes: https://cloud.google.com/container-engine/docs/tutorials/http-balancer – Petrov Sep 19 '15 at 14:30
  • I wish I had read this before spending an hour trying to figure out what was going on. – psychok7 Dec 11 '18 at 10:12
5

I use GKE, and when I walked through the example it worked fine, but when I routed to my own service it did not work (my service is a REST API service).

I found that the biggest difference between my service and the example is that the example serves a root endpoint ("/"), and my service did not.

So I solved the problem this way: I added a root endpoint ("/") to my service that just returns success (an empty endpoint that returns nothing but a 200), re-created the ingress, waited several minutes, and then the ingress worked!

I think this problem is caused by the health checker: UNHEALTHY instances do not receive new connections.

Here is a link about health checks: https://cloud.google.com/compute/docs/load-balancing/health-checks
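As an alternative to adding a root endpoint, the GKE ingress controller can derive the load balancer's health check path from the container's readinessProbe, so you can point the probe at a path your service already serves. A minimal sketch, where the pod name, image, port, and /status path are all assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: my-api
spec:
  containers:
  - name: api
    image: example/my-api:latest
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /status    # must answer 200 for the LB health check to pass
        port: 8080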

Keet Sugathadasa
Bruce
  • Did the same and it worked well, thanks for the tip. I know installing nginx or any other webserver would solve the problem, but that defeats the point, as I intentionally want to avoid any webserver to keep things lightweight. – rahul Jan 13 '19 at 16:17
2

The issue resolved itself after a few minutes (5-10 minutes or so) in my case.

If using an ingress, there may be additional information in the events relating to the ingress. To view these:

kubectl describe ingress example
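You can also ask the load balancer directly which backends it considers healthy; the backend service name below is a placeholder (list yours with gcloud compute backend-services list):

gcloud compute backend-services get-health web-map-backend-service --global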

Chris Stryczynski
  • I recently ran into this, and it also resolved itself. Still, downtime for a few minutes here and there is not acceptable. – speedplane Sep 30 '17 at 16:59
  • There is a lot going on when you use L7 LBs. It's especially slow if you are reconfiguring a GKE ingress which is linked to an L7 LB. I've had to wait between 3 and 5 minutes for things to sort themselves out sometimes. If everything looks right, just wait a few minutes first. Trying to fix something that's not actually broken is confusing, and I think that's the point of this answer. – Phil Jun 14 '19 at 10:35
1

In my case, the load balancer was returning this error because there was no web server running on the instances in my instance groups to handle the network requests.

I installed nginx on all the machines and then it started working.

Since then, I have made a point of adding nginx to my startup script when creating the VM/instance.
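For example, with gcloud the startup script can be attached at creation time. A sketch, where the instance name, zone, and image are assumptions:

gcloud compute instances create my-backend \
    --zone=us-central1-a \
    --image-family=debian-11 \
    --image-project=debian-cloud \
    --metadata=startup-script='#! /bin/bash
apt-get update
apt-get install -y nginx'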

Pulkit Pahwa
1

If you are using nginx behind your load balancer, it's important that the default_server returns 200 (or some other 2xx). That means that if, for example, the default server has a rewrite rule that returns a 301, the health check will fail.

The solution is to set default_server on your main server:

server {
    # Rewrite calls to www
    listen 443;
    server_name example.com;

    return 301 https://www.example.com$request_uri;
}


server {
    listen                  443 default_server;
    server_name             www.example.com;
    ...
}
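To see the response the health checker gets, you can hit a node directly by IP, which will not match any server_name and therefore lands on the default_server (the IP is a placeholder):

curl -k -i https://NODE_IP/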
Christoffer
1

Adding a firewall rule with source 130.211.0.0/22 (the load balancer range on GCP) for tcp:$NODE_PORT fixed this for me.

  • Thank you for this! It appears that if you have a firewall enabled in Google Cloud, you must add the load balancer IP range to the firewall. – Matt Browne May 12 '20 at 16:50
0

I created an endpoint for all requests that contain 'GoogleHC' in the user agent.

So:

server {
    server_name example.com www.example.com;

    if ($http_user_agent ~* 'GoogleHC.*') {
        return 200 'isaac newton';
    }
}
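To test the rule without waiting for the health checker, you can send the health checker's user agent yourself; GoogleHC/1.0 is the usual prefix, but treat the exact string as an assumption:

curl -i -H 'User-Agent: GoogleHC/1.0' http://example.com/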
sandes