I have a setup where I am trying to access a gRPC server through gce-ingress. My deployment YAML looks like this:
Ingress.yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: app-backend-config
spec:
  customRequestHeaders:
    headers:
    - "TE:trailers"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prod
  annotations:
    kubernetes.io/ingress.global-static-ip-name: dummy-ingress
    kubernetes.io/ingress.allow-http: "false"
    cert-manager.io/issuer: issuer
    cloud.google.com/backend-config: '{"default": "app-backend-config"}'
  labels:
    name: ingress-app
spec:
  tls:
  - hosts:
    - domain.name
    secretName: secret-tls
  rules:
  - host: domain.name
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: app-server-headless
            port:
              number: 8000
My app server configuration is attached below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-app-server
  labels:
    app: app-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-server
  template:
    metadata:
      labels:
        app: app-server
    spec:
      containers:
      - name: app-server
        image: gcr.io/emeritus-data-science/image:latest
        command: ["python3", "/var/app/api_server/main.py"]
        imagePullPolicy: Always
        resources: # limit the resources
          requests:
            memory: 1Gi
            cpu: "1"
          limits:
            memory: 2Gi
            cpu: "1"
        volumeMounts:
        - mountPath: /secrets/gcloud-auth
          name: gcloud-auth
          readOnly: true
        ports:
        - containerPort: 8000
        readinessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:8000"]
          initialDelaySeconds: 30
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 2
        livenessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:8000"]
          initialDelaySeconds: 60
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 2
      volumes:
      - name: gcloud-auth
        secret:
          secretName: gcloud
---
apiVersion: v1
kind: Service
metadata:
  name: app-server-headless
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: app-server
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
    name: grpc
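Both manifests apply without errors, and since the load balancer health check further below is described as a check "for NEG", container-native load balancing is in use for this headless service. For reference, the wiring can be inspected with commands along these lines (resource names taken from the manifests above):

kubectl describe ingress ingress-prod             # on GKE the backend health state shows up in the ingress annotations
kubectl get service app-server-headless -o yaml   # NEG details appear in the service annotations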
I am using the Python grpcio-health-checking library and the grpc_health_probe command-line tool to implement the health check for my gRPC server. My gRPC server code is attached below:
from concurrent import futures
from signal import signal, SIGTERM

import grpc
from grpc_health.v1 import health, health_pb2_grpc

# application-specific imports (master_pb2_grpc, EventBusServiceServicer, LOG)
# are defined elsewhere in the project

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
master_pb2_grpc.add_EventBusOneofServiceServicer_to_server(
    EventBusServiceServicer(), server
)
health_pb2_grpc.add_HealthServicer_to_server(health.HealthServicer(), server)
server.add_insecure_port("0.0.0.0:8000")
server.start()
LOG.info("server started")

def handle_sigterm(*_):
    print("Received shutdown signal")
    all_rpcs_done_event = server.stop(30)
    all_rpcs_done_event.wait(30)
    print("Shut down gracefully")

signal(SIGTERM, handle_sigterm)
server.wait_for_termination()
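For completeness: the code above relies on HealthServicer's default status handling. A minimal sketch of setting the overall serving status explicitly (the empty service name is what grpc_health_probe queries when no -service flag is passed) would be:

from grpc_health.v1 import health, health_pb2, health_pb2_grpc

# 'server' is the grpc.server instance created above
health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)
# mark the overall server (empty service name) as SERVING
health_servicer.set("", health_pb2.HealthCheckResponse.SERVING)

I mention this only in case the default status matters.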
When I deploy this, the ingress always reports the message "Some backend services are in UNHEALTHY state". All the pods are running fine without any errors; however, I don't see any probe logs in the pods. When I execute the probe command "/bin/grpc_health_probe -addr=:8000" inside the pod via kubectl exec, it returns "SERVING". However, the ingress backend service always remains unhealthy. The health check of the unhealthy backend service has the following config:
Description: Default kubernetes L7 Loadbalancing health check for NEG.
Path: /
Protocol: HTTP/2
Port specification: Serving port
Proxy protocol: NONE
Logs: Disabled
Interval: 15 seconds
Timeout: 15 seconds
Healthy threshold: 1 success
Unhealthy threshold: 2 consecutive failures
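I have not declared an explicit healthCheck in the BackendConfig, so this appears to be the default health check GKE creates. My understanding (from the GKE BackendConfig docs, so treat the field names as my reading of them) is that a custom health check could be declared roughly like this:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: app-backend-config
spec:
  healthCheck:
    type: HTTP2
    requestPath: /healthz   # hypothetical path; my server only exposes gRPC services
    port: 8000
    checkIntervalSec: 15
    timeoutSec: 15
    healthyThreshold: 1
    unhealthyThreshold: 2
  customRequestHeaders:
    headers:
    - "TE:trailers"

But since my server only speaks gRPC on port 8000 and does not serve a plain HTTP/2 GET endpoint, I am not sure what requestPath would even be correct here.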
Is there something wrong with the readiness and liveness probe configuration, or is it something else in the deployment?