I have a setup where I am trying to access a gRPC server through gce-ingress. My deployment YAML looks like this:
Ingress.yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: app-backend-config
spec:
  customRequestHeaders:
    headers:
    - "TE:trailers"
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prod
  annotations:
    kubernetes.io/ingress.global-static-ip-name: dummy-ingress
    kubernetes.io/ingress.allow-http: "false"
    cert-manager.io/issuer: issuer
    cloud.google.com/backend-config: '{"default": "app-backend-config"}'
  labels:
    name: ingress-app
spec:
  tls:
  - hosts:
    - domain.name
    secretName: secret-tls
  rules:
  - host: domain.name
    http:
      paths:
      - path: /*
        pathType: ImplementationSpecific
        backend:
          service:
            name: app-server-headless
            port:
              number: 8000
My app server configuration is attached below:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dummy-app-server
  labels:
    app: app-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-server
  template:
    metadata:
      labels:
        app: app-server
    spec:
      containers:
      - name: app-server
        image: gcr.io/emeritus-data-science/image:latest
        command: ["python3", "/var/app/api_server/main.py"]
        imagePullPolicy: Always
        resources: # limit the resources
          requests:
            memory: 1Gi
            cpu: "1"
          limits:
            memory: 2Gi
            cpu: "1"
        volumeMounts:
        - mountPath: /secrets/gcloud-auth
          name: gcloud-auth
          readOnly: true
        ports:
        - containerPort: 8000
        readinessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:8000"]
          initialDelaySeconds: 30
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 2
        livenessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:8000"]
          initialDelaySeconds: 60
          timeoutSeconds: 5
          periodSeconds: 10
          failureThreshold: 2
      volumes:
      - name: gcloud-auth
        secret:
          secretName: gcloud
---
apiVersion: v1
kind: Service
metadata:
  name: app-server-headless
  annotations:
    cloud.google.com/app-protocols: '{"grpc":"HTTP2"}'
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: app-server
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000
    name: grpc
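Both manifests apply without errors, and since the load balancer health check further below is described as a check "for NEG", container-native load balancing is in use for this headless service. For reference, the wiring can be inspected with commands along these lines (resource names taken from the manifests above):

kubectl describe ingress ingress-prod             # on GKE the backend health state shows up in the ingress annotations
kubectl get service app-server-headless -o yaml   # NEG details appear in the service annotations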
I am using the Python grpcio-health-checking library and the grpc_health_probe command-line tool to implement the health check for my gRPC server. My gRPC server code is attached below:
from concurrent import futures
from signal import signal, SIGTERM

import grpc
from grpc_health.v1 import health, health_pb2_grpc

# application-specific imports (master_pb2_grpc, EventBusServiceServicer, LOG)
# are defined elsewhere in the project

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
master_pb2_grpc.add_EventBusOneofServiceServicer_to_server(
    EventBusServiceServicer(), server
)
health_pb2_grpc.add_HealthServicer_to_server(health.HealthServicer(), server)
server.add_insecure_port("0.0.0.0:8000")
server.start()
LOG.info("server started")

def handle_sigterm(*_):
    print("Received shutdown signal")
    all_rpcs_done_event = server.stop(30)
    all_rpcs_done_event.wait(30)
    print("Shut down gracefully")

signal(SIGTERM, handle_sigterm)
server.wait_for_termination()
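For completeness: the code above relies on HealthServicer's default status handling. A minimal sketch of setting the overall serving status explicitly (the empty service name is what grpc_health_probe queries when no -service flag is passed) would be:

from grpc_health.v1 import health, health_pb2, health_pb2_grpc

# 'server' is the grpc.server instance created above
health_servicer = health.HealthServicer()
health_pb2_grpc.add_HealthServicer_to_server(health_servicer, server)
# mark the overall server (empty service name) as SERVING
health_servicer.set("", health_pb2.HealthCheckResponse.SERVING)

I mention this only in case the default status matters.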
When I deploy this, the ingress always reports the message "Some backend services are in UNHEALTHY state". All the pods are running fine without any errors; however, I don't see any probe logs in the pods. When I execute the probe command "/bin/grpc_health_probe -addr=:8000" inside the pod via kubectl exec, it returns "SERVING". However, the ingress backend service always remains unhealthy. The health check of the unhealthy backend service has the following config:
Description: Default kubernetes L7 Loadbalancing health check for NEG.
Path: /
Protocol: HTTP/2
Port specification: Serving port
Proxy protocol: NONE
Logs: Disabled
Interval: 15 seconds
Timeout: 15 seconds
Healthy threshold: 1 success
Unhealthy threshold: 2 consecutive failures
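I have not declared an explicit healthCheck in the BackendConfig, so this appears to be the default health check GKE creates. My understanding (from the GKE BackendConfig docs, so treat the field names as my reading of them) is that a custom health check could be declared roughly like this:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: app-backend-config
spec:
  healthCheck:
    type: HTTP2
    requestPath: /healthz   # hypothetical path; my server only exposes gRPC services
    port: 8000
    checkIntervalSec: 15
    timeoutSec: 15
    healthyThreshold: 1
    unhealthyThreshold: 2
  customRequestHeaders:
    headers:
    - "TE:trailers"

But since my server only speaks gRPC on port 8000 and does not serve a plain HTTP/2 GET endpoint, I am not sure what requestPath would even be correct here.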
Is there something wrong with the readiness and liveness probe configuration, or is it something else in the deployment?