
I have searched for solutions, but I am probably not asking the right question.

I have a service performing long-running tasks deployed inside a GKE cluster (a Flask app served through gunicorn). When I make a request, Postman returns "Error: socket hang up" after 30 minutes. What am I not considering? Is this behaviour caused by the service in front of the deployment?

My chain of services is as follows:

  • Consumer: Postman, with timeout=0 (i.e. infinite) and the header "Connection": "keep-alive" passed
  • LoadBalancer (exposed) proxy: node+express, timeout=7200000 (ms)
  • ClusterIP (internal) service: python+flask, gunicorn timeout=7200 (s)
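
For reference, the configured values can be put on a common scale. The sketch below only restates the numbers listed above in seconds (the variable names are made up for illustration) and compares the tightest one with the observed 30 minutes:

```python
# Timeouts as configured at each hop, normalised to seconds.
# Postman's timeout=0 means "no timeout", represented here as infinity.
configured_timeouts_s = {
    "postman": float("inf"),            # timeout=0, i.e. infinite
    "express_proxy": 7_200_000 / 1000,  # 7200000 ms
    "gunicorn": 7200,                   # 7200 s
}

observed_hangup_s = 30 * 60  # "socket hang up" after 30 minutes

# Smallest explicitly configured timeout in the chain.
tightest_s = min(configured_timeouts_s.values())

print("tightest configured timeout (s):", tightest_s)
print("observed hang-up (s):", observed_hangup_s)
print("hang-up earlier than any configured timeout:", observed_hangup_s < tightest_s)
```

The tightest configured timeout is 7200 s (2 hours), four times the observed 1800 s, so none of the three settings above accounts for the 30-minute cut-off by itself.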

The service is started through: CMD exec gunicorn --bind 0.0.0.0:$PORT start:app --workers $NO_WORKERS --threads $NO_THREADS --timeout $TIMEOUT
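
The same settings could also be expressed as a gunicorn configuration file, which makes the effective values easier to inspect. This is only a sketch: the file name is an assumption, and the fallback defaults simply copy the env values from the Deployment manifest in this question.

```python
# gunicorn.conf.py (file name is an assumption, not part of the original setup)
import os

# Each setting mirrors one CLI flag from the CMD line.
# Fallback defaults copy the env values from the Deployment manifest.
bind = f"0.0.0.0:{os.environ.get('PORT', '4000')}"   # --bind 0.0.0.0:$PORT
workers = int(os.environ.get("NO_WORKERS", "2"))     # --workers $NO_WORKERS
threads = int(os.environ.get("NO_THREADS", "2"))     # --threads $NO_THREADS
timeout = int(os.environ.get("TIMEOUT", "7200"))     # --timeout $TIMEOUT (seconds)
```

With such a file in place, the start command would become gunicorn -c gunicorn.conf.py start:app.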

The service was deployed with kubectl apply -f using the following YAML file:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: layers-api
  labels: 
    app: layers-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: layers-api
      tier: backend
  template:
    metadata:
      labels:
        app: layers-api
        tier: backend
    spec:
      containers:
      - env:
        - name: PORT
          value: '4000'
        - name: NO_WORKERS
          value: '2'
        - name: NO_THREADS
          value: '2'
        - name: TIMEOUT
          value: '7200'
        # ...other env vars... 
        name: layers-api
        image: [DOMAIN]/[PROJECT]/[IMAGE]
        imagePullPolicy: Always
        ports:
          - containerPort: 4000
        readinessProbe:
          httpGet:
            path: /api/v1/healthz
            port: 4000
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /api/v1/healthz
            port: 4000
---
apiVersion: v1 
kind: Service
metadata:
  name: layers-api
  labels:
    app: layers-api
    tier: backend
spec:
  selector:
    app: layers-api
    tier: backend
  ports:
  - port: 80
    targetPort: 4000
    protocol: TCP
    name: http
  - port: 443
    targetPort: 4000
    protocol: TCP
    name: https
---

Thank you for taking the time to help.

C

EDIT: this is the log from the proxy server. Moreover, I use this proxy for another route (to a managed Cloud Run service, with a beta timeout of 3600 seconds), and that route does not hang up after 30 minutes.
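
To compare the two routes, a rough probe such as the one below could time the same request through each path and record how the connection ended. It is only a sketch using the Python standard library: time_until_close is a name introduced here for illustration, and any URL passed to it would be a placeholder for the real proxy routes.

```python
import http.client
import time
import urllib.error
import urllib.request


def time_until_close(url, timeout=None):
    """Issue a GET and report how long the request lasted and how it ended.

    timeout=None means the client itself never gives up, so any early
    disconnect comes from a hop between the client and the backend.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            outcome = f"completed with HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status code.
        outcome = f"HTTP error {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Refused, reset, or prematurely closed connections end up here.
        outcome = f"connection failed: {exc!r}"
    return time.monotonic() - start, outcome
```

Calling it once per route and comparing the elapsed seconds would show whether only the GKE path is cut at roughly 1800 s.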

C. Claudio
  • Have you seen this [post](https://stackoverflow.com/questions/16995184/nodejs-what-does-socket-hang-up-actually-mean/27835115#27835115)? You can try to use another service and see if you are getting the same error. – Alex G Jan 28 '21 at 05:50
  • Thank you @AlexG! I did have a look, but I am not sure how this applies to my case. The proxy server does not have the _http_client.js module (see the edit). Also, admittedly I am more familiar with Python than JS, which I mostly use for proxies, as in this case. – C. Claudio Jan 28 '21 at 09:23
  • Have you tested it with a different service? – Alex G Feb 03 '21 at 09:06
  • I tried it on Cloud Run + GKE and it gives me the same problem. However, if I deploy it on Cloud Run (managed), the hang-up error disappears and the request runs for up to 1 hour [CR beta](shorturl.at/jpNTX). – C. Claudio Feb 03 '21 at 15:25
  • Try upgrading the machine type of your nodes. Are there any other services using the port 4000? – Alex G Feb 05 '21 at 09:23
  • Only this service is on 4000. The cluster version is 1.16.15-gke.6000. I will upgrade it as soon as I have a chance (making sure not to tear down anyone else's work :-D ). Thank you for your help @Alex G! – C. Claudio Feb 08 '21 at 16:45
