I have an app hosted on GKE which, among many other tasks, serves zip files to clients. These zip files are constructed on the fly from many individual files in Google Cloud Storage.
The issue I'm facing is that when these zips get particularly large, the connection fails randomly partway through (anywhere between 1.4 GB and 2.5 GB). There doesn't seem to be any pattern in the timing either; it can happen anywhere from 2 to 8 minutes into the download.
AFAIK, the connection is being dropped somewhere between the load balancer and my app. Is the GKE ingress (load balancer) known to close long-running or large connections?
GKE setup:
- HTTP(S) load balancer ingress
- NodePort backend service
- Deployment (my app)
More details/debugging steps:
- I can't reproduce it locally (without Kubernetes).
- The load balancer logs statusDetails: "backend_connection_closed_after_partial_response_sent" while the response has a 200 status code. Googling this turned up nothing helpful.
- Accessing the pod directly and downloading through k8s port-forward works.
- My app logs that the request was cancelled (by the requester)
- I can verify that none of the files are corrupt (I can download all of them directly from storage).
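
For anyone trying to reproduce this, here is a rough sketch of how the failure point (bytes received and elapsed time) could be measured from the client side. It is not part of my app; the function names and the URL are placeholders, and it assumes a truncated response surfaces as an exception while reading (with some transfer modes it may simply end early instead).

```python
import time
import urllib.request
from typing import Iterable, Iterator, Optional, Tuple


def iter_response(url: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the response body in chunks. `url` is a placeholder for the zip endpoint."""
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return
            yield chunk


def measure_download(chunks: Iterable[bytes]) -> Tuple[int, float, Optional[str]]:
    """Consume chunks and return (bytes received, seconds elapsed, error or None)."""
    received = 0
    error = None
    start = time.monotonic()
    try:
        for chunk in chunks:
            received += len(chunk)
    except Exception as exc:  # e.g. http.client.IncompleteRead, ConnectionResetError
        error = f"{type(exc).__name__}: {exc}"
    return received, time.monotonic() - start, error
```

Running `measure_download(iter_response("https://example.com/download.zip"))` (hypothetical URL) once through the load balancer and once through port-forward would show whether the cutoff correlates with size or with time.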