
We have a project hosted on Google App Engine in its Node.js Flexible Environment to collect data from sensors.

We receive about 10 POST /collect requests per second. Request sizes vary widely (from ~100 B up to ~12 MB), but 99% of them are very small.

Looking at the collected data, we see that every once in a while (apparently 5-6 times a day) some data is missing.

While investigating, we put a proxy (also on App Engine), let's call it PROXY, in front of our server, let's call it SERVER, so that we could track the full flow and see every error along the way.

We noticed that, whenever data is missing, PROXY had sent the data to SERVER and received a 502 Bad Gateway in response. This shows up in PROXY's logs (the proxy logs when a request arrives and when SERVER replies):

07:11:15.000 SENSOR_ID response: 502 Bad Gateway
07:11:15.000 SENSOR_ID request

We then went through the SERVER's logs and discovered that, at the same timestamp, we get the following:

07:11:15.000 [error] 32#32: *84209 upstream prematurely closed connection while reading response header from upstream, client: 130.211.1.151, server: , request: "POST /collect HTTP/1.1", upstream: "http://172.17.0.1:8080/collect", host: "ourprojectid.appspot.com"

Our first assumption was that big requests, with lots of data, caused the server to fail for some reason. This is not the case: there is no correlation between these failures and the size of the request.

The stack we are using: App Engine Flexible Environment instances (fronted by nginx) running Node.js.

We do not have any clue where to investigate further.

KGo
smellyarmpits
  • Your question could have answer here https://stackoverflow.com/questions/38012797/google-app-engine-502-bad-gateway-with-nodejs – Pig and Cat Sep 20 '17 at 09:21
  • Hi, thank you for your reply. My problem is different though. I can see the errors in the Google App Engine logs. Plus, those steps talk about Compute Engine Instances, but I don't see any compute engine instance in my project. I only see instances in App Engine. Do you have any other hints? – smellyarmpits Sep 20 '17 at 10:16
  • @MarcoGalassi - You can SSH into your GAE Flexible VM from the Cloud Console. App Engine -> Instances tab -> SSH next to the VM. – KGo Sep 20 '17 at 17:36
  • Ok, I did it and I was successfully able to read logs from the VM instance by SSHing. But, the logs I see there just resemble the logs I see in Google Stackdriver filtering by nginx.errors – smellyarmpits Sep 21 '17 at 09:55
  • App Engine flexible has a [32MB limit on the size of requests](https://cloud.google.com/appengine/docs/flexible/nodejs/how-requests-are-handled#request_limits). That could very well be what you are running into. Have you tested repeatedly sending requests of a certain size to see if they start failing past a certain threshold? – Yannick MG Oct 06 '17 at 14:29
  • @YannickMG thank you for your response, but the errors usually correspond to packets that are just a few bytes. Plus, we have never seen anything like a 32MB request. The biggest we have ever seen is about 12MB – smellyarmpits Oct 10 '17 at 07:12
  • My bad, misread your post. For the benefit of others reading your question I believe [this post](https://groups.google.com/d/msg/google-appengine/6gvlur9tXW0/1QIxKUXtAwAJ) on Google Groups has the right answer. – Yannick MG Oct 11 '17 at 18:54

1 Answer


This is likely a race condition between nginx reusing a keep-alive connection and your app closing it: your app's keepAliveTimeout is similar to or smaller than the one Google configures in their nginx server.

You can fix this by setting your server.keepAliveTimeout to 700 seconds (or at least 650 seconds, plus a good buffer for network latency). The value is given in milliseconds. For example:

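const http = require('http');
// `app` is your request handler (e.g. an Express app); `port` is the port you listen on.
const server = http.createServer({ keepAliveTimeout: 700_000 }, app);
server.listen(port, () => console.log('Server listening'));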
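If your Node.js version does not accept keepAliveTimeout as a createServer option, or if you only have the server object returned by a framework's listen() call, the same setting can be applied by assigning the property directly. The following is a minimal sketch, not a definitive implementation: the helper name `extendKeepAlive`, the 50-second margin, and the placeholder request handler are assumptions made for illustration, and the 650-second figure is the one implied by the answer above.

```javascript
const http = require('http');

// Idle timeout of the fronting nginx, as implied by the answer above (650 s).
const UPSTREAM_IDLE_MS = 650000;
// Extra margin so the app keeps idle connections open longer than nginx does
// (assumed value, chosen to reach the 700 s suggested above).
const MARGIN_MS = 50000;

// Hypothetical helper: sets the keep-alive timeout on any http.Server,
// including the one returned by an Express app's listen() call.
function extendKeepAlive(server) {
  server.keepAliveTimeout = UPSTREAM_IDLE_MS + MARGIN_MS; // milliseconds
  return server;
}

// Placeholder handler standing in for the real /collect application.
const server = extendKeepAlive(
  http.createServer((req, res) => res.end('ok'))
);
console.log(`keepAliveTimeout = ${server.keepAliveTimeout} ms`);
```

With Express, for instance, this would be used as `extendKeepAlive(app.listen(port))`. The point of the margin is that the app should never be the side that closes an idle connection first.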

There's more detailed analysis as to why this happens in https://stackoverflow.com/a/76044099.

domdomegg