We have a project hosted on Google App Engine in its Node.js Flexible Environment to collect data from sensors.
We receive about 10 POST /collect requests per second. Request sizes vary widely (from ~100 B up to ~12 MB), but 99% of them are very small.
Looking at the collected data, we see that some of it occasionally goes missing (apparently about 5-6 times a day).
While investigating, we put a proxy (also on App Engine) in front of our server so that we could trace the full flow and see any errors or problems along the way. Below, we call the proxy PROXY and the server SERVER.
We noticed that whenever data is missing, PROXY had sent the data to SERVER and received a 502 Bad Gateway back. This shows up in PROXY's logs (the proxy logs one line when a request arrives and another when the server replies):
07:11:15.000 SENSOR_ID response: 502 Bad Gateway
07:11:15.000 SENSOR_ID request
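For readers who want to reproduce this kind of tracing, here is a minimal sketch of a logging proxy in Node.js. It is not the actual PROXY code: the `target` object, the `x-sensor-id` header, and the function names are all hypothetical, chosen only to produce log lines in the same shape as the ones above.

```javascript
const http = require('http');

// Format one log line in the same shape as the PROXY log lines above.
function formatLogLine(time, sensorId, message) {
  return `${time} ${sensorId} ${message}`;
}

// Build (but do not start) a proxy that forwards every request to SERVER
// and logs both the arrival of the request and SERVER's reply.
// ASSUMPTION: `target` ({ host, port }) and the `x-sensor-id` header are
// illustrative placeholders, not part of the original setup.
function createLoggingProxy(target, log = console.log) {
  return http.createServer((req, res) => {
    const sensorId = req.headers['x-sensor-id'] || 'UNKNOWN_SENSOR';
    // HH:MM:SS.mmm in UTC, taken from the ISO 8601 timestamp.
    const now = () => new Date().toISOString().slice(11, 23);

    log(formatLogLine(now(), sensorId, 'request'));

    const upstream = http.request(
      {
        host: target.host,
        port: target.port,
        method: req.method,
        path: req.url,
        headers: { ...req.headers, host: target.host },
      },
      (upstreamRes) => {
        log(formatLogLine(
          now(),
          sensorId,
          `response: ${upstreamRes.statusCode} ${upstreamRes.statusMessage}`
        ));
        res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(res);
      }
    );

    // A connection-level failure never produces an upstream response,
    // so log it separately and answer the client with a 502.
    upstream.on('error', (err) => {
      log(formatLogLine(now(), sensorId, `upstream error: ${err.message}`));
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });

    req.pipe(upstream);
  });
}
```

Something like `createLoggingProxy({ host: 'server.example', port: 8080 }).listen(8081)` would start it; the host name and ports here are placeholders.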
We then went through SERVER's logs and found the following entry at the same timestamp:
07:11:15.000 [error] 32#32: *84209 upstream prematurely closed connection while reading response header from upstream, client: 130.211.1.151, server: , request: "POST /collect HTTP/1.1", upstream: "http://172.17.0.1:8080/collect", host: "ourprojectid.appspot.com"
Our first assumption was that large requests, with lots of data, caused the server to fail for some reason. That is not the case: there is no correlation between these failures and the size of the request.
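One simple way to check for such a correlation is to compare the 502 rate of small and large requests. The sketch below assumes the proxy logs have already been turned into an array of `{ bytes, status }` records; that record shape, the function name, and the 1 MiB threshold are illustrative assumptions, not something taken from our setup.

```javascript
// Split requests into "small" and "large" by body size and compute,
// for each group, the fraction that came back as 502.
// ASSUMPTION: `records` is a hypothetical array of { bytes, status }
// objects derived from the proxy logs; the threshold is arbitrary.
function failureRateBySize(records, thresholdBytes = 1024 * 1024) {
  const buckets = {
    small: { total: 0, failed: 0 },
    large: { total: 0, failed: 0 },
  };

  for (const { bytes, status } of records) {
    const bucket = bytes >= thresholdBytes ? buckets.large : buckets.small;
    bucket.total += 1;
    if (status === 502) bucket.failed += 1;
  }

  // An empty bucket reports a rate of 0 rather than dividing by zero.
  const rate = (b) => (b.total === 0 ? 0 : b.failed / b.total);
  return { small: rate(buckets.small), large: rate(buckets.large) };
}
```

If the two rates are close to each other, request size is unlikely to be what triggers the 502s.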
The stack we are using: App Engine instances (running on nginx) and the Node.js Flexible Environment.
We have no clue where to investigate next.