Normally we get around 2 requests/second. However, after we pushed a notification to 3,000 users, traffic suddenly jumped to 120 requests/second. Unfortunately, around half of those requests returned 5XX server errors, so about half of the users who showed up got blank pages. Once the spike was over, no server errors occurred again.
I did some research, and it seems to be caused by startup time: the instances were taking too long to start up, so the requests were aborted. I checked my instance count: as many as 90 instances were created, but the number of active instances dropped from 40 to 0 after a second. The problem only occurred when there was a sudden increase in requests, but I thought App Engine was supposed to be able to handle this kind of increase.
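To sanity-check the instance numbers, here is a rough back-of-envelope estimate using Little's law (requests in flight = arrival rate × time per request). Only the 2 and 120 requests/second figures come from what I observed. The latency and per-instance concurrency values are made-up placeholders, not measurements from my app, and `instances_needed` is just a helper I wrote for illustration.

```python
import math


def instances_needed(requests_per_second, latency_ms, concurrent_requests_per_instance):
    """Estimate the instances required to absorb a given load.

    Little's law: requests in flight = arrival rate * time per request.
    """
    in_flight = requests_per_second * latency_ms / 1000
    return math.ceil(in_flight / concurrent_requests_per_instance)


# Observed request rates.
baseline_rps = 2
spike_rps = 120

# Hypothetical placeholders, NOT measured values.
warm_latency_ms = 200    # assumed latency on an already-running instance
cold_latency_ms = 5000   # assumed latency when the request waits for instance startup
concurrency = 8          # assumed concurrent requests per instance

print("baseline, warm:", instances_needed(baseline_rps, warm_latency_ms, concurrency))
print("spike, warm:   ", instances_needed(spike_rps, warm_latency_ms, concurrency))
print("spike, cold:   ", instances_needed(spike_rps, cold_latency_ms, concurrency))
```

If startup really takes several seconds, the estimate for the spike grows by more than an order of magnitude compared with warm instances. That would at least be consistent with the scheduler creating dozens of instances at once, but I have not confirmed that this is what actually happened.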
My question is: how can I fix this problem, or where should I keep digging to find its root cause? Thanks in advance!