
At the end of last week I noticed a problem on one of my medium AWS instances where Nginx always returns an HTTP 499 response if a request takes more than 60 seconds. The page being requested is a PHP script.

I've spent several days trying to find answers and have tried everything I can find on the internet, including several entries here on Stack Overflow; nothing works.

I've tried modifying the PHP settings, PHP-FPM settings and NginX settings. You can see a question I raised on the NginX forums on Friday (http://forum.nginx.org/read.php?9,237692), though that has received no response, so I am hoping I might be able to find an answer here before I am forced to move back to Apache, which I know just works.

This is not the same problem as the HTTP 500 errors reported in other entries.

I've been able to replicate the problem with a fresh micro AWS instance of NginX using PHP 5.4.11.

To help anyone who wishes to see the problem in action I'm going to take you through the set-up I ran for the latest Micro test server.

You'll need to launch a new AWS Micro instance (so it's free) using the AMI ami-c1aaabb5.

This PasteBin entry has the complete set-up to run to mirror my test environment. You'll just need to change example.com within the NginX config at the end:

http://pastebin.com/WQX4AqEU

Once that's set up, you just need to create the sample PHP file I'm testing with:

<?php
// sleep for longer than the 60-second timeout, then respond
sleep(70);
die( 'Hello World' );
?>

Save that into the webroot and then test. If you run the script from the command line using php or php-cgi, it will work. If you access the script via a web page and tail the access log /var/log/nginx/example.access.log, you will notice that you receive the HTTP/1.1 499 response after 60 seconds.
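For reference, a quick way to reproduce this from a shell (assuming the test script is saved as index.php under the example.com vhost from the PasteBin set-up):

# request the test script; the connection is cut after ~60 seconds
time curl -v http://example.com/index.php

# in a second terminal, watch the 499 appear in the access log
tail -f /var/log/nginx/example.access.log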

Now that you can see the timeout, I'll go through some of the config changes I've made to both PHP and NginX to try to get around this. For PHP I'll create several config files so that they can be easily disabled.

Update the PHP-FPM config to include external config files

# use tee so the write happens with root privileges
# (a plain sudo echo '...' >> file redirect runs as the unprivileged user)
echo '
include=/usr/local/php/php-fpm.d/*.conf
' | sudo tee -a /usr/local/php/etc/php-fpm.conf

Create a new PHP-FPM config to override the request timeout

echo '[www]
request_terminate_timeout = 120s
request_slowlog_timeout = 60s
slowlog = /var/log/php-fpm-slow.log
' | sudo tee /usr/local/php/php-fpm.d/timeouts.conf

Change some of the global settings to ensure the emergency restart interval is 2 minutes

# Create a global tweaks file
echo '[global]
error_log = /var/log/php-fpm.log
emergency_restart_threshold = 10
emergency_restart_interval = 2m
process_control_timeout = 10s
' | sudo tee /usr/local/php/php-fpm.d/global-tweaks.conf

Next, we will change some of the PHP.INI settings, again using separate files

# Log PHP errors
echo '[PHP]
log_errors = on
error_log = /var/log/php.log
' | sudo tee /usr/local/php/conf.d/errors.ini

# Raise size limits and the various timeouts
echo '[PHP]
post_max_size=32M
upload_max_filesize=32M
max_execution_time = 360
default_socket_timeout = 360
mysql.connect_timeout = 360
max_input_time = 360
' | sudo tee /usr/local/php/conf.d/filesize.ini

As you can see, this increases the socket and execution timeouts to 360 seconds (6 minutes) and will help log errors.

Finally, I'll edit some of the NginX settings to increase the timeouts on that side.

First I edit the file /etc/nginx/nginx.conf and add fastcgi_read_timeout 300; to the http directive.
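That is, something like the following inside the existing http block (a sketch; other settings omitted):

http {
    # ... existing settings ...
    fastcgi_read_timeout 300;
}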

Next, I edit the file /etc/nginx/sites-enabled/example which we created earlier (see the PasteBin entry) and add the following settings into the server directive:

client_max_body_size    200;
client_header_timeout   360;
client_body_timeout     360;
fastcgi_read_timeout    360;
keepalive_timeout       360;
proxy_ignore_client_abort on;
send_timeout            360;
lingering_timeout       360;

Finally I add the following into the location ~ \.php$ section of the server directive:

fastcgi_read_timeout 360;
fastcgi_send_timeout 360;
fastcgi_connect_timeout 1200;

Before retrying the script, I restart both nginx and php-fpm to ensure that the new settings have been picked up. I then try accessing the page and still receive the HTTP/1.1 499 entry in the NginX example.access.log.
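The exact restart commands depend on how the services were installed; on a set-up like this they would be something along the lines of:

# restart both daemons so the new config files are read
sudo service nginx restart
sudo service php-fpm restart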

So, where am I going wrong? This just works on Apache when I set PHP's max execution time to 2 minutes.

I can see that the PHP settings have been picked up by running phpinfo() from a web-accessible page. I just don't get it; if anything, I think too much has been increased, as it should only need PHP's max_execution_time and default_socket_timeout changed, along with NginX's fastcgi_read_timeout within just the server->location directive.

Update 1

Having performed some further tests to show that the problem is not the client dying, I have modified the test file to be:

<?php
// write a marker before the sleep
file_put_contents('/www/log.log', 'My first data');
sleep(70);
// a second marker after the sleep proves PHP kept running
file_put_contents('/www/log.log', 'The sleep has passed');
die('Hello World after sleep');
?>

If I run the script from a web page, I can see the contents of the file set to the first string. 60 seconds later the error appears in the NginX log, and 10 seconds after that the contents of the file change to the second string, proving that PHP completes the process.

Update 2

Setting fastcgi_ignore_client_abort on; does change the response from an HTTP 499 to an HTTP 200, though still nothing is returned to the end client.
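For anyone following along, that directive sits alongside the other fastcgi settings (a placement sketch only, not a fix):

location ~ \.php$ {
    # ... existing fastcgi_* settings from above ...
    fastcgi_ignore_client_abort on;
}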

Update 3

Having installed Apache and PHP (5.3.10) directly onto the box (using apt) and then increased the execution time, the problem does appear to happen on Apache as well. The symptoms are now the same as with NginX: an HTTP 200 response, but the actual client connection times out beforehand.

I've also started to notice, in the NginX logs, that if I test using Firefox it makes a double request (so this PHP script executes twice when it runs longer than 60 seconds), though that does appear to be the client re-requesting after the first attempt fails.

– TFAtTheMoon
  • I don't think your updated conclusion is correct – your script does run as long as you want it to, but the client has said "eff this, it's taking too long, I've got no more patience" and closed the connection. So nginx cannot send a status code any more, and that's why it gives you this error. – CBroe Mar 25 '13 at 14:18
  • If, by 'the client', you literally mean the end web browser, then you are mistaken, as that would be me using Chrome. When I request the same script against my local Apache + PHP set-up, the script completes as expected. If by client you mean PHP-FPM, which has been passed the request by NginX, then why would it continue to execute the script? – TFAtTheMoon Mar 25 '13 at 15:06

5 Answers


The cause of the problem is the Elastic Load Balancer on AWS. By default, ELBs time out after 60 seconds of inactivity, which is what was causing the problem.

So it wasn't NginX, PHP-FPM or PHP but the load balancer.

To fix this, simply go into the ELB "Description" tab, scroll to the bottom, and click the "(Edit)" link beside the value that says "Idle Timeout: 60 seconds".
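If you'd rather script the change, the same thing can be done with the AWS CLI for a classic ELB (a sketch; the load balancer name is a placeholder):

# raise the classic ELB idle timeout from the default 60s to 120s
aws elb modify-load-balancer-attributes \
    --load-balancer-name my-load-balancer \
    --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":120}}"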

– TFAtTheMoon
  • Accept your own answer to give future readers info on this issue – gaRex Mar 26 '13 at 04:32
  • Not 100% correct; if you open a support case with Amazon, they will be able to increase the TCP idle timeout of the ELB up to 15 minutes for you. If you do not have support, post to the EC2 forum and ask – these are monitored by AWS (https://forums.aws.amazon.com/forum.jspa?forumID=30) – ColtonCat Jun 19 '14 at 08:42
  • You saved me! Happily, Amazon now allows you to configure the Idle Timeout on an ELB. – Erwin Julius Jul 26 '14 at 13:49
  • In my case I am receiving this error from Rackspace. I am executing a script that copies selected media files from one container to another container in Rackspace. This was always working, but since yesterday it continuously gives the error: Exception in uploading Observation Media. Client error response [status code] 499 [reason phrase] Client Disconnect [url] https://snet-storage101.lon3.clouddrive.com/v1/MossoCloudFS_f448a09c-45b1-4ec3-bcb5-15d9e77c3bea/container-name/media.mp4. Can you help? – Prashant Sep 02 '16 at 08:44
  • Here's more info on how to change the idle timeout on your classic ELB: https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-idle-timeout.html – mooreds Mar 03 '19 at 15:30

In my case, nginx was sending requests to an AWS ALB and getting a timeout with a 499 status code.

The solution was to add this line:

proxy_next_upstream off;

The default value for this in current versions of nginx is proxy_next_upstream error timeout; – which means that on a timeout it tries the next 'server', which in the case of an ALB is the next IP in the list of resolved IPs.
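In context the block looks something like this (a sketch; the ALB host name is a placeholder):

# proxy to an ALB; without proxy_next_upstream off, a timeout would be
# retried against the next IP the ALB host name resolves to
location / {
    proxy_pass http://internal-my-alb-123456789.eu-west-1.elb.amazonaws.com;
    proxy_next_upstream off;
}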

– pbthorste

I thought I would leave my two cents. First, the problem is not related to PHP (though it still could be; PHP always surprises me :P). That's for sure. It's mainly caused by a server being proxied to itself, more specifically a hostname/alias issue. In your case it could be that the load balancer is requesting nginx and nginx is calling back the load balancer, and it keeps going that way.

I have experienced a similar issue with nginx as the load balancer and Apache as the webserver/proxy.
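As an illustration of the kind of loop meant here (a hypothetical misconfiguration, not the OP's actual config):

# nginx proxies to a host name that resolves back to this same server,
# so every request re-enters nginx and never completes
server {
    server_name example.com;
    location / {
        proxy_pass http://example.com;
    }
}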

– Waheed

I faced the same issue on one server, and I figured out that after nginx configuration changes I hadn't restarted the nginx server, so with every hit of the nginx URL I was getting an HTTP 499 response. After an nginx restart it started working properly with HTTP 200 responses.
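For anyone hitting the same thing, it's worth validating the config and restarting in one step (the service name may vary by distribution):

# check the config for syntax errors, then restart so it is actually loaded
sudo nginx -t && sudo service nginx restart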

– Rajeev kumar

You need to find out where the problem lives. I don't know the exact answer, but let's try to find it.

We have three elements here: nginx, php-fpm and php. As you said, the same PHP settings are fine under Apache, but is that the same setup? Did you try Apache instead of nginx on the same OS/host/etc.?

If we see that PHP is not the suspect, then we have two suspects: nginx and php-fpm.

To exclude nginx, try to set up the same "system" with Ruby. See https://github.com/garex/puppet-module-nginx to get an idea of the simplest Ruby setup. Or use Google (maybe that will be even better).

My main suspect here is php-fpm.

Try to play with these settings (a sketch of where each one lives follows the list):

  • php-fpm's request_terminate_timeout
  • nginx's fastcgi_ignore_client_abort
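
A minimal sketch of where those two settings live (paths assume the OP's set-up described above):

; /usr/local/php/php-fpm.d/timeouts.conf – php-fpm pool setting
[www]
request_terminate_timeout = 120s

# /etc/nginx/sites-enabled/example – inside the PHP location block
location ~ \.php$ {
    fastcgi_ignore_client_abort on;
}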
– gaRex
  • Btw, try another browser or wget or curl :) – gaRex Mar 25 '13 at 14:19
  • request_terminate_timeout was one of the settings I changed (see the middle of the original post) and has been set to 120. fastcgi_ignore_client_abort I'd not tried before, and enabling it has made a difference: NginX now reports a 200 response in the access log rather than the 499, which sounds like a step in the right direction :) – TFAtTheMoon Mar 25 '13 at 15:21
  • Though the actual end browser still fails (Chrome, Firefox, Lynx), which is making me think there is another timeout I've missed somewhere – TFAtTheMoon Mar 25 '13 at 15:21
  • Before the change: [25/Mar/2013:15:02:27 +0000] "GET /index.php?test=4 HTTP/1.1" 499 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0" After: [25/Mar/2013:15:18:40 +0000] "GET /index.php HTTP/1.1" 200 54 "-" "Lynx/2.8.8dev.9 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/2.12.14" – TFAtTheMoon Mar 25 '13 at 15:22
  • I've updated the main body with this, but a fresh Apache/PHP installation (older version of PHP) also fails. Only the additional PHP ini files I created earlier are shared between the installations, which does make it sound like a PHP or end-client issue. That's odd, as I don't have the same trouble when using a WAMP set-up on a local desktop and laptop. – TFAtTheMoon Mar 25 '13 at 15:53
  • @TFAtTheMoon then the problem is in the PHP settings. Maybe it's something stupid and obvious? Show us your phpinfo() :) – gaRex Mar 25 '13 at 16:38
  • I think I may have the cause, and it's something I had not considered: Amazon's Elastic Load Balancers. It looks like they kill off connections after 60 seconds of inactivity, which this will be, as will the "actual use" this test is mocking. If it is, I will close this once I confirm it as the cause – TFAtTheMoon Mar 25 '13 at 17:13