
I have a VPS (CentOS 6.5) running Apache 2.2.4 and PHP-FPM (FastCGI Process Manager). Two or three times a day I get the following errors in error_log:

[error] [client 127.60.158.1] (4)Interrupted system call: FastCGI: comm with server "/usr/lib/cgi-bin/php5-fcgi" aborted: select() failed
[error] [client 127.60.158.1] FastCGI: incomplete headers (0 bytes) received from server "/usr/lib/cgi-bin/php5-fcgi"
[notice] caught SIGTERM, shutting down
[alert] (4)Interrupted system call: FastCGI: read() from pipe failed (0)
[alert] (4)Interrupted system call: FastCGI: the PM is shutting down, Apache seems to have disappeared - bye

As a result, Apache doesn't always stop cleanly: sometimes only the main process stops while the worker processes keep running, which prevents me from even restarting Apache, since it is still listening on port 80 but has no main process and no PID file.

I saw somebody mention updating to mod_fastcgi 2.4.7 (patched), which fixes that bug, but unfortunately RHEL/CentOS doesn't have that update, so it is not an option for me. (Apache PHP5-FPM connection reset by peer)

There was also a thread on Google Answers saying that increasing the value of --idle-timeout in fastcgi.conf can solve the issue, but I don't see why it would.

Any solutions for this problem, please?

Arman P.

1 Answer


Increasing -idle-timeout (just one dash in front, though ^^) is indeed the solution. A complete explanation is given here, but I'll try to summarize:

PHP has its own timeout, set in max_execution_time. If you are running it using mod_php, this setting tells PHP to quit working on a script after x seconds.
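For example, in php.ini (the value here is only an illustration, not a recommendation):

    ; php.ini - abort any script that runs longer than 300 seconds
    max_execution_time = 300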

Next: the FPM process manager has another one, request_terminate_timeout, set in the pool configuration. This one limits/overrides max_execution_time.
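A minimal sketch of that pool setting (the pool name, path and value are examples; on CentOS the pool files typically live under /etc/php-fpm.d/):

    ; /etc/php-fpm.d/www.conf (example path)
    [www]
    ; hard per-request limit - the child serving the request is killed after
    ; this many seconds, even if max_execution_time was raised or not enforced
    request_terminate_timeout = 301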

That's it for the pure PHP side. If you're using PHP-FPM and FastCGI, PHP is started in its own process, and the internal timeouts still apply. FastCGI, however, has its own timeout (which isn't needed for PHP, but how should FastCGI know that PHP has its own?) that makes sure the webserver doesn't freeze if some CGI process does (or just works for a very long time).

The problem is: FastCGI just kills the IO stream between PHP and Apache, giving PHP no chance to shut down properly. Data that has already been received by FastCGI is still handed over to Apache; if it's incomplete, it'll raise an error (the one you see about incomplete headers). In addition, the PHP processes stay around, running in a zombie-like state, unusable because their IO stream is now dead. You may have to kill them manually or wait until PHP-FPM hopefully does (depending on the process manager settings). The other errors you're getting are generated by PHP for this exact reason: the pipe was closed, so Apache "disappeared" on the other end.

So: make sure FastCGI's timeout is higher (by at least one second) than request_terminate_timeout, and that this one is higher by at least another second than the highest value you use in max_execution_time.
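Schematically, a consistent set of values could look like this (each line lives in a different file, and the numbers only illustrate the required ordering):

    ; php.ini (or pool php_admin_value): PHP's own limit fires first
    max_execution_time = 300

    ; FPM pool config: FPM kills the child a second later if PHP didn't stop
    request_terminate_timeout = 301

    # Apache config, on the FastCgiExternalServer line: mod_fastcgi gives up last
    -idle-timeout 302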

Note that max_execution_time can be changed using ini_set, so make sure to remember which values work and which don't - or set it using php_admin_value in your FPM pool configuration so it cannot be changed inside individual scripts (see the documentation). Not because it's necessary (request_terminate_timeout will terminate PHP children properly as long as it's lower than the FastCGI timeout), but because your scripts can detect the timeout properly if setting max_execution_time fails, while they will assume it worked if they are allowed to override it (there is no way to read request_terminate_timeout from inside PHP scripts).
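A sketch of that detection idea: `ini_set` returns `false` when the value cannot be changed, for example because it was locked with `php_admin_value`:

    <?php
    // ini_set() returns the previous value as a string on success
    // and false on failure (e.g. when locked via php_admin_value).
    if (ini_set('max_execution_time', '300') === false) {
        // The limit is locked down: plan the work within the configured budget.
    }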

You can change all values for each pool separately:

  • max_execution_time via php_value/php_admin_value inside the pool config

  • request_terminate_timeout directly inside the pool config

  • FastCGI's -idle-timeout in the Apache config, where each pool has to be added separately anyway:

    FastCgiExternalServer /usr/lib/cgi-bin/external.php5.www -socket /var/run/php5-fpm/www.sock -pass-header Authorization -idle-timeout 310 -flush

    (Paths may be different, of course; this is just a quote from my config. I'd also recommend the -pass-header Authorization, although it's not related to this problem.)

Johannes H.
  • Thanks for your answer. I've changed all configurations and restarted both servers (httpd and php-fpm). I just had to wait for idle time on this server as it's serving a number of production websites. I hope that the problem will be gone, and soon I will mark your answer as accepted. I'd also suggest updating your answer (for others' reference) with the `request_terminate_timeout` option set in the php-fpm config, which will enforce termination of PHP processes when `max_execution_time` from php.ini is not enforced and can easily be changed in `.htaccess` or using `ini_set`. – Arman P. Feb 07 '14 at 01:59
  • I knew there was another setting that wants to play with the others, too :) I just could not remember which one it was... I'll edit it in, thanks! – Johannes H. Feb 07 '14 at 02:07
  • Strangely, I still get the same behaviour even after setting everything mentioned. I still hope it's simply due to php-fpm not restarting all processes to read the new configuration, or maybe httpd not restarting all workers, though I doubt it. But I noticed another interesting thing: every time I get those errors, they are caused by the [monit](http://mmonit.com/monit/) system management and monitoring tool. From the access logs I've found out that the loopback IP 127.60.157.1 is used by monit, which simply checks httpd every minute by sending a "GET" request to it. – Arman P. Feb 07 '14 at 02:44
  • But I still don't get why I'm seeing the errors mentioned. From all the logs I clearly see that I don't hit any php-fpm process/children limits and don't have any slow scripts (the slow-query log is also on and is set to less than max_execution_time). – Arman P. Feb 07 '14 at 02:46
  • Hm, as it's clearly the pipe dying... any crashes in PHP itself (so that it is PHP that closes the stream, not FastCGI)? What does the PHP error log say? – Johannes H. Feb 07 '14 at 03:31
  • Hm, no. OK, forget that comment; I missed that SIGTERM line. I'm really curious which process is sending that... I don't know how to monitor that, unfortunately. – Johannes H. Feb 07 '14 at 03:32
  • No clues in any of the error logs, and after another restart everything runs smoothly (I hope the problem is gone, we'll see soon). But I still don't understand how `monit` could cause the problem, because a simple "GET" request to the server can't take long (certainly not 30 seconds). – Arman P. Feb 07 '14 at 03:35
  • Are you sure `monit` is only sending a simple GET to a static resource on the server? After all, it may be whatever script monit calls on the server that takes the time... what does monit request? – Johannes H. Feb 07 '14 at 03:38
  • It simply sends a "GET" request to localhost on port 80 using the HTTP protocol. Logically I would assume that it is querying the website root "/", but the main website on the host is a Wordpress site, and any query (a request made in a browser to the IP or host) results in many access_log entries (js, css and image resources), while `monit` results in only one access_log entry: `"GET / HTTP/1.1" 200 877041 "-" "monit/5.5"`. I don't think it could call the PHP script without causing all the resources to be requested and still leave only one line in the access_log. Though I can't find any exact documentation on this on the internet. – Arman P. Feb 07 '14 at 03:55
  • Well, if you've got virtual hosts running, the loopback might have another document root. Check your Apache config. – Johannes H. Feb 07 '14 at 03:59
  • Yeah, that could be the case, but I don't have any document root defined for localhost, so it uses the default. I've also tried `wget localhost` and got the static `index.html` page generated by the Wordpress cache plugin, which also contains references to the (js, css and image) resources that might have been seen in the access_log. In any case, a static html page also can't take that long to serve, I think. The good news is that everything still runs smoothly; usually it took less than 30 mins to generate the error again... – Arman P. Feb 07 '14 at 04:07
  • Is it even the same instance of Apache that is listening on that interface? Plus: wget that IP address directly. `localhost` in the Host header isn't the same as a local loopback IP address. The Host header is used as-is: if there's an IP address in there, it matches any ServerName that is that IP address. – Johannes H. Feb 07 '14 at 04:08
  • Checked that too, same static page. I've also found in the `monit` documentation that it requests the main page if nothing is specified, so yes, it must be the same static page. I've also noticed that `wget` doesn't result in the resources being queried (visible in the access_log) as browsers do, so `monit` and `wget` basically do the same in this case. Again, I can't imagine that querying a static page could take so long, but maybe in some instances the query doesn't return the static page (when something changes on the page) but executes the PHP script instead, which may be the issue. – Arman P. Feb 07 '14 at 04:29
  • In any case, thank you very much for the valuable time you spent debugging my problem with me. I will change the `monit` config to query some empty page instead of the main website, since I simply need it to check that httpd is working on port 80 and accepting HTTP connections (a sketch of such a check follows these comments). And I think the problem will then be completely gone :) – Arman P. Feb 07 '14 at 04:31
  • OK, let's assume that `monit` was causing the problem by requesting the Wordpress webpage and its PHP execution was taking too long. Then why wasn't anything written to the `slow_log`, and why didn't `request_terminate_timeout` or `max_execution_time` kill the connection before the `-idle-timeout` limit was hit and caused the error? (I'm just curious whether these configs work, or why `monit` was bypassing all those limitations by sending a simple "GET" request.) – Arman P. Feb 07 '14 at 04:51
  • Hm, sorry, no answer to this. Especially if using `wget` to request the exact same resource does NOT trigger the error (or does it)? – Johannes H. Feb 07 '14 at 04:59
  • No, it doesn't. But it seems that `monit` was also triggering the error randomly, not always (as it does the same thing every 60 secs), and the error was not triggered immediately after a restart. My guess is that `monit` was requesting the cached static .html file, and on random occasions (when the cache had to update) it was hitting the .php script, which in its turn could trigger that error. So `wget` could also randomly trigger the error if run enough times to hit the uncached version of the page. But these assumptions still don't explain why the connections were not killed after long execution and were not logged. – Arman P. Feb 07 '14 at 05:05
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/46984/discussion-between-arman-p-and-johannes-h) – Arman P. Feb 07 '14 at 05:06
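For reference, a minimal sketch of the monit change discussed in the comments above. The file path, init-script commands and the /ping.html page are hypothetical placeholders; `request` is monit's syntax for probing a specific URL instead of "/":

    # /etc/monit.d/httpd (example path)
    check process httpd with pidfile /var/run/httpd/httpd.pid
        start program = "/etc/init.d/httpd start"
        stop program  = "/etc/init.d/httpd stop"
        # probe a tiny static page instead of the Wordpress front page,
        # so the check never triggers PHP at all
        if failed host 127.0.0.1 port 80 protocol http request "/ping.html" then restart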