
Our team develops WordPress plugins and provides hosted instances on a couple of independent servers. Our WordPress installation is managed with Git; all servers have the same source and WordPress setup deployed, and only the domains and the actual data in the database vary. On each installation, MySQL runs on the same host, and each server runs WordPress exclusively.

However, after deploying this setup on Windows Server 2008 R2, we noticed a drastic performance difference compared to our other servers: page generation time goes up from an average of 400 ms to 4000-5000 ms for pages generated with PHP. For static resources served by Apache alone, speed is about the same as on Linux.

So we took some steps to narrow down the problem:

  1. Make sure no antivirus software or other Windows domain services are interfering
  2. Collect profiling data to identify the timekillers during script execution
  3. Test different server & hardware setups
  4. Double-check both Apache and PHP configuration for obvious configuration errors

After some profiling, we quickly noticed that evaluating regular expressions is horribly slow on our Windows machines: 10,000 regular expressions (preg_match) take about 90 ms on Linux and 3000 ms on Windows.
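A standalone micro-benchmark makes this comparison repeatable outside WordPress and without Xdebug, which adds its own overhead. This is a sketch under stated assumptions, not the plugin's code: the 10,000 patterns are synthetic (hypothetical `@...@` delimiters, one distinct pattern per entry), and only the `$pattern . 'i'` concatenation mirrors `_get_browser()`. Running it with php.exe on the command line and again through Apache on each machine should show whether the gap belongs to PCRE itself or to the web server integration.

```php
<?php
// Micro-benchmark: time N preg_match() calls against one user-agent string.
// The patterns are synthetic (hypothetical), not the real browscap data.

function build_patterns($count)
{
    $patterns = array();
    for ($i = 0; $i < $count; $i++) {
        // Each pattern is distinct, so every one has to be compiled.
        $patterns[] = '@^mozilla/5\.0 \(.*\) browser' . $i . '/\d+\..*$@';
    }
    return $patterns;
}

function time_preg_match(array $patterns, $subject)
{
    $matches = 0;
    $start = microtime(true);
    foreach ($patterns as $pattern) {
        // Append the case-insensitive modifier the same way _get_browser() does
        if (preg_match($pattern . 'i', $subject)) {
            $matches++;
        }
    }
    return array('seconds' => microtime(true) - $start, 'matches' => $matches);
}

$ua = 'Mozilla/5.0 (Windows NT 6.1; rv:28.0) Gecko/20100101 Firefox/28.0';
$result = time_preg_match(build_patterns(10000), $ua);
printf("10000 patterns, %d matches, %.1f ms\n", $result['matches'], $result['seconds'] * 1000);
```

Because every pattern is distinct, the loop measures compilation as well as matching; timings are wall-clock and will vary between runs.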

Profiling, system tests and configuration details are provided below. We don't want to optimize this script (we do know how to do that). We want the script to run at approximately the same speed on Windows as on Linux (given the same setup regarding opcache etc.). There is also no need to optimize the script's memory footprint.

Update: After some time, the systems seem to run out of memory, triggering out-of-memory errors at random allocations. See below for more details. Restarting Apache/PHP fixed the problem for now.

Trace to _get_browser:

  Call                                   Called from
  require wp-blog-header.php             index.php:17
  wp                                     wp-blog-header.php:14
  WP->main                               functions.php:808
  php::do_action_ref_array               class-wp.php:616
  php::call_user_func_array              wp-includes/plugin:507
  wp_slimstat::slimtrack                 php::internal (507)
  wp_slimstat::_get_browser              wp-slimstat.php:385

Update 2: For some reason I can't remember, we went back to running PHP as an Apache module on our servers (the same ones that deliver bad performance). But today they run blazingly fast (~1 s/request). Adding OPcache brings this down to ~400 ms/request. Apache, PHP and Windows remained the same.

1) Profiling Results

Profiling was done with Xdebug on all machines. Usually we only collected a few runs; those were enough to reveal where most of the time (50%+) was spent: the method _get_browser of the WordPress plugin wp-slimstat:

protected static function _get_browser(){
    // Load cache
    @include_once(plugin_dir_path( __FILE__ ).'databases/browscap.php');
    // browscap.php contains $slimstat_patterns and $slimstat_browsers

    $browser = array('browser' => 'Default Browser', 'version' => '1', 'platform' => 'unknown', 'css_version' => 1, 'type' => 1);
    if (empty($slimstat_patterns) || !is_array($slimstat_patterns)) return $browser;

    $user_agent = isset($_SERVER['HTTP_USER_AGENT'])?$_SERVER['HTTP_USER_AGENT']:'';
    $search = array();
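    foreach ($slimstat_patterns as $key => $pattern){
        // Stop at the first matching pattern; a late match or a miss
        // evaluates every pattern in the list
        if (preg_match($pattern . 'i', $user_agent)){
            $search = $value = $search + $slimstat_browsers[$key];
            // Merge inherited properties from the parent entry (index 3)
            while (array_key_exists(3, $value) && $value[3]) {
                $value = $slimstat_browsers[$value[3]];
                $search += $value;
            }
            break;
        }
    }

    // ... lots of other lines, not relevant to the profiling results
}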

This function, similar to PHP's get_browser(), detects the browser's capabilities and OS. Most of the script execution time is spent in this foreach loop, evaluating all those preg_match calls (approx. 8000-10000 per page request). This takes about 90 ms on Linux and 3000 ms on Windows. Results were the same on all setups tested (the picture shows data from two executions):

[Image: wp_slimstat::_get_browser profiling results on IIS8]

Sure, loading two huge arrays takes some time, and so does evaluating regular expressions. But we'd expect them to take approximately the same time on Linux and Windows. This is the profiling result on a Linux VM (one page request only); the difference is pretty obvious:

[Image: wp_slimstat::_get_browser profiling results on a Linux VM]
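To make the data structure behind the `_get_browser()` loop concrete, here is a toy version. It assumes, based only on the loop itself, that each `$slimstat_browsers` entry stores the key of its parent entry at index 3; the patterns, entries and the function name `lookup_browser()` are hypothetical. It also shows why the cost scales with the size of the pattern list: a user agent that matches late, or not at all, runs `preg_match()` against every pattern before it.

```php
<?php
// Toy version of the lookup in _get_browser(). Hypothetical data: the real
// browscap.php arrays hold thousands of entries.

function lookup_browser($user_agent, array $patterns, array $browsers)
{
    $search = array();
    foreach ($patterns as $key => $pattern) {
        // Every pattern before the first match is compiled and evaluated.
        if (preg_match($pattern . 'i', $user_agent)) {
            $search = $value = $search + $browsers[$key];
            // Follow the parent chain stored at index 3. The array union (+)
            // keeps keys already in $search, so child values override parents.
            while (array_key_exists(3, $value) && $value[3]) {
                $value = $browsers[$value[3]];
                $search += $value;
            }
            break;
        }
    }
    return $search;
}

$patterns = array(
    1 => '@^mozilla/5\.0 \(windows nt 6\.1.*\) gecko/.* firefox/28\..*$@',
    2 => '@^mozilla/5\.0 .*$@',
);
$browsers = array(
    1 => array(0 => 'Firefox', 1 => '28.0', 3 => 3),      // child, parent = 3
    2 => array(0 => 'Generic Mozilla', 1 => '5.0'),       // no parent
    3 => array(0 => 'Firefox', 1 => '0.0', 2 => 'Win7'),  // parent adds platform
);

$ua = 'Mozilla/5.0 (Windows NT 6.1; rv:28.0) Gecko/20100101 Firefox/28.0';
print_r(lookup_browser($ua, $patterns, $browsers));
```

For the Firefox user agent, the result combines the name and version from entry 1 with the platform `Win7` inherited from entry 3; a user agent that matches nothing returns an empty array, just as the real loop leaves `$search` empty.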

Another time killer was actually the Object-Cache WordPress uses:

function get( $key, $group = 'default', $force = false, &$found = null ) {
    if ( empty( $group ) )
        $group = 'default';

    if ( $this->multisite && ! isset( $this->global_groups[ $group ] ) )
        $key = $this->blog_prefix . $key;

    if ( $this->_exists( $key, $group ) ) {
        $found = true;
        $this->cache_hits += 1;
        // Objects are returned as clones so callers cannot modify the cached copy
        if ( is_object($this->cache[$group][$key]) )
            return clone $this->cache[$group][$key];
        else
            return $this->cache[$group][$key];
    }

    $found = false;
    $this->cache_misses += 1;
    return false;
}

Time is spent within this function itself (3 script executions):

[Image: object cache get() profiling results, three executions]

On Linux:

[Image: object cache get() profiling results on Linux]

The last really big time killer was translations. Each translation, loaded from memory, takes anywhere from 0.2 ms to 4 ms in WordPress:

[Image: translation profiling results]

On Linux:

[Image: translation profiling results on Linux]

2) Tested systems

To make sure virtualization or Apache does not affect this, we tested several setups. Antivirus software was disabled on all of them:

  • Linux Debian, Apache 2 and PHP on up-to-date stable releases. This is the same for developers running their virtual machines as for staging/live servers, and acts as the reference system for the desired performance. These run either in our office or at hosting providers (shared space). Windows systems had between 4 GB and 8 GB of RAM; memory usage was below 50% at all times. The virtualization hosts never ran Windows and Apache at the same time.
  • Live servers running at T-Systems (managed virtualized servers), on VMware Player
    • Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module
    • Win 2008 R2. Apache 2.2.25 + PHP 5.5.1 NTS,VC11 as fastcgi module
    • Win 2008 R2. Apache 2.2.25 + PHP 5.5.1 NTS,VC11 as apache module
    • Win 2008 R2. Apache 2.2.25 + PHP 5.5.11 TS,VC11 as apache module (the fast one mentioned in Update 2)
  • On a local machine (host: openSUSE, virtualization: VMware Player, same as at T-Systems), to rule out their infrastructure influencing us:
    • Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module
    • Win 2008 R2. IIS7 + PHP 5.4.26 NTS,VC9 as fastcgi module (with and without wincache)
    • Win 2012. IIS * + PHP 5.5.10 NTS,VC11 as fastcgi module (with and without wincache)
  • On a local machine without virtualization
    • Win 2008 R2. Apache 2.2.25 + PHP 5.4.26 NTS,VC9 as fastcgi module

Profiling results as mentioned above were the same on the different systems (~10% deviation). Windows was always significantly slower than Linux.

Using a fresh install of WordPress and SlimStat gave approximately the same results. Rewriting the code is not an option here.

Update: Meanwhile, we found two other Windows systems (both Windows 2008 R2, VM and physical) where this complete stack runs quite fast, with the same configuration.

Update 2: Running PHP as an Apache module on the live servers was faster than the FastCGI method: down to ~2 s, 50% less.

Running out of Memory

After some time, our live server stops working altogether, triggering these out-of-memory errors:

PHP Fatal error:  Out of memory (allocated 4456448) (tried to allocate 136 bytes)
PHP Fatal error:  Out of memory (allocated 8650752) (tried to allocate 45 bytes) 
PHP Fatal error:  Out of memory (allocated 6815744) (tried to allocate 24 bytes) 

This happens at random script locations. Apparently the Zend memory manager cannot allocate more memory, although the scripts would be allowed to. At the time of the incident, the server had about 50% of its RAM free (2 GB+), so the server does not actually run out of RAM. Restarting Apache/PHP fixed this problem for now.

We're not sure whether this problem is related to the performance issues. Yet as both issues seem to be memory-related, it's included here. In particular, we'll try to reproduce the settings of the Windows tests that delivered decent performance.
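The `allocated` figures in these errors are byte counts: 4456448 B = 4.25 MiB, 8650752 B = 8.25 MiB and 6815744 B = 6.5 MiB, so the processes died after allocating only a few MiB. To our understanding, PHP reports hitting `memory_limit` as "Allowed memory size of N bytes exhausted"; the "Out of memory" wording indicates that the request for more memory from the operating system itself failed. This sketch (the helper names are hypothetical) parses such lines and prints them next to the configured `memory_limit`:

```php
<?php
// Sketch: turn the "Out of memory (allocated N)" figures into MiB and set
// them against memory_limit. Function names are hypothetical helpers.

function ini_size_to_bytes($value)
{
    $value = trim((string) $value);
    if ($value === '' || $value === '-1') {
        return -1; // unlimited (or not set)
    }
    $number = (int) $value; // leading digits, e.g. "128M" -> 128
    switch (strtolower(substr($value, -1))) {
        case 'g':
            $number *= 1024; // fall through
        case 'm':
            $number *= 1024; // fall through
        case 'k':
            $number *= 1024;
    }
    return $number;
}

function parse_oom_line($line)
{
    $re = '/Out of memory \(allocated (\d+)\) \(tried to allocate (\d+) bytes\)/';
    if (!preg_match($re, $line, $m)) {
        return null;
    }
    return array('allocated' => (int) $m[1], 'requested' => (int) $m[2]);
}

$limit = ini_size_to_bytes(ini_get('memory_limit'));
$lines = array(
    'PHP Fatal error:  Out of memory (allocated 4456448) (tried to allocate 136 bytes)',
    'PHP Fatal error:  Out of memory (allocated 8650752) (tried to allocate 45 bytes)',
    'PHP Fatal error:  Out of memory (allocated 6815744) (tried to allocate 24 bytes)',
);
foreach ($lines as $line) {
    $oom = parse_oom_line($line);
    printf(
        "allocated %.2f MiB, requested %d B, memory_limit %s\n",
        $oom['allocated'] / 1048576,
        $oom['requested'],
        $limit < 0 ? 'unlimited' : sprintf('%.0f MiB', $limit / 1048576)
    );
}
```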

3) Apache & PHP Configuration

... probably have no common pitfalls. Output buffering is enabled (default), multibyte function overloading is disabled, ... If any option is of interest, we'll happily provide it.

Output of httpd.exe -V

Server version: Apache/2.4.7 (Win32)
Apache Lounge VC10 Server built:   Nov 26 2013 15:46:56
Server's Module Magic Number: 20120211:27
Server loaded:  APR 1.5.0, APR-UTIL 1.5.3
Compiled using: APR 1.5.0, APR-UTIL 1.5.3
Architecture:   32-bit
Server MPM:     WinNT
  threaded:     yes (fixed thread count)
    forked:     no
Server compiled with....
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses disabled)
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D DYNAMIC_MODULE_LIMIT=256
 -D HTTPD_ROOT="/apache"
 -D SUEXEC_BIN="/apache/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_ERRORLOG="logs/error.log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"

mpm_winnt_module configuration:

<IfModule mpm_winnt_module>
    ThreadsPerChild 150
    ThreadStackSize 8388608 
    MaxConnectionsPerChild 0
</IfModule>
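A rough address-space check on this configuration, offered as a hypothesis rather than a diagnosis: `httpd -V` reports a 32-bit build whose WinNT MPM runs all worker threads in one process (`forked: no`). Assuming each of the 150 threads reserves its full `ThreadStackSize` of address space, the stacks alone would need:

```latex
150 \times 8\,388\,608\ \text{B} = 1\,258\,291\,200\ \text{B} = 1200\ \text{MiB} \approx 59\%\ \text{of}\ 2048\ \text{MiB}
```

A 32-bit process normally gets 2 GiB of user address space on Windows (up to 4 GiB on a 64-bit OS if the binary is large-address-aware), so under this assumption well under half of it would remain for PHP's heap, independent of how much physical RAM is free. Lowering `ThreadStackSize` or `ThreadsPerChild` would be a cheap way to test this.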

Excerpt of php.ini:

realpath_cache_size = 12M
pcre.recursion_limit = 100000
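Several comments below ask about bitness and thread safety, so it helps to print those facts from the PHP that actually serves the requests. A minimal sketch using only built-in constants and `ini_get()`; the function name `php_runtime_summary()` is hypothetical. Request it through each Apache/IIS and module/FastCGI combination rather than only the CLI, since the SAPI and the loaded php.ini can differ. Note that `int_bits` is the integer width, not necessarily the process bitness: on Windows, PHP 5 x64 builds still use 32-bit integers.

```php
<?php
// Print build/runtime facts for the PHP serving the request.
// php_runtime_summary() is a hypothetical helper name.

function php_runtime_summary()
{
    return array(
        'version'              => PHP_VERSION,
        'os'                   => PHP_OS,
        'sapi'                 => php_sapi_name(),
        // Integer width; on Windows before PHP 7 this is 32 even on x64 builds
        'int_bits'             => PHP_INT_SIZE * 8,
        'thread_safe'          => (bool) PHP_ZTS,
        'memory_limit'         => ini_get('memory_limit'),
        'realpath_cache_size'  => ini_get('realpath_cache_size'),
        'pcre.recursion_limit' => ini_get('pcre.recursion_limit'),
        'pcre.backtrack_limit' => ini_get('pcre.backtrack_limit'),
    );
}

if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain'); // readable output in a browser
}
foreach (php_runtime_summary() as $name => $value) {
    echo str_pad($name, 22), ' ', var_export($value, true), PHP_EOL;
}
```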

4) Current suspected reason

Old assumption:

All three examples rely heavily on big arrays and string operations; that seems to be the common factor. As the implementation works OK-ish on Linux, we suspect a memory problem on Windows. Given there is no database interaction at the pinpointed locations, we don't suspect the database or the server <-> PHP integration to be the problem. Somehow PHP's memory interaction just seems to be slow. Could something be interfering with memory on Windows and making access dramatically slower?
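One way to test this assumption in isolation is a script that does only in-memory work: build a large keyed array, look keys up, and clone a small object the way the object cache's `get()` does, with no regex, no I/O and no database. A minimal sketch; the sizes and the function name `bench_memory_work()` are arbitrary assumptions. If it runs at roughly the same speed on the slow Windows servers as on Linux, plain memory access is probably not the culprit.

```php
<?php
// In-memory work only: no regex, no I/O, no database.
// bench_memory_work() and the sizes used are arbitrary assumptions.

function bench_memory_work($entries, $lookups)
{
    $start = microtime(true);

    // Build a large keyed array of strings, similar in spirit to a cache.
    $cache = array();
    for ($i = 0; $i < $entries; $i++) {
        $cache['key_' . $i] = str_repeat('x', 64) . $i;
    }
    $built = microtime(true);

    // Keyed lookups (roughly half miss) plus an object clone per lookup,
    // mirroring the clone in the object cache's get().
    $template = new stdClass();
    $template->payload = range(1, 50);
    $hits = 0;
    for ($i = 0; $i < $lookups; $i++) {
        if (isset($cache['key_' . ($i % ($entries * 2))])) {
            $hits++;
        }
        $copy = clone $template;
    }
    $done = microtime(true);

    return array(
        'build_ms'  => ($built - $start) * 1000,
        'lookup_ms' => ($done - $built) * 1000,
        'hits'      => $hits,
    );
}

$result = bench_memory_work(100000, 500000);
printf("build %.1f ms, lookups %.1f ms, hits %d\n", $result['build_ms'], $result['lookup_ms'], $result['hits']);
```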

Old assumption 2:

As the same stack runs fine on other Windows machines we assume the problem to be somewhere in the Windows configuration.

New assumption 3:

Actually, I am out of assumptions. Why would PHP run that much slower as FastCGI than as an Apache module?

Any ideas on how to verify this or find the real problem here? Any help or direction for fixing this issue is highly welcome.

Community
Fge
  • Are you running both on the same hardware? – hek2mgl Apr 03 '14 at 18:01
  • @hek2mgl : both apache & mysql? yes. Given the current memory / CPU consumption and the location of time spent, this shouldn't pose a problem though. – Fge Apr 03 '14 at 18:05
  • No, I mean windows and linux – hek2mgl Apr 03 '14 at 18:06
  • @hek2mgl : no, different machines. Even for the different Windows Tests. – Fge Apr 03 '14 at 18:10
  • I mean: Are they comparable? – hek2mgl Apr 03 '14 at 18:10
  • @hek2mgl : they all have the same HDD partitioning (and applications are on the same location), amount of ram and approx the same CPU Speed. Minor differences yes, but nothing that drastic. Linux VMs are probably the most limited ones (yet best performing). – Fge Apr 03 '14 at 18:12
  • (That is: the Windows testing servers were all equipped far better than any Linux server we used.) – Fge Apr 03 '14 at 18:14
  • What is the Windows Apache stack size? Is it the same? – Norman B. Robins0n Apr 03 '14 at 19:35
  • @NormanB.Robins0n : It was set to the default, increased it to 8MB. no changes in performance. Running out of stack should at least result in processes being killed. Processes finish always though. Added the output of `httpd -V` and the `mpm_winnt_module` configuration in the question. – Fge Apr 03 '14 at 20:46
  • Are Apache, PHP, and the fastcgi modules all 64-bit? (And I assume you mean Windows Server 2008 R2, not RC2?) – Harry Johnston Apr 08 '14 at 03:53
  • @HarryJohnston : All Software versions were 32bit. And yes, of course R2 :) – Fge Apr 08 '14 at 10:29
  • That could be a factor, I'd try 64-bit versions if available. – Harry Johnston Apr 08 '14 at 22:07
  • @HarryJohnston: We have experienced the same as described in this thread: http://stackoverflow.com/questions/21448661/apache-php-mysql-on-windows-64-bit . There wasn't any performance difference (at least not enough to solve this problem here). – Fge Apr 14 '14 at 09:16
  • I don't know if it's an option for you, but if you want to fix quickly, would PHP+IIS be worth a try? I believe PHP's performance inside IIS has been significantly improved recently, due to joint work between Zend and Microsoft. I've no idea if it would help in this case, mind you. – halfer Apr 14 '14 at 11:58
  • @halfer: IIS is a perfectly fine alternative. However, we had the same bad performance on IIS too (see 2) Tested systems). – Fge Apr 14 '14 at 16:02
  • Did you ever solve this ? We also have noticed that PHP and Wordpress are very slow on windows. – snake Sep 02 '15 at 17:18
  • @snake unfortunately no. we moved to a physical machine with excess resources and everything worked okish. It still delivers way worse performance and seeing all those resources go to waste with such a poor performance sure hurts (24 cores, 32gb ram) but page generation time is about fine now at least. Luckily we finally can move to linux within the next months :) – Fge Sep 14 '15 at 12:11
  • Oddly, the commentator at http://sljit.sourceforge.net/regex_perf.html found most regex parsers faster on MS Windows than on Linux (although they did cheat and ran the MS Windows benchmarks on faster hardware). – symcbean Aug 17 '16 at 21:32

5 Answers


Windows has lots of services/policies that restrict, prevent, protect, control and etc usage of the computer in every situation.

A good Microsoft certified specialist will be able to solve your question within minutes, because they will have the experience to tell exactly which settings/services/policies to check and disable/enable/change settings, so that the PHP scripts are executed faster.

Off the top of my head, I can only suggest checking everything that deals with RAM, hard drive access, environment variables, limits and security (like the firewall): everything that can affect the execution of a PHP script, starting with Remote Procedure Call policies and ending with the operating stack memory.

The logic is that if php.exe calls some external .dll file to execute an operation, there might be checks done by the OS along the way that slow down both sending the request via such a .dll and receiving the response from it. If the .dll uses the hard drive to access something, hard drive access policies come into play. Also relevant is how everything is situated in memory: in RAM or in the hard-drive cache of RAM. Application policies. Thread policies. Limits on the maximum percentage available for use by applications.

I am not saying that Windows-based hosts are bad, just that they are much more difficult to setup properly for a general admin. If you have Microsoft specialist on hands, he can tune your server to be as fast as Linux-based server.

Anatoliy Kim
  • This is pretty much what we suspect too. Some simple reasoning points us to memory here: we can trace the bottleneck to PHP script execution, not networking, Apache <-> PHP communication, or PHP <-> MySQL. All bottlenecks are related only to larger arrays in memory; there is no I/O or IPC involved at these steps. So it must be either a .dll (regex) or memory policies. Our sysadmin says no such policies are in effect, though. – Fge Apr 14 '14 at 16:08
  • For a simple test, redo the script without preg_match. Use string functions like strpos(); I think you can reach a similar effect. Measure the result and you will know for sure whether the PCRE functions are slow, or it is just because the arrays are large. – Anatoliy Kim Apr 15 '14 at 04:12
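The test suggested in the comment above could look like this sketch: the same loop over the same needles, once with `preg_match()` and once with `stripos()`, the case-insensitive variant of `strpos()`. The needles and the function name `compare_match_cost()` are hypothetical. If only the `preg_match()` timing is dramatically slower on Windows, PCRE is the suspect; if both loops are slow, the array iteration itself is.

```php
<?php
// Same needles, two matchers: preg_match() versus stripos().
// compare_match_cost() and the needles are hypothetical.

function compare_match_cost($count, $subject)
{
    $needles = array();
    for ($i = 0; $i < $count; $i++) {
        $needles[] = 'browser' . $i . '/';
    }

    $hitsRegex = 0;
    $start = microtime(true);
    foreach ($needles as $needle) {
        // Case-insensitive literal match through PCRE
        if (preg_match('@' . preg_quote($needle, '@') . '@i', $subject)) {
            $hitsRegex++;
        }
    }
    $regexMs = (microtime(true) - $start) * 1000;

    $hitsPlain = 0;
    $start = microtime(true);
    foreach ($needles as $needle) {
        // Same case-insensitive test without PCRE
        if (stripos($subject, $needle) !== false) {
            $hitsPlain++;
        }
    }
    $plainMs = (microtime(true) - $start) * 1000;

    return array(
        'regex_ms'   => $regexMs,
        'plain_ms'   => $plainMs,
        'regex_hits' => $hitsRegex,
        'plain_hits' => $hitsPlain,
    );
}

$r = compare_match_cost(10000, 'Mozilla/5.0 (Windows NT 6.1; rv:28.0) Gecko/20100101 Firefox/28.0');
printf("preg_match: %.1f ms, stripos: %.1f ms\n", $r['regex_ms'], $r['plain_ms']);
```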
  • enable APC, when using PHP5.4

    • if you do not notice a speed gain, when APC is on, something is misconfigured

      [APC]
      extension=php_apc.dll
      apc.enabled=1
      apc.shm_segments=1
      apc.shm_size=128M
      apc.num_files_hint=7000
      apc.user_entries_hint=4096
      apc.ttl=7200
      apc.user_ttl=7200

  • enable Zend Opcode when on PHP 5.5

    [Zend]
    zend_extension=ext/php_zend.dll
    zend_optimizerplus.enable=1
    zend_optimizerplus.use_cwd=1
    zend_optimizerplus.validate_timestamp=0
    zend_optimizerplus.revalidate_freq=2
    zend_optimizerplus.revalidate_path=0
    zend_optimizerplus.dups_fix=0
    zend_optimizerplus.log_verbosity_level=1
    zend_optimizerplus.memory_consumption=128
    zend_optimizerplus.interned_strings_buffer=16
    zend_optimizerplus.max_accelerated_files=2000
    zend_optimizerplus.max_wasted_percentage=25
    zend_optimizerplus.consistency_checks=0
    zend_optimizerplus.force_restart_timeout=60
    zend_optimizerplus.blacklist_filename=
    zend_optimizerplus.fast_shutdown=0
    zend_optimizerplus.optimization_level=0xfffffbbf
    zend_optimizerplus.enable_slow_optimizations=1
    opcache.memory_consumption=128
    opcache.interned_strings_buffer=8
    opcache.max_accelerated_files=10000
    opcache.revalidate_freq=60
    opcache.fast_shutdown=1
    opcache.enable_cli=1

  • disable Wordpress extensions step-wise, to find the memory usage monster

  • set Wordpress: define('WP_MEMORY_LIMIT', '128M');, unless you use image converting plugins that should suffice
  • set unlimited memory: memory_limit = -1 in php.ini, or ini_set('memory_limit', -1); at runtime
  • profile without running Xdebug, this sounds crazy, but the debugger itself has a high impact
  • use memory_get_usage and spread calls all over the system to find the code position, where the memory leaks
  • give zend.enable_gc=1 a try, scripts will be slower, but use less memory
  • maybe just disable checking for the user browser in the SlimStats settings..
  • if that is not possible, try to override SlimStats getBrowser() function, with a faster getBrowser() substitute
  • for a speed comparison of user-agent fetchers, see https://github.com/quentin389/ua-speed-tests
  • https://github.com/garetjax/phpbrowscap
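The `memory_get_usage()` suggestion can be turned into a small checkpoint logger. A minimal sketch; the class `MemCheckpoints` is hypothetical. `memory_get_usage()` reports memory used by PHP values, while `memory_get_usage(true)` and `memory_get_peak_usage(true)` report what the Zend memory manager has reserved from the system, which is the number to watch if the system allocation is what fails.

```php
<?php
// Checkpoint logger around memory_get_usage()/memory_get_peak_usage().
// MemCheckpoints is a hypothetical helper, not part of WordPress or PHP.

final class MemCheckpoints
{
    private static $log = array();

    public static function mark($label)
    {
        self::$log[] = array(
            'label' => $label,
            'usage' => memory_get_usage(),          // used by PHP values
            'real'  => memory_get_usage(true),      // reserved from the system
            'peak'  => memory_get_peak_usage(true), // highest reservation so far
        );
    }

    public static function all()
    {
        return self::$log;
    }

    public static function report()
    {
        $lines = array();
        $previous = null;
        foreach (self::$log as $entry) {
            // Change in used memory since the previous checkpoint
            $delta = ($previous === null) ? 0 : $entry['usage'] - $previous;
            $lines[] = sprintf(
                '%-30s used=%7.2f MiB reserved=%7.2f MiB peak=%7.2f MiB delta=%+d B',
                $entry['label'],
                $entry['usage'] / 1048576,
                $entry['real'] / 1048576,
                $entry['peak'] / 1048576,
                $delta
            );
            $previous = $entry['usage'];
        }
        return implode("\n", $lines);
    }
}
```

Call `MemCheckpoints::mark('label')` at suspicious points and, for example, log the report at the end of each request with `register_shutdown_function(function () { error_log(MemCheckpoints::report()); });`.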
Jens A. Koch
  • The problem is not the memory usage of WordPress here. The average request only takes up to 50MB. The errors as posted above already happened at 8MB or less at random locations. Legit memory consumption of the scripts is no issue. The issue is really PHP not being able to allocate more memory for scripts although there is free memory available. – Fge Apr 04 '14 at 14:20
  • For the performance part: enabling APC / OPcache does not result in noticeable differences. Xdebug of course slows things down. Yet without Xdebug (extension not enabled at all), page generation times are only ~0.5 s faster, still way too slow (3 s+). Disabling SlimStat of course speeds things up, but the other bottlenecks remain. It's more fighting the symptoms than the real issue, which we suspect lies in array/memory access. – Fge Apr 04 '14 at 14:24
  • Sidenote: maybe it's worth to compare also with a Nginx based stack, just out of curiosity and for testing: http://wpn-xm.org/downloads/WPNXM-0.6.0-Lite-Setup-w32.exe – Jens A. Koch Apr 04 '14 at 14:51
  • Raise PHP error reporting to strict, check php_error.log. Also: Activating APC/opcode caching should result in a noticable difference! Check settings, play with apc.stat=1/0, ttl & cache. keep in mind: when turning apc on, turn xdebug off. – Jens A. Koch Apr 04 '14 at 15:06
  • Disabling XCache and enabling APC (with your settings) does not result in any noticeable difference. It does get a bit faster, sure. Yet average generation time is still in the 4 s+ region. It's not the opcode generation that takes time but the examples I posted above, and those are not optimized by APC/Optimizer. Sure, caching will generally improve performance. But changing the source of third-party plugins or WordPress core is not an option and should not be necessary, as it is proven to run decently fast; we'll do this once the application runs at approximately the same speed on both Linux and Windows. – Fge Apr 04 '14 at 15:44
  • I would start optimizing the examples you have posted above, to adjust for Windows. For instance, getting rid of the time-consuming silencing operator: `@`. But without touching the code: what is your `realpath_cache_size` and `pcre.recursion_limit`? – Jens A. Koch Apr 04 '14 at 16:11
  • `realpath_cache_size = 12M` and `pcre.recursion_limit = 100000` – Fge Apr 04 '14 at 16:15
  • As much as we'd love to refactor this code, it's not this code specifically that is slow. Putting this code in a new file and optimizing it a bit still takes approximately the same time. It is those 10k regular expressions that are slow, but only on Windows. If they were slower by a factor of 2-3, we'd be OK with that, but not this much slower. And it's not only the regex snippet that runs slow: overall, PHP just seems to run slower in some parts (as shown above), while others are pretty much as fast as on Linux. – Fge Apr 04 '14 at 16:20
  • Settings: are ok. I tried to reproduce this. Here Wordpress+SlimStat spends the majority of time on mo.php and includes. Whats the exact execution path to get a call to `_get_browser`? Also, what makes me really wonder is, that this occurs on IIS and Apache: maybe "RLimitMEM"? – Jens A. Koch Apr 04 '14 at 17:55
  • I have added the trace next to the profiling data of `_get_browser`. Happens both on IIS & Apache. – Fge Apr 04 '14 at 20:44
  • Thanks for the trace. I tested with a freshly installed WP v3.8.1 + SlimStat v3.5.8. With Xdebug Profiler: 1,8s-2,2s. http://i.imgur.com/vfzQ3mY.jpg The number of preg_match calls is 372. If it were 20000, it would be 0,2s. Without XDebug: 0,2-0,4s. Turns out the real problem with wordpress at the moment is the "pomo/localization" implementation, but thats adressed by WP Performance pack. Please update SlimStats. Good luck solving this, Regards Jens – Jens A. Koch Apr 05 '14 at 01:33

I took a look at that plugin on Github:

https://github.com/wp-plugins/wp-slimstat

And the offending file being included has been minified to some degree and really is data (not code); there are 5 variations, each of which is about 400KB.

There is also that maxmind.dat file that is 400KB, although I don't know if it uses both.

You are using an older version of the plugin, version 3.2.3 and there is a much newer one that may solve your problem.

Comparing the differences is hard because the author or whoever has not kept the git history in order, so I had to manually diff the file. Most of the changes related to _get_browser seem to be adding a cache.

It's possible that loading that file is slow to parse, but I would expect PHP to load both files at similar rates on both platforms, granted that IO caching is working.

EDIT Looking a bit closer that might not solve your problem. Those files are basically large Regular Expression lookup tables. Did your Linux system have an APC cache on it and this one does not? The APC cache would probably keep the PHP file data cached (although not the compiled regex patterns)

Kristopher Ives
  • You are right about the plugin version (3.5.6 though). The changes however only affect the max. memory required, not the amount of regex to evaluate. https://wordpress.org/plugins/wp-slimstat/changelog/ < v. 3.5.6 describes this change. APC/Wincache was not active in any setup (unless expl. mentioned above). Bad IO wouldn't result in slow regex but slow include though :/ – Fge Apr 04 '14 at 07:51
  • Whether APC was enabled on Linux is more important. If APC is enabled, it won't reload these strings all the time. – Kristopher Ives Apr 04 '14 at 20:57
  • No, as said, pretty much same configuration. Including optimizing extensions. – Fge Apr 04 '14 at 21:05

Use NGINX and FCGI for PHP via UNIX socket (not TCP socket).

http://wiki.nginx.org/PHPFcgiExample

You will immediately notice speed improvements, even without accelerators. Additionally, the above setup would have a much lower memory footprint.

Mantosh Kumar
mikikg
  • A UNIX socket on Windows? That would be a pipe. But that won't solve the problem. – Jens A. Koch Apr 14 '14 at 22:00
  • In general I recommend NGINX and FCGI, even on Windows. Connection trough socket gives some speed improvements but it is not a must. Regarding your problem, try to isolate problematic script or function and test it in CLI PHP environment (without web server) and see does it makes some changes. – mikikg Apr 14 '14 at 22:45
  • Nginx + PHP FCGI faster on Linux than pre-fork Apache with mod_php? No, your response times at moderate load levels will be lower (admittedly nginx will pull ahead at high load levels until you hit the bottleneck of FCGI processes). There are several published benchmarks demonstrating this. – symcbean Aug 17 '16 at 21:28

Tools that can help you to troubleshoot this issue include the sysinternals suite: http://technet.microsoft.com/en-us/sysinternals/bb842062.aspx

Which should enable you to perform deep debugging on any running process. It's quite likely that this particular issue is related to thread safety, depending on your php runtime.

See: https://learn.microsoft.com/en-us/iis/application-frameworks/running-php-applications-on-iis/best-practices-for-php-on-the-microsoft-web-platform#use-a-non-thread-safe-build-of-php

Finally, it's worth noting that the article above is entirely dedicated to php performance optimization on IIS, on Windows Server 2008 and above.

Mauro Colella
  • Well, it's quite possible that I mixed up tabs and posted an answer here that wasn't in direct relation to the question, but there in the cities with the lights, people point out such issues using something called politeness. I am pretty sure other primates can learn it too. Try it. Let's make this a constructive experiment. – Mauro Colella Aug 24 '22 at 15:11