I have a simple PHP script that makes some updates to a Mongo database:
$properties = db()->properties->find([], ['_id' => true]);
foreach ($properties as $property) {
    db()->properties->update(
        ['_id' => $property['_id']],
        ['$set' => ['random' => [lcg_value(), lcg_value()]]],
        ['w' => 0]
    );
}
I am basically assigning a random point to each record in the "properties" collection as suggested here for querying random records. This script runs on my local system in about a minute and a half for the dataset I am operating on (about 200,000 documents).
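For context, the lookup these random points are meant to enable looks roughly like this (my reading of the suggested approach, assuming a 2d index on the random field; the exact query may end up different):

// Fetch an approximately random document by searching near a fresh random point.
db()->properties->ensureIndex(['random' => '2d']);
$doc = db()->properties->findOne(
    ['random' => ['$near' => [lcg_value(), lcg_value()]]]
);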
I considered using JavaScript and running the update server-side, but executing server-side code from a driver requires permissions broader than I want to grant. I dislike having to pull down all those IDs and send an update for each record, but I couldn't figure out a way to do it in Mongo as a single batch operation.
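For reference, the server-side variant I ruled out would be something like this, using the legacy driver's execute() (shown only as an illustration; this is the path that needs the overly broad eval-style permissions):

// Run the whole loop inside mongod via server-side JavaScript.
// Avoids the per-document round trips, but requires eval-style privileges.
db()->execute('
    db.properties.find({}, {_id: 1}).forEach(function (doc) {
        db.properties.update(
            {_id: doc._id},
            {$set: {random: [Math.random(), Math.random()]}}
        );
    });
');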
Despite the excessive network traffic it is fairly quick, since I am only pulling down the IDs and the writes back to the database use an unacknowledged write concern (w => 0).
When I run this on the production machine, the script takes MUCH longer; I killed the job after about 30 minutes. The production site uses a custom-compiled PHP installation for unrelated reasons. I assume the custom build is to blame, but I am unsure how to track down the exact cause of the slowdown.
My thoughts so far:
- Maybe some optimization module that is enabled on my machine is not enabled on the production machine? I assume something like OPcache wouldn't matter here, since this is a CLI script executed once, not a script handling many requests.
- Maybe some compile-time option that makes gcc produce a more optimized binary?
- Or perhaps the Mongo extension is communicating more slowly because it is missing some faster networking library and has to fall back to a slower one? (See the timing sketch after this list.)
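One way to narrow this down would be to time the read phase and the write phase separately; a minimal sketch (assuming the same db() helper as the script above):

// Rough timing sketch: separate the cursor iteration from the update round trips.
$start = microtime(true);
$ids = [];
foreach (db()->properties->find([], ['_id' => true]) as $property) {
    $ids[] = $property['_id'];
}
$readTime = microtime(true) - $start;

$start = microtime(true);
foreach ($ids as $id) {
    db()->properties->update(
        ['_id' => $id],
        ['$set' => ['random' => [lcg_value(), lcg_value()]]],
        ['w' => 0]
    );
}
$writeTime = microtime(true) - $start;

echo "read: {$readTime}s, write: {$writeTime}s\n";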
The custom-compiled PHP is built with the following options and modules:
./configure --enable-static --enable-cli --disable-all --with-curl \
--enable-mongo --enable-json --enable-hash --enable-libxml \
--enable-simplexml --enable-filter --enable-dom --enable-ftp \
--enable-pcntl --enable-posix --enable-xmlwriter
If I diff my local system's modules against this custom-compiled version, these modules are present on my local system but not on the production machine:
bz2
calendar
ctype
exif
fileinfo
gettext
iconv
mhash
mysql
mysqli
mysqlnd
openssl
PDO
pdo_mysql
pdo_sqlite
Phar
readline
session
shmop
sockets
sqlite3
sysvmsg
sysvsem
sysvshm
tokenizer
wddx
xdebug
xml
xmlreader
xsl
zip
zlib
None of those look like they would impact performance.
Comparing versions, you can see the two builds are similar. The custom version:
PHP 5.5.13 (cli) (built: Jun 11 2014 17:00:10)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
My local machine:
PHP 5.5.12 (cli) (built: May 3 2014 07:09:53)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
with Xdebug v2.2.5, Copyright (c) 2002-2014, by Derick Rethans
I am assuming the slowness is due to IO and not CPU, since the script is mostly communicating with the Mongo database. It does calculate random numbers, but that doesn't factor in, since my custom-compiled PHP actually generates them much faster (benchmark sketch below). Generating a million random numbers using lcg_value on my local PHP:
real 0m0.420s
user 0m0.411s
sys 0m0.008s
But with my custom compiled PHP:
real 0m0.081s
user 0m0.076s
sys 0m0.005s
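Those numbers come from timing a trivial loop roughly like this with the shell's time:

// A million lcg_value() calls; run as `time php bench.php` with each binary.
for ($i = 0; $i < 1000000; $i++) {
    lcg_value();
}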
The problem is not due to the network or the CPU capabilities of the production environment: I ran the custom-compiled PHP binary on my local machine and it was significantly slower there as well. So I'm fairly sure it has something to do with how it is compiled, the modules built in, or the configuration.
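One thing I could do to compare the two builds side by side is dump what each binary actually loads and runs with, then diff the output (a rough sketch):

// Dump extension versions and effective ini settings so the two binaries can be diffed.
echo PHP_VERSION, "\n";
echo 'mongo extension: ', phpversion('mongo'), "\n";
print_r(get_loaded_extensions());
print_r(ini_get_all(null, false));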
Does anybody have ideas on where to look next?