I've been brought in to work on an existing CMS and File Management web application that provides merchants with a management interface for their online webshops. The management application is developed in PHP.
When website users view the webshops, the page assets (mainly images in nested folder paths) are referenced directly from the HTML of the webshops and are served from a web server that is separate from the CMS server.
But in order to list, search, and navigate the files (i.e. the File Management part), the CMS application needs to be able to access the file/folder directory structure.
So we are using a Linux NFS mount from the CMS server to the document file server. This works fairly well as long as the number of files in any one merchant's directory tree is not too large (<10,000). However, some merchants have more than 100,000 files in a nested directory tree, and walking a tree of that size just to get the directory structure can take more than 120 seconds.
Retrieving just the list of files in any one directory is quite fast, but the problem comes when we try to identify which of these "files" are actually directory entries, so we can recurse down the tree.
It seems that the PHP functions for checking the file type (either calling is_dir() on each filepath returned by readdir() or scandir(), or using glob() with the GLOB_ONLYDIR flag) operate on each file individually rather than in bulk, so thousands upon thousands of NFS requests end up being sent. From my research so far, this appears to be a limitation of NFS rather than of PHP.
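To make the access pattern concrete, a minimal readdir()-based equivalent of the check (with a placeholder path) looks like this; as far as I can tell, each is_dir() call becomes a separate stat-style round trip over NFS:

$dir = '/mnt/documents/merchant123/images'; // placeholder path, for illustration only
$subDirs = array();

$dh = opendir( $dir );
while ( ( $entry = readdir( $dh ) ) !== false ) {
    if ( $entry === '.' || $entry === '..' ) {
        continue;
    }
    // One attribute lookup per entry - this is where the NFS traffic adds up.
    if ( is_dir( $dir . '/' . $entry ) ) {
        $subDirs[] = $dir . '/' . $entry;
    }
}
closedir( $dh );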
A stripped-down class showing just the function in question:
class clImagesDocuments {
    public $dirArr;

    function getDirsRecursive( $dir ) {
        if ( !is_dir( $dir )) {
            return false;
        }

        if ( !isset( $this->dirArr )) {
            // Top-level call: seed the list with the immediate subdirectories.
            $this->dirArr = glob( $dir . "/*", GLOB_ONLYDIR );
        } else {
            // Recursive call: append this directory's subdirectories and return;
            // the loop in the top-level call will pick them up as the array grows.
            $this->dirArr = array_merge( $this->dirArr, glob( $dir . "/*", GLOB_ONLYDIR ) );
            return false;
        }

        // sizeof() is re-evaluated on every iteration, so newly merged
        // subdirectories are also visited.
        for ( $i = 0; $i < sizeof( $this->dirArr ); $i++ ) {
            $this->getDirsRecursive( $this->dirArr[$i] );
        }

        // Re-index the results by path relative to the starting directory.
        $tempDir = array();
        for ( $i = 0; $i < sizeof( $this->dirArr ); $i++ ) {
            $indexArr = explode( $dir, $this->dirArr[$i] );
            $tempDir[$indexArr[1]] = $this->dirArr[$i];
        }
        $this->dirArr = $tempDir;
    }
}
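For context, the class is called roughly like this on the CMS side (the mount path is only an example):

$docs = new clImagesDocuments();
$docs->getDirsRecursive( '/mnt/documents/merchant123' ); // NFS mount point, example path
print_r( $docs->dirArr ); // keyed by path relative to the starting directory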
Executing the same PHP code to retrieve the directory tree locally on the document file server is much, much faster (two to three orders of magnitude), presumably because the local filesystem is caching the directory structure. I am forced to conclude that my problem is down to NFS.
I'm considering writing a simple web app which would run on the document file server itself and provide real-time lookups of the directory structure via an HTTP API, along the lines of the sketch below.
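This is only a rough sketch of the idea; listdirs.php, the document root, and the URL are placeholders, not an existing API. The endpoint walks the local filesystem on the document server and returns the directory list as JSON:

// listdirs.php - hypothetical endpoint running on the document file server
$base = '/var/www/documents'; // assumed document root on that server
$root = isset( $_GET['root'] ) ? realpath( $_GET['root'] ) : false;

// Only allow lookups inside the document root.
if ( $root === false || strpos( $root, $base ) !== 0 ) {
    http_response_code( 400 );
    exit;
}

$dirs = array();
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator( $root, FilesystemIterator::SKIP_DOTS ),
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ( $it as $item ) {
    if ( $item->isDir() ) {
        // Key by path relative to the requested root, same shape as dirArr above.
        $dirs[ substr( $item->getPathname(), strlen( $root ) ) ] = $item->getPathname();
    }
}

header( 'Content-Type: application/json' );
echo json_encode( $dirs );

The CMS side would then replace the recursive NFS walk with a single HTTP request:

// CMS side: one HTTP round trip instead of thousands of NFS calls (hostname is a placeholder)
$json = file_get_contents(
    'http://docserver.example/listdirs.php?root=' . urlencode( '/var/www/documents/merchant123' )
);
$this->dirArr = json_decode( $json, true );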
I'd appreciate any thoughts or suggestions.