14

What exactly are the benefits of using a PHP 5 DirectoryIterator

$dir = new DirectoryIterator(dirname(__FILE__));
foreach ($dir as $fileinfo) 
{
    // handle what has been found
}

over a PHP 4 "opendir/readdir/closedir"

if($handle = opendir(dirname(__FILE__))) 
{
    while (false !== ($file = readdir($handle))) 
    {
        // handle what has been found
    }
    closedir($handle);
}

besides the subclassing options that come with OOP?

e-sushi
  • 4
    that you don't have to implement recursion yourself? Your PHP 4 example iterates only over one directory, not recursively over all the children directories etc. – bwoebi Jul 21 '13 at 19:09
  • @bwoebi Speed error - I meant **DirectoryIterator**. I corrected the question and example accordingly. – e-sushi Jul 21 '13 at 19:14
  • 2
    Then the answer is still _that it is less code to write / cleaner_ – bwoebi Jul 21 '13 at 19:16
  • 1
    @bwoebi So no memory or speed benefits or anything like that? – e-sushi Jul 21 '13 at 19:18
  • 1
    I'm also interested to know if these iterators are optimized for low memory usage. For example, `readdir` can be used to list millions of files while keeping memory usage low. But if directory iterators read all the files at once like `glob`, then they are completely useless :) – Alex Dec 26 '13 at 18:37
  • @Alex: In case you run into a memory or speed problem with default `DirectoryIterator` (e.g. due to underlying file-system or tree-traversal), you can easily change that behavior without changing the rest of your code. This is *not* possible with `while(readdir)` as you mix directory traversal and file processing tightly together. You should in any case wrap `while(readdir)` into an iterator, for example [`FetchingIterator`](https://github.com/hakre/Iterator-Garden/blob/master/src/FetchingIterator.php) or if you have PHP 5.5+, wrap it into a [`Generator`](http://php.net/Generator). – hakre Dec 28 '13 at 22:55
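
The last comment suggests wrapping `while(readdir)` in a generator. A minimal sketch of that idea follows, assuming PHP 5.5+; the function name `read_directory_lazily` is hypothetical and not taken from the thread or from any library. Entries are produced one at a time, so memory use does not grow with the number of files.

```php
<?php
// Hypothetical helper (not from the thread): wraps opendir/readdir/closedir
// in a generator so that entry names are produced one at a time.
function read_directory_lazily($dirpath)
{
    $handle = opendir($dirpath);
    if ($handle === false) {
        return; // nothing to yield if the directory cannot be opened
    }
    try {
        while (false !== ($name = readdir($handle))) {
            if ($name !== '.' && $name !== '..') {
                yield $name;
            }
        }
    } finally {
        // runs when iteration ends or the generator is destroyed
        closedir($handle);
    }
}
```

The calling code then looks the same as with DirectoryIterator, for example `foreach (read_directory_lazily(dirname(__FILE__)) as $file) { ... }`, so the traversal can be swapped later without touching the loop body.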

4 Answers

21

To understand the difference between the two, let's write two functions that read the contents of a directory into an array: one using the procedural method and the other the object-oriented one.

Procedural, using opendir/readdir/closedir

function list_directory_p($dirpath) {
    if (!is_dir($dirpath) || !is_readable($dirpath)) {
        error_log(__FUNCTION__ . ": Argument should be a path to valid, readable directory (" . var_export($dirpath, true) . " provided)");
        return null;
    }
    $paths = array();
    $dir = realpath($dirpath);
    $dh = opendir($dir);
    while (false !== ($f = readdir($dh))) {
        if ($f !== '.' && $f !== '..') {
            $paths[] = $dir . DIRECTORY_SEPARATOR . $f;
        }
    }
    closedir($dh);
    return $paths;
}

Object Oriented, using DirectoryIterator

function list_directory_oo($dirpath) {
    if (!is_dir($dirpath) || !is_readable($dirpath)) {
        error_log(__FUNCTION__ . ": Argument should be a path to valid, readable directory (" . var_export($dirpath, true) . " provided)");
        return null;
    }
    $paths = array();
    $dir = realpath($dirpath);
    $di = new DirectoryIterator($dir);
    foreach ($di as $fileinfo) {
        if (!$fileinfo->isDot()) {
            $paths[] = $fileinfo->getRealPath();
        }
    }
    return $paths;
}

Performance

Let's assess their performance first:

$num_iterations = 1000; // assumed value; the original does not state the iteration count
$start_t = microtime(true);
for ($i = 0; $i < $num_iterations; $i++) {
    $paths = list_directory_oo(".");
}
$end_t = microtime(true);
$time_diff_micro = (($end_t - $start_t) * 1000000) / $num_iterations;
echo "Time taken per call (list_directory_oo) = " . round($time_diff_micro / 1000, 2) . "ms (" . count($paths) . " files)\n";

$start_t = microtime(true);
for ($i = 0; $i < $num_iterations; $i++) {
    $paths = list_directory_p(".");
}
$end_t = microtime(true);
$time_diff_micro = (($end_t - $start_t) * 1000000) / $num_iterations;
echo "Time taken per call (list_directory_p) = " . round($time_diff_micro / 1000, 2) . "ms (" . count($paths) . " files)\n";

On my laptop (Win 7 / NTFS), the procedural method seems to be the clear winner:

C:\code>"C:\Program Files (x86)\PHP\php.exe" list_directory.php
Time taken per call (list_directory_oo) = 4.46ms (161 files)
Time taken per call (list_directory_p) = 0.34ms (161 files)

On an entry-level AWS machine (CentOS):

[~]$ php list_directory.php
Time taken per call (list_directory_oo) = 0.84ms (203 files)
Time taken per call (list_directory_p) = 0.36ms (203 files)

The results above are from PHP 5.4. You'll see similar results with PHP 5.3 and 5.2, and whether PHP runs under Apache or NGINX.

Code Readability

Although slower, code using DirectoryIterator is more readable.

File reading order

The order in which directory contents are read is exactly the same with either method. That is, if list_directory_oo returns array('h', 'a', 'g'), list_directory_p also returns array('h', 'a', 'g').

Extensibility

The two functions above demonstrate performance and readability. Note that if your code needs to do further operations on each entry, code using DirectoryIterator is more extensible.

For example, in list_directory_oo above, the $fileinfo object provides methods such as getMTime(), getOwner() and isReadable() (the return values of most of these are cached and do not require additional system calls).

Therefore, depending on your use case (that is, what you intend to do with each child element of the input directory), it's possible that code using DirectoryIterator performs as well as, or sometimes better than, code using opendir.

You can modify the code of list_directory_oo and test it yourself.
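
As a starting point for such a test, here is a sketch of two variants that also collect metadata for each entry. The function names `list_metadata_p` and `list_metadata_oo` are made up for illustration, and no timing result is claimed for them; plug them into the benchmark loop above and measure on your own system.

```php
<?php
// Illustrative variants (hypothetical names): each returns an array of
// path => array('mtime' => ..., 'readable' => ...) for one directory.
function list_metadata_p($dirpath)
{
    $result = array();
    $dir = realpath($dirpath);
    $dh = opendir($dir);
    while (false !== ($f = readdir($dh))) {
        if ($f !== '.' && $f !== '..') {
            $path = $dir . DIRECTORY_SEPARATOR . $f;
            // each metadata lookup is a separate function call on the path
            $result[$path] = array(
                'mtime'    => filemtime($path),
                'readable' => is_readable($path),
            );
        }
    }
    closedir($dh);
    return $result;
}

function list_metadata_oo($dirpath)
{
    $result = array();
    foreach (new DirectoryIterator(realpath($dirpath)) as $fileinfo) {
        if (!$fileinfo->isDot()) {
            // the same lookups, asked of the SplFileInfo-style object
            $result[$fileinfo->getPathname()] = array(
                'mtime'    => $fileinfo->getMTime(),
                'readable' => $fileinfo->isReadable(),
            );
        }
    }
    return $result;
}
```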

Summary

Which one to use depends entirely on the use case.

If I were to write a cron job in PHP that recursively scans a directory (and its subdirectories) containing thousands of files and performs certain operations on them, I would choose the procedural method.

But if my requirement is to write a sort of web-interface to display uploaded files (say in a CMS) and their metadata, I would choose DirectoryIterator.

You can choose based on your needs.

e-sushi
Manu Manjunath
  • 6
    The performance argument is moot. You can extend from DirectoryIterator, implement your own traversal (e.g. based on filesystem) and feel free to go with `readdir` inside. Not needing to do so to get started is an argument for it, not against it. Shipping working code to the customer fast has much better performance, esp. if you can maintain it much more easily, e.g. in case the traversal really is the bottleneck, you can easily speed it up. If not, taking care of it too early is wrong. You should make visible in your answer that these performance comparisons easily hide the important things. – hakre Dec 28 '13 at 22:51
  • 1
    Additionally it makes sense to use [`FilesystemIterator`](http://php.net/FilesystemIterator) with `FilesystemIterator::SKIP_DOTS` flag (enabled by default), so that it feels more like proper iteration. – hakre Dec 28 '13 at 22:58
  • 1
    @hakre *“Shipping working code to the customer fast has much better performance…”* Say what? These are the things that make websites slow. See, the latest trend to be "quick-in-delivery" isn't always healthy. For example: to hide a DIV, people nowadays throw in jQuery while ignoring the fact that they are adding an additional 35KB download at the same time, for something that could have been done in 3 lines of JavaScript. Result: slow sites and moody customers. To me, "product delivery speed" is more moot than "code execution speed". Yet, I do certainly agree with you on the extensibility. – e-sushi Dec 29 '13 at 10:45
  • 7
    @e-sushi: Oh here comes the jPony :) I never said you should use that for directory access :) And DirectoryIterator is not slow. This merely depends on your file system and is hardly the bottleneck. That is the point where this answer is misleading; the stats shown are not useful. Being able to improve speed for something that actually is a bottleneck later (when you're aware that it is actually one) without rewriting large parts of the code actually is. That's where iterators shine. I find this a bit unbalanced in the answer; that's all my comment wanted to address. – hakre Dec 29 '13 at 10:53
  • @hakre #ROFL Well, I could have mentioned the Boost Library instead as it has similar iterators, but looking around I noticed that PHP devs tend to know Javascript better than C++. ;) Anyway, *I actually agree with 99.9% of what you wrote.* I merely stumbled over that "customer-delivery" argument because there are people out there who'll take that as an excuse to deliver (let's just call it) “less-optimal results of speed-coding sessions”. (Btw.: just imagine you could and would give them unrestricted directory access via Javascript — now, that would surely be a hell of a security party! :}) – e-sushi Dec 29 '13 at 11:10
  • 1
    @e-sushi: Ah now I get what you mean. Yes, you're right: Quick delivery once is not what to aim for unless the contract just ends there. :D – hakre Dec 29 '13 at 11:52
  • @hakre Right, after the contract ends… hell can freeze over from my point of view. ;) – e-sushi Dec 29 '13 at 11:55
  • 1
    I've covered Extensibility/Readability, File Reading Order and performance. The right approach has to be chosen depending on the need. Let me add this to answer. – Manu Manjunath Dec 30 '13 at 06:17
  • Does it stream the files one by one, or does it load all the files in one go? I would like to use it on something that has millions of files. – CMCDragonkai May 18 '15 at 05:12
  • @CMCDragonkai If you're looking at iterating over "millions of files", is PHP the right choice? :-) – Manu Manjunath May 28 '15 at 07:36
20

Benefit 1: You can hide away all the boring details.

When using iterators you generally define them somewhere else, so real-life code would look something more like:

// ImageFinder is an abstraction over an Iterator
$images = new ImageFinder($base_directory);
foreach ($images as $image) {
    // application logic goes here.
}

The specifics of iterating through directories, sub-directories and filtering out unwanted items are all hidden from the application. That's probably not the interesting part of your application anyway, so it's nice to be able to hide those bits away somewhere else.

Benefit 2: What you do with the result is separated from obtaining the result.

In the above example, you could swap out that specific iterator for another iterator and you don't have to change what you do with the result at all. This makes the code a bit easier to maintain and add new features to later on.
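
The answer does not show how ImageFinder is built, so the following is only one possible sketch, assuming "image" means a file whose name ends in a common image extension. It composes three SPL iterators: RecursiveDirectoryIterator walks the tree, RecursiveIteratorIterator flattens it, and RegexIterator filters by path name.

```php
<?php
// Hypothetical implementation of the ImageFinder named above; the extension
// list and the recursive behaviour are assumptions made for illustration.
class ImageFinder extends IteratorIterator
{
    public function __construct($base_directory)
    {
        // Visit every file below $base_directory, skipping "." and "..".
        $files = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($base_directory, FilesystemIterator::SKIP_DOTS)
        );
        // Keep only entries whose path name (the iterator key) looks like an image.
        $images = new RegexIterator(
            $files,
            '/\.(jpe?g|png|gif)$/i',
            RegexIterator::MATCH,
            RegexIterator::USE_KEY
        );
        parent::__construct($images);
    }
}
```

With this in place, the `foreach ($images as $image)` loop from the answer works unchanged and receives SplFileInfo objects. Replacing the finder with a different iterator (say, one that reads file names from a database) would not require touching the loop body.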

Levi Morrison
7

A DirectoryIterator provides you with items that make sense in themselves. For example, DirectoryIterator::getPathname() will return all the information that you need to access the file contents.

The information that readdir() provides only makes sense locally, namely in combination with the parameter that you passed to opendir().
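
A small sketch of that difference, using two made-up helper functions that report file sizes: the readdir() version has to rebuild each path from the directory it opened, while the DirectoryIterator version can use each item on its own.

```php
<?php
// Illustrative helpers (hypothetical names): both return the size in bytes
// of every regular file directly inside $dirpath, keyed by file name.
function file_sizes_readdir($dirpath)
{
    $sizes = array();
    $handle = opendir($dirpath);
    while (false !== ($name = readdir($handle))) {
        // $name is only the entry name; the directory must be prepended
        $path = $dirpath . DIRECTORY_SEPARATOR . $name;
        if (is_file($path)) {
            $sizes[$name] = filesize($path);
        }
    }
    closedir($handle);
    return $sizes;
}

function file_sizes_iterator($dirpath)
{
    $sizes = array();
    foreach (new DirectoryIterator($dirpath) as $fileinfo) {
        // getPathname() already includes the directory
        if ($fileinfo->isFile()) {
            $sizes[$fileinfo->getFilename()] = filesize($fileinfo->getPathname());
        }
    }
    return $sizes;
}
```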

The DirectoryIterator is implemented in terms of wrappers around the php_stream_* functions, so no fundamentally different performance characteristics are to be expected. Particularly, items from the directory are read only when they are requested. Details can be found in the file

ext/spl/spl_directory.c

of the PHP source code.

Oswald
3

It's shorter, cleaner and easier to type and read.

Try rereading your examples. The first one is just "for each item in $dir".

You write what you mean.

e-sushi
RiaD