95

How does RecursiveIteratorIterator work?

The PHP manual has nothing much documented or explained. What is the difference between IteratorIterator and RecursiveIteratorIterator?

fuxia
  • 62,923
  • 6
  • 54
  • 62
varuog
  • 3,031
  • 5
  • 28
  • 56
  • 2
    there is an example at http://php.net/manual/en/recursiveiteratoriterator.construct.php and there is also an Introduction at http://php.net/manual/en/class.iteratoriterator.php - can you please point out what exactly you have trouble understanding. What should the Manual contain in order to make it easier to grasp? – Gordon Sep 02 '12 at 10:58
  • 1
    If you ask how `RecursiveIteratorIterator` works, have you understood already how `IteratorIterator` works? I mean it is basically the same, only the interface that is consumed by the two is different. And are you more interested in some examples or do you want to see the diff of the underlying C code implementation? – hakre Sep 03 '12 at 02:23
  • @Gordon i was not sure how does single foreach loop can traverse all elements in tree structure – varuog Sep 07 '12 at 17:51
  • @hakra i am now trying to study all built in interfaces as well as spl interface and iterator implementetion.I was interested to know how does it works in background with forach loop with some examples. – varuog Sep 07 '12 at 17:54
  • @hakre They are both actualy very different. `IteratorIterator` maps `Iterator`and `IteratorAggregate` into an `Iterator`, where `REcusiveIteratorIterator` is used to traverse recusivly a `RecursiveIterator` – Adam May 17 '20 at 11:44

4 Answers4

276

RecursiveIteratorIterator is a concrete Iterator implementing tree traversal. It enables a programmer to traverse a container object that implements the RecursiveIterator interface, see Iterator in Wikipedia for the general principles, types, semantics and patterns of iterators.

In difference to IteratorIterator which is a concrete Iterator implementing object traversal in linear order (and by default accepting any kind of Traversable in its constructor), the RecursiveIteratorIterator allows looping over all nodes in an ordered tree of objects and its constructor takes a RecursiveIterator.

In short: RecursiveIteratorIterator allows you to loop over a tree, IteratorIterator allows you to loop over a list. I show that with some code examples below soon.

Technically this works by breaking out of linearity by traversing all of a nodes' children (if any). This is possible because by definition all children of a node are again a RecursiveIterator. The toplevel Iterator then internally stacks the different RecursiveIterators by their depth and keeps a pointer to the current active sub Iterator for traversal.

This allows to visit all nodes of a tree.

The underlying principles are the same as with IteratorIterator: An interface specifies the type of iteration and the base iterator class is the implementation of these semantics. Compare with the examples below, for linear looping with foreach you normally do not think about the implementation details much unless you need to define a new Iterator (e.g. when some concrete type itself does not implement Traversable).

For recursive traversal - unless you do not use a pre-defined Traversal that already has recursive traversal iteration - you normally need to instantiate the existing RecursiveIteratorIterator iteration or even write a recursive traversal iteration that is a Traversable your own to have this type of traversal iteration with foreach.

Tip: You probably didn't implement the one nor the other your own, so this might be something worth to do for your practical experience of the differences they have. You find a DIY suggestion at the end of the answer.

Technical differences in short:

  • While IteratorIterator takes any Traversable for linear traversal, RecursiveIteratorIterator needs a more specific RecursiveIterator to loop over a tree.
  • Where IteratorIterator exposes its main Iterator via getInnerIerator(), RecursiveIteratorIterator provides the current active sub-Iterator only via that method.
  • While IteratorIterator is totally not aware of anything like parent or children, RecursiveIteratorIterator knows how to get and traverse children as well.
  • IteratorIterator does not need a stack of iterators, RecursiveIteratorIterator has such a stack and knows the active sub-iterator.
  • Where IteratorIterator has its order due to linearity and no choice, RecursiveIteratorIterator has a choice for further traversal and needs to decide per each node (decided via mode per RecursiveIteratorIterator).
  • RecursiveIteratorIterator has more methods than IteratorIterator.

To summarize: RecursiveIterator is a concrete type of iteration (looping over a tree) that works on its own iterators, namely RecursiveIterator. That is the same underlying principle as with IteratorIerator, but the type of iteration is different (linear order).

Ideally you can create your own set, too. The only thing necessary is that your iterator implements Traversable which is possible via Iterator or IteratorAggregate. Then you can use it with foreach. For example some kind of ternary tree traversal recursive iteration object together with the according iteration interface for the container object(s).


Let's review with some real-life examples that are not that abstract. Between interfaces, concrete iterators, container objects and iteration semantics this maybe is not a that bad idea.

Take a directory listing as an example. Consider you have got the following file and directory tree on disk:

Directory Tree

While a iterator with linear order just traverse over the toplevel folder and files (a single directory listing), the recursive iterator traverses through subfolders as well and list all folders and files (a directory listing with listings of its subdirectories):

Non-Recursive        Recursive
=============        =========

   [tree]            [tree]
    ├ dirA            ├ dirA
    └ fileA           │ ├ dirB
                      │ │ └ fileD
                      │ ├ fileB
                      │ └ fileC
                      └ fileA

You can easily compare this with IteratorIterator which does no recursion for traversing the directory tree. And the RecursiveIteratorIterator which can traverse into the tree as the Recursive listing shows.

At first a very basic example with a DirectoryIterator that implements Traversable which allows foreach to iterate over it:

$path = 'tree';
$dir  = new DirectoryIterator($path);

echo "[$path]\n";
foreach ($dir as $file) {
    echo " ├ $file\n";
}

The exemplary output for the directory structure above then is:

[tree]
 ├ .
 ├ ..
 ├ dirA
 ├ fileA

As you see this is not yet using IteratorIterator or RecursiveIteratorIterator. Instead it just just using foreach that operates on the Traversable interface.

As foreach by default only knows the type of iteration named linear order, we might want to specify the type of iteration explicitly. At first glance it might seem too verbose, but for demonstration purposes (and to make the difference with RecursiveIteratorIterator more visible later), lets specify the linear type of iteration explicitly specifying the IteratorIterator type of iteration for the directory listing:

$files = new IteratorIterator($dir);

echo "[$path]\n";
foreach ($files as $file) {
    echo " ├ $file\n";
}

This example is nearly identical with the first one, the difference is that $files is now an IteratorIterator type of iteration for Traversable $dir:

$files = new IteratorIterator($dir);

As usual the act of iteration is performed by the foreach:

foreach ($files as $file) {

The output is exactly the same. So what is different? Different is the object used within the foreach. In the first example it is a DirectoryIterator in the second example it is the IteratorIterator. This shows the flexibility iterators have: You can replace them with each other, the code inside foreach just continue to work as expected.

Lets start to get the whole listing, including subdirectories.

As we now have specified the type of iteration, let's consider to change it to another type of iteration.

We know we need to traverse the whole tree now, not only the first level. To have that work with a simple foreach we need a different type of iterator: RecursiveIteratorIterator. And that one can only iterate over container objects that have the RecursiveIterator interface.

The interface is a contract. Any class implementing it can be used together with the RecursiveIteratorIterator. An example of such a class is the RecursiveDirectoryIterator, which is something like the recursive variant of DirectoryIterator.

Lets see a first code example before writing any other sentence with the I-word:

$dir  = new RecursiveDirectoryIterator($path);

echo "[$path]\n";
foreach ($dir as $file) {
    echo " ├ $file\n";
}

This third example is nearly identical with the first one, however it creates some different output:

[tree]
 ├ tree\.
 ├ tree\..
 ├ tree\dirA
 ├ tree\fileA

Okay, not that different, the filename now contains the pathname in front, but the rest looks similar as well.

As the example shows, even the directory object already imlements the RecursiveIterator interface, this is not yet enough to make foreach traverse the whole directory tree. This is where the RecursiveIteratorIterator comes into action. Example 4 shows how:

$files = new RecursiveIteratorIterator($dir);

echo "[$path]\n";
foreach ($files as $file) {
    echo " ├ $file\n";
}

Using the RecursiveIteratorIterator instead of just the previous $dir object will make foreach to traverse over all files and directories in a recursive manner. This then lists all files, as the type of object iteration has been specified now:

[tree]
 ├ tree\.
 ├ tree\..
 ├ tree\dirA\.
 ├ tree\dirA\..
 ├ tree\dirA\dirB\.
 ├ tree\dirA\dirB\..
 ├ tree\dirA\dirB\fileD
 ├ tree\dirA\fileB
 ├ tree\dirA\fileC
 ├ tree\fileA

This should already demonstrate the difference between flat and tree traversal. The RecursiveIteratorIterator is able to traverse any tree-like structure as a list of elements. Because there is more information (like the level the iteration takes currently place), it is possible to access the iterator object while iterating over it and for example indent the output:

echo "[$path]\n";
foreach ($files as $file) {
    $indent = str_repeat('   ', $files->getDepth());
    echo $indent, " ├ $file\n";
}

And output of Example 5:

[tree]
 ├ tree\.
 ├ tree\..
    ├ tree\dirA\.
    ├ tree\dirA\..
       ├ tree\dirA\dirB\.
       ├ tree\dirA\dirB\..
       ├ tree\dirA\dirB\fileD
    ├ tree\dirA\fileB
    ├ tree\dirA\fileC
 ├ tree\fileA

Sure this does not win a beauty contest, but it shows that with the recursive iterator there is more information available than just the linear order of key and value. Even foreach can only express this kind of linearity, accessing the iterator itself allows to obtain more information.

Similar to the meta-information there are also different ways possible how to traverse the tree and therefore order the output. This is the Mode of the RecursiveIteratorIterator and it can be set with the constructor.

The next example will tell the RecursiveDirectoryIterator to remove the dot entries (. and ..) as we do not need them. But also the recursion mode will be changed to take the parent element (the subdirectory) first (SELF_FIRST) before the children (the files and sub-subdirs in the subdirectory):

$dir  = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
$files = new RecursiveIteratorIterator($dir, RecursiveIteratorIterator::SELF_FIRST);

echo "[$path]\n";
foreach ($files as $file) {
    $indent = str_repeat('   ', $files->getDepth());
    echo $indent, " ├ $file\n";
}

The output now shows the subdirectory entries properly listed, if you compare with the previous output those were not there:

[tree]
 ├ tree\dirA
    ├ tree\dirA\dirB
       ├ tree\dirA\dirB\fileD
    ├ tree\dirA\fileB
    ├ tree\dirA\fileC
 ├ tree\fileA

The recursion mode therefore controls what and when a brach or leaf in the tree is returned, for the directory example:

  • LEAVES_ONLY (default): Only list files, no directories.
  • SELF_FIRST (above): List directory and then the files in there.
  • CHILD_FIRST (w/o example): List files in subdirectory first, then the directory.

Output of Example 5 with the two other modes:

  LEAVES_ONLY                           CHILD_FIRST

  [tree]                                [tree]
         ├ tree\dirA\dirB\fileD                ├ tree\dirA\dirB\fileD
      ├ tree\dirA\fileB                     ├ tree\dirA\dirB
      ├ tree\dirA\fileC                     ├ tree\dirA\fileB
   ├ tree\fileA                             ├ tree\dirA\fileC
                                        ├ tree\dirA
                                        ├ tree\fileA

When you compare that with standard traversal, all these things are not available. Recursive iteration therefore is a little bit more complex when you need to wrap your head around it, however it is easy to use because it behaves just like an iterator, you put it into a foreach and done.

I think these are enough examples for one answer. You can find the full source-code as well as an example to display nice-looking ascii-trees in this gist: https://gist.github.com/3599532

Do It Yourself: Make the RecursiveTreeIterator Work Line by Line.

Example 5 demonstrated that there is meta-information about the iterator's state available. However, this was purposefully demonstrated within the foreach iteration. In real life this naturally belongs inside the RecursiveIterator.

A better example is the RecursiveTreeIterator, it takes care of indenting, prefixing and so on. See the following code fragment:

$dir   = new RecursiveDirectoryIterator($path, RecursiveDirectoryIterator::SKIP_DOTS);
$lines = new RecursiveTreeIterator($dir);
$unicodeTreePrefix($lines);
echo "[$path]\n", implode("\n", iterator_to_array($lines));

The RecursiveTreeIterator is intended to work line by line, the output is pretty straight forward with one little problem:

[tree]
 ├ tree\dirA
 │ ├ tree\dirA\dirB
 │ │ └ tree\dirA\dirB\fileD
 │ ├ tree\dirA\fileB
 │ └ tree\dirA\fileC
 └ tree\fileA

When used in combination with a RecursiveDirectoryIterator it displays the whole pathname and not just the filename. The rest looks good. This is because the file-names are generated by SplFileInfo. Those should be displayed as the basename instead. The desired output is the following:

/// Solved ///

[tree]
 ├ dirA
 │ ├ dirB
 │ │ └ fileD
 │ ├ fileB
 │ └ fileC
 └ fileA

Create a decorator class that can be used with RecursiveTreeIterator instead of the RecursiveDirectoryIterator. It should provide the basename of the current SplFileInfo instead of the pathname. The final code fragment could then look like:

$lines = new RecursiveTreeIterator(
    new DiyRecursiveDecorator($dir)
);
$unicodeTreePrefix($lines);
echo "[$path]\n", implode("\n", iterator_to_array($lines));

These fragments including $unicodeTreePrefix are part of the gist in Appendix: Do It Yourself: Make the RecursiveTreeIterator Work Line by Line..

hakre
  • 193,403
  • 52
  • 435
  • 836
  • 1
    It doesn't answer the questions posed, contains factual errors and misses core points when you move on to customising the iteration. All in all, it looks like a poor attempt to get the bounty on a topic that you don't know much about or, if you do, cannot distill into an answer to the questions posed. – salathe Sep 02 '12 at 14:56
  • 2
    Well which does not answer my "why" question, you just make more words without saying much. Maybe you start with actually an Error that counts? Point to it, don't keep it your secret. – hakre Sep 02 '12 at 14:59
  • The very first sentence is incorrect: "RecursiveIteratorIterator is an IteratorIterator that supports…", this is not true. – salathe Sep 02 '12 at 15:23
  • You also forget to mention the **entire purpose** of the `IteratorIterator`, its reason for being. – salathe Sep 02 '12 at 15:24
  • And you forget to mention **how `RecursiveIteratorIterator` works** (it knows how to call `getChildren()`, for example). – salathe Sep 02 '12 at 15:26
  • Your entire answer is showing examples of "standard" traversal, versus "recursive" traversal, nothing more. – salathe Sep 02 '12 at 15:27
  • You forget to mention, when talking about `SELF_FIRST`, the other modes available and especially the name of the default mode (re. "core points" mentioned earlier). – salathe Sep 02 '12 at 15:29
  • 1
    @salathe: Tank you for your feedback. I edited the answer to adress it. The first sentence indeed was wrong and incomplete. I have still left out the concrete implementation details of `RecursiveIteratorIterator` because this is in common with other types but I did give some technical infos of how it actually works. The examples I think show well the differences: the type of iteration *is* the main difference between the two. No idea if you buy the type of iteration you coin it a little differently but IMHO not easy with the semantics types of iteration have. – hakre Sep 02 '12 at 22:35
  • 1
    The first part is improved somewhat, but once you start moving onto the examples it still veers into factual inaccuracies. If you cropped the answer at the horizontal rule, it would be much improved. – salathe Sep 03 '12 at 06:48
  • @salathe: yes it's not perfectly formulated but I didn't want to drop the examples. – hakre Oct 03 '12 at 11:33
34

What is the difference of IteratorIterator and RecursiveIteratorIterator?

To understand the difference between these two iterators, one must first understand a little bit about the naming conventions used and what we mean by "recursive" iterators.

Recursive and non-recursive iterators

PHP has non-"recursive" iterators, such as ArrayIterator and FilesystemIterator. There are also "recursive" iterators such as the RecursiveArrayIterator and RecursiveDirectoryIterator. The latter have methods enabling them to be drilled down into, the former do not.

When instances of these iterators are looped over on their own, even the recursive ones, the values only come from the "top" level even if looping over a nested array or directory with sub-directories.

The recursive iterators implement recursive behaviour (via hasChildren(), getChildren()) but do not exploit it.

It might be better to think of the recursive iterators as "recursible" iterators, they have the ability to be iterated recursively but simply iterating over an instance of one of these classes will not do that. To exploit the recursive behaviour, keep reading.

RecursiveIteratorIterator

This is where the RecursiveIteratorIterator comes in to play. It has the knowledge of how to call the "recursible" iterators in such a way as to drill down into the structure in a normal, flat, loop. It puts the recursive behaviour into action. It essentially does the work of stepping over each of the values in the iterator, looking to see if there are "children" to recurse into or not, and stepping into and out of those collections of children. You stick an instance of RecursiveIteratorIterator into a foreach, and it dives into the structure so that you don't have to.

If the RecursiveIteratorIterator was not used, you would have to write your own recursive loops to exploit the recursive behaviour, checking against the "recursible" iterator's hasChildren() and using getChildren().

So that's a brief overview of RecursiveIteratorIterator, how is it different from IteratorIterator? Well, you're basically asking the same sort of question as What is the difference between a kitten and a tree? Just because both appear in the same encyclopaedia (or manual, for the iterators) doesn't mean you should get confused between the two.

IteratorIterator

The job of the IteratorIterator is to take any Traversable object, and wrap it such that it satisfies the Iterator interface. A use for this is to then be able to apply iterator-specific behaviour on the non-iterator object.

To give a practical example, the DatePeriod class is Traversable but not an Iterator. As such, we can loop over its values with foreach() but cannot do other things that we ordinarily would with an iterator, such as filtering.

TASK: Loop over the Mondays, Wednesdays and Fridays of the next four weeks.

Yes, this is trivial by foreach-ing over the DatePeriod and using an if() within the loop; but that's not the point of this example!

$period = new DatePeriod(new DateTime, new DateInterval('P1D'), 28);
$dates  = new CallbackFilterIterator($period, function ($date) {
    return in_array($date->format('l'), array('Monday', 'Wednesday', 'Friday'));
});
foreach ($dates as $date) { … }

The above snippet won't work because the CallbackFilterIterator expects an instance of a class that implements the Iterator interface, which DatePeriod does not. However, since it is Traversable we can easily satisfy that requirement by using IteratorIterator.

$period = new IteratorIterator(new DatePeriod(…));

As you can see, this has nothing whatsoever to do with iterating over iterator classes nor recursion, and therein lies the difference between IteratorIterator and RecursiveIteratorIterator.

Summary

RecursiveIteraratorIterator is for iterating over a RecursiveIterator ("recursible" iterator), exploiting the recursive behaviour that is available.

IteratorIterator is for applying Iterator behaviour to non-iterator, Traversable objects.

salathe
  • 51,324
  • 12
  • 104
  • 132
  • Isn't `IteratorIterator` just the standard type of linear order traversal for `Traversable` objects? Those which could be used without it just in `foreach` as is? And even further, isn't a `RecursiveIterator` *always* a `Traversable` and therefore not only `IteratorIterator` but also `RecursiveIteratorIterator` always *"for applying `Iterator` behaviour to non-iterator, Traversable objects"*? (I'd now say `foreach` applies the type of iteration via the iterator object on container objects that implement an iterator-type interface so these are iterator-container-objects, always `Traversable`) – hakre Sep 02 '12 at 22:54
  • As my answer states, `IteratorIterator` is a class that is all about wrapping `Traversable` objects in an `Iterator`. *Nothing more*. You seem to be applying the term more generally. – salathe Sep 03 '12 at 06:34
  • Seemingly informative answer. One question, wouldn't RecursiveIteratorIterator also wrap objects so that they would also have access to Iterator behavior? The only difference between the two would be that the RecursiveIteratorIterator can drill down, whereas the IteratorIterator cannot? – Mike Purcell Sep 05 '12 at 21:01
  • @salathe, do you know why recursive iterator (RecursiveDirectoryIterator) does not implements hasChildren(), getChildren() behaviour? – anru Oct 08 '12 at 03:33
  • @anru, what makes you think it doesn't implement those methods? It does. – goat Oct 14 '12 at 20:53
  • 8
    +1 for saying "recursible". The name lead me astray for a long time because the `Recursive` in `RecursiveIterator` implies behavior, whereas a more suitable name would have been one which describes capability, like `RecursibleIterator`. – goat Oct 14 '12 at 21:18
  • It looks to me like RecursiveIterators always needs to be wrapped by `RecursiveIteratorIterator`. If thats the case, then I think it would be an improvement for the PHP language to get rid of the public `RecursiveIteratorIterator` class and instead `new RecursiveArrayIterator` should return an iterator that can be directly used in a foreach loop and iterates recursively. Or is there any use-case where you want to use a RecursiveIterators without `RecursiveIteratorIterator`? – Adam May 17 '20 at 11:55
  • @Adam: We can't get rid of the public `RecursiveIteratorIterator` class as it is the de-facto implementation of the recursive traversal that can be contracted with the `RecursiveIterator` interface. Perhaps a "you can't have the cookie and it eat, too" situation. What you question however, has a basis I'd say. My suggestion: When you already know that your type has (only this kind of) traversal, make it an iterator aggregate ([`implements IteratorAggregate`](https://www.php.net/manual/en/class.iteratoraggregate.php)) and give that iterator directly from the `getIterator()` method. – hakre Dec 02 '22 at 08:25
0

When used with iterator_to_array(), RecursiveIteratorIterator will recursively walk the array to find all the values. Meaning that it will flatten the original array.

IteratorIterator will keep the original hierarchical structure.

This example will show you clearly the difference:

$array = array(
               'ford',
               'model' => 'F150',
               'color' => 'blue', 
               'options' => array('radio' => 'satellite')
               );

$recursiveIterator = new RecursiveIteratorIterator(new RecursiveArrayIterator($array));
var_dump(iterator_to_array($recursiveIterator, true));

$iterator = new IteratorIterator(new ArrayIterator($array));
var_dump(iterator_to_array($iterator,true));
Tchoupi
  • 14,560
  • 5
  • 37
  • 71
  • This is totally misleading. `new IteratorIterator(new ArrayIterator($array))` is equivalent to `new ArrayIterator($array)`, that is, the outer `IteratorIterator` is doing nothing. Moreover, the flattening of the output has nothing to do with `iterator_to_array` – it merely converts the iterator to an array. The flattening is a property of the way `RecursiveArrayIterator` walks its inner iterator. – Quolonel Questions Aug 31 '14 at 16:39
0

RecursiveDirectoryIterator it displays the whole pathname and not just the filename. The rest looks good. This is because the file-names are generated by SplFileInfo. Those should be displayed as the basename instead. The desired output is the following:

$path =__DIR__;
$dir = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
$files = new RecursiveIteratorIterator($dir,RecursiveIteratorIterator::SELF_FIRST);
while ($files->valid()) {
    $file = $files->current();
    $filename = $file->getFilename();
    $deep = $files->getDepth();
    $indent = str_repeat('│ ', $deep);
    $files->next();
    $valid = $files->valid();
    if ($valid and ($files->getDepth() - 1 == $deep or $files->getDepth() == $deep)) {
        echo $indent, "├ $filename\n";
    } else {
        echo $indent, "└ $filename\n";
    }
}

output:

tree
 ├ dirA
 │ ├ dirB
 │ │ └ fileD
 │ ├ fileB
 │ └ fileC
 └ fileA
javad shariaty
  • 939
  • 10
  • 14