7

Possible Duplicate:
PHP SPL RecursiveDirectoryIterator RecursiveIteratorIterator retrieving the full tree

I am not sure where to start, but I need to get the paths of all files in a folder, including the contents of all its subfolders. For example, if I had one folder containing five folders, each with 10 MP3s in it, my array would have to hold 50 paths to these files.

Later, let's say I added one more folder that had 3 folders in it, each with 10 images.

My code would now need to find 80 paths and store them in an array.

Does my question make sense?

UPDATE:

My desired output would be to have all these paths stored in one array.

But I would "LOVE" the code to be dynamic, meaning that if I later add 10 more folders, each having 17 subfolders, and each folder having a multitude of different content, I would like the array to hold the file paths of all the files. I hope this makes sense.

Community
Papa De Beau
  • 1
    I understand the folder structure you have. Now what do you want your output to be. Update your **desired output** in your question! :) – Praveen Kumar Purushothaman Sep 02 '12 at 06:17
  • why would you want to do that. I have little experience with php but in my opinion that would kill your poor server. Imagine 5000 people doing the reads of your directory structure!!! – Tanmay Sep 02 '12 at 06:19
  • Well why I want to do that is because I have a script in flash as3 that can download one file at a time. Flash can not download a folder and its contents so I would like the php to create a string of all the contents of the folders/subfolders and send that back to flash and it can begin downloading the content in the app. :) – Papa De Beau Sep 02 '12 at 06:23
  • [There](http://stackoverflow.com/questions/3826963) [are](http://stackoverflow.com/questions/2014474) [lots](http://stackoverflow.com/questions/4204728) [of](http://stackoverflow.com/questions/8799774) [existing](http://stackoverflow.com/questions/4930865) [questions](http://stackoverflow.com/questions/2398147) [on](http://stackoverflow.com/questions/2059891) [this](http://stackoverflow.com/questions/2542100) [topic](http://stackoverflow.com/questions/8813643). – salathe Sep 02 '12 at 10:38
  • We might should create a canonical question for it, right. – hakre Sep 02 '12 at 10:49
  • `isFile()) {` `echo $entry, '
    ';` `}` `}?>`
    – T.Todua Mar 28 '13 at 12:09

3 Answers

26

What you are looking for is called recursive directory traversal. This means you go through a directory and list the subdirectories and files in it. If there is a subdirectory, it is traversed as well, and so on, hence recursive.

As you can imagine, this is a fairly common need when writing software, and PHP supports you with that. It offers a RecursiveDirectoryIterator so that directories can be iterated recursively, and the standard RecursiveIteratorIterator to do the traversal. You can then easily access all files and directories with a simple iteration, for example via foreach:

$rootpath = '.';
$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($rootpath)
);
foreach($fileinfos as $pathname => $fileinfo) {
    if (!$fileinfo->isFile()) continue;
    var_dump($pathname);
}

This example first specifies the directory you want to traverse. I have used the current one:

$rootpath = '.';

The next line of code is a little long. It instantiates the directory iterator and then the iterator-iterator, so that the tree-like structure can be traversed in a single, flat loop:

$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($rootpath)
);

These $fileinfos are then iterated with a simple foreach:

foreach($fileinfos as $pathname => $fileinfo) {

Inside the loop, there is a test that skips everything that is not a file, so directories are not output. It uses the SplFileInfo object that is iterated over. This object is provided by the recursive directory iterator and has many helpful methods for working with files. For example, you can also get the file extension, the basename, and information about size and time.

if (!$fileinfo->isFile()) continue;
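
To make the SplFileInfo part a bit more concrete, here is a minimal sketch of what you could read from each object. The helper name describeFile() and the array keys are my own invention for illustration; the methods called on $fileinfo are regular SplFileInfo methods (getExtension() needs PHP 5.3.6 or later):

```php
<?php
// Sketch only: describeFile() is a hypothetical helper, not part of PHP.
function describeFile(SplFileInfo $fileinfo)
{
    return array(
        // file name without the directory part, e.g. "song.mp3"
        'name'      => $fileinfo->getFilename(),
        // extension without the dot, e.g. "mp3"
        'extension' => $fileinfo->getExtension(),
        // file name without directory and without extension, e.g. "song"
        'basename'  => pathinfo($fileinfo->getFilename(), PATHINFO_FILENAME),
        // size in bytes
        'size'      => $fileinfo->getSize(),
        // last modification time as a Unix timestamp
        'modified'  => $fileinfo->getMTime(),
    );
}
```

Inside the foreach you would call it as describeFile($fileinfo).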

Finally, I just output the pathname, which is the full path to the file:

var_dump($pathname);

Example output looks like this (here on a Windows operating system):

string(12) ".\.buildpath"
string(11) ".\.htaccess"
string(33) ".\dom\xml-attacks\attacks-xml.php"
string(38) ".\dom\xml-attacks\billion-laughs-2.xml"
string(36) ".\dom\xml-attacks\billion-laughs.xml"
string(40) ".\dom\xml-attacks\quadratic-blowup-2.xml"
string(40) ".\dom\xml-attacks\quadratic-blowup-3.xml"
string(38) ".\dom\xml-attacks\quadratic-blowup.xml"
string(22) ".\dom\xmltree-dump.php"
string(25) ".\dom\xpath-list-tags.php"
string(22) ".\dom\xpath-search.php"
string(27) ".\dom\xpath-text-search.php"
string(29) ".\encrypt-decrypt\decrypt.php"
string(29) ".\encrypt-decrypt\encrypt.php"
string(26) ".\encrypt-decrypt\test.php"
string(13) ".\favicon.ico"

If there is a subdirectory that is not accessible, the code above throws an exception. This behaviour can be controlled with flags when instantiating the RecursiveIteratorIterator. With CATCH_GET_CHILD, as in the following, such exceptions are ignored and the traversal continues:

$fileinfos = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('.'),
    RecursiveIteratorIterator::LEAVES_ONLY,
    RecursiveIteratorIterator::CATCH_GET_CHILD
);

I hope this was informative. You can also wrap this up into a class of your own, and you can provide a FilterIterator to move the decision whether a file should be listed out of the foreach loop.
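
Since the question asks for all paths in one array, here is a minimal sketch of how the loop from above could be wrapped so it returns an array instead of dumping each path. The function name collectFilePaths() is made up for this example, and the sort() call is only there to get a stable order:

```php
<?php
// Sketch only: collectFilePaths() is a hypothetical helper, not part of PHP.
function collectFilePaths($rootpath)
{
    $fileinfos = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($rootpath, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::LEAVES_ONLY,
        RecursiveIteratorIterator::CATCH_GET_CHILD // skip unreadable subdirectories
    );

    $paths = array();
    foreach ($fileinfos as $pathname => $fileinfo) {
        if (!$fileinfo->isFile()) {
            continue; // skip anything that is not a regular file
        }
        $paths[] = $pathname;
    }

    sort($paths); // the file system does not guarantee any order
    return $paths;
}
```

Because the tree is walked again on every call, folders and files added later show up without any code change. If the list has to be sent to another program, json_encode(collectFilePaths('.')) would be one way to turn it into a string.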


The power of the RecursiveDirectoryIterator and RecursiveIteratorIterator combination comes from its flexibility. What was not covered above are so-called FilterIterators. I thought I would add another example that uses two self-written ones, nested inside each other to combine them.

  • One filters out all files and directories that start with a dot (those are considered hidden files on UNIX systems, so you should not expose that information to the outside), and
  • another one filters the list down to files only. That is the check that was previously inside the foreach.

Another change in this usage example is the use of the getSubPathname() function, which returns the subpath relative to the iteration's rootpath, so the one you're looking for.

I also explicitly add the SKIP_DOTS flag, which prevents traversing . and .. (technically not really necessary, because the filters would remove those as well since they are directories; however, I think it is more correct). The UNIX_PATHS flag makes the returned paths always unix-like regardless of the underlying operating system, which is normally a good idea if those values are requested via HTTP later, as in your case:

$rootpath = '.';

$fileinfos = new RecursiveIteratorIterator(
    new FilesOnlyFilter(
        new VisibleOnlyFilter(
            new RecursiveDirectoryIterator(
                $rootpath,
                FilesystemIterator::SKIP_DOTS
                    | FilesystemIterator::UNIX_PATHS
            )
        )
    ),
    RecursiveIteratorIterator::LEAVES_ONLY,
    RecursiveIteratorIterator::CATCH_GET_CHILD
);

foreach ($fileinfos as $pathname => $fileinfo) {
    echo $fileinfos->getSubPathname(), "\n";
}

This example is similar to the previous one, although $fileinfos is built with a slightly different configuration. In particular, the part about the filters is new:

    new FilesOnlyFilter(
        new VisibleOnlyFilter(
            new RecursiveDirectoryIterator($rootpath, ...)
        )
    ),

So the directory iterator is put into a filter and the filter itself is put into another filter. The rest did not change.

The code for these filters is pretty straightforward. They work with the accept function, which returns either true to keep the entry or false to filter it out:

class VisibleOnlyFilter extends RecursiveFilterIterator
{
    public function accept()
    {
        $fileName = $this->getInnerIterator()->current()->getFileName();
        $firstChar = $fileName[0];
        return $firstChar !== '.';
    }
}

class FilesOnlyFilter extends RecursiveFilterIterator
{
    public function accept()
    {
        $iterator = $this->getInnerIterator();

        // allow traversal
        if ($iterator->hasChildren()) {
            return true;
        }

        // filter entries, only allow true files
        return $iterator->current()->isFile();
    }
}

And that's it again. Naturally you can use these filters for other cases, too. E.g. if you have another kind of directory listing.

And another example output, with the $rootpath cut away:

test.html
test.rss
tests/test-pad-2.php
tests/test-pad-3.php
tests/test-pad-4.php
tests/test-pad-5.php
tests/test-pad-6.php
tests/test-pad.php
TLD/PSL/C/dkim-regdom.c
TLD/PSL/C/dkim-regdom.h
TLD/PSL/C/Makefile
TLD/PSL/C/punycode.pl
TLD/PSL/C/test-dkim-regdom.c
TLD/PSL/C/test-dkim-regdom.sh
TLD/PSL/C/tld-canon.h
TLD/PSL/generateEffectiveTLDs.php

No more .git or .svn directory traversal, or listing of files like .buildpath or .project.
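
The same accept logic can also be written without a dedicated class. The following is a rough sketch that uses RecursiveCallbackFilterIterator (available from PHP 5.4) to keep only files with one extension, for example the MP3s from the question. The function name listFilesByExtension() and its parameters are assumptions made for this example:

```php
<?php
// Sketch only: listFilesByExtension() is a hypothetical helper, not part of PHP.
function listFilesByExtension($rootpath, $extension)
{
    $directory = new RecursiveDirectoryIterator(
        $rootpath,
        FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS
    );

    $filter = new RecursiveCallbackFilterIterator(
        $directory,
        function ($current, $key, $iterator) use ($extension) {
            // keep directories, otherwise they would never be traversed
            if ($iterator->hasChildren()) {
                return true;
            }
            // keep regular files whose extension matches, ignoring case
            return $current->isFile()
                && strtolower($current->getExtension()) === strtolower($extension);
        }
    );

    $result = array();
    foreach (new RecursiveIteratorIterator($filter) as $fileinfo) {
        $result[] = $fileinfo->getPathname();
    }

    sort($result); // stable order for the caller
    return $result;
}
```

The early return for entries with children plays the same role as the "allow traversal" branch in FilesOnlyFilter: without it, the extension test would reject every directory and nothing below the root would be visited.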


Note on FilesOnlyFilter and LEAVES_ONLY: The filter explicitly rejects directories and links based on the SplFileInfo object (only regular files that exist are accepted). So it is real filtering based on the file system.
Another way to get only non-directory entries ships with RecursiveIteratorIterator: the default LEAVES_ONLY flag (also used here in the examples). This flag does not work as a filter and is independent of the underlying iterator. It just specifies that the iteration should not return branches (here: directories, in the case of the directory iterator).
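
To see what the mode does in isolation, here is a small sketch that runs the same directory iterator with two different modes and no filter at all. The helper listEntries() is a made-up name for this example; LEAVES_ONLY and SELF_FIRST are the real class constants:

```php
<?php
// Sketch only: listEntries() is a hypothetical helper, not part of PHP.
function listEntries($rootpath, $mode)
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator(
            $rootpath,
            FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS
        ),
        $mode
    );

    $entries = array();
    foreach ($iterator as $fileinfo) {
        // the currently active sub-iterator knows the path relative to $rootpath
        $entries[] = $iterator->getInnerIterator()->getSubPathname();
    }

    sort($entries);
    return $entries;
}
```

For a root that contains only sub/file.txt, LEAVES_ONLY gives just sub/file.txt, while SELF_FIRST additionally gives the branch sub itself.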

Camilo Martin
hakre
  • AMAZING! Wow! ok how would I get just the pathname without the string(33) on it etc.. and how can I get just the file name too? :) – Papa De Beau Sep 02 '12 at 07:37
  • Also the path name minus the ./ right before it? :) – Papa De Beau Sep 02 '12 at 07:39
  • There are multiple ways to achieve that. You can "extend" this with some post-processing inside the `foreach`, the [`SplFileInfo`](http://php.net/SplFileInfo) objects are helpful here, you have those in `$fileinfo`. To remove the basedir is actually trivial because it is `$rootpath` plus the file-systems directory separator which length is normally exactly one character. So you have the `substr($pathname, 2)` here. I'll extend the answer with another filter example and will add that substring operation exemplary. – hakre Sep 02 '12 at 08:06
  • Very cool! And the names? I am guessing that should be easy too? wow man. Thank so much. You rock. – Papa De Beau Sep 02 '12 at 08:14
  • I got the minus ./ to work with this code: echo substr($pathname, 2); – Papa De Beau Sep 02 '12 at 08:20
  • 2
    The names, you mean only the filename? Well, as written, `SplFileInfo`, the function is called [`getFilename()`](http://php.net/splfileinfo.getfilename.php), in your case only the filename would be `$fileinfo->getFilename()`. For only the extension use [`$fileinfo->getExtension()`](http://www.php.net/splfileinfo.getextension.php) and so on and so forth. Every function you have with the `SplFileInfo` object you can make use of. That's why this is superior to `readdir` because you get these objects instead of dumb strings. – hakre Sep 02 '12 at 08:20
  • Would not that be the easiest way to do it? – Papa De Beau Sep 02 '12 at 08:20
  • Okay, I now found an even better method for the not-full pathname I didn't knew earlier: `$fileinfos->getInnerIterator()->getSubPathname()` that is the [`getSubPathname()`](http://www.php.net/recursivedirectoryiterator.getsubpathname.php) function. It does all the magic needed when you iterate. I'd say that is the function one is looking for. I will edit the answer and also remove the PHP 5.4 requirements. – hakre Sep 02 '12 at 10:09
  • Note to myself: [How does RecursiveIteratorIterator works in php](http://stackoverflow.com/a/12236744/367456) – hakre Sep 03 '12 at 02:20
5

If you are on Linux and you don't mind executing a shell command, you can do this all in one line:

$path = '/etc/php5/*'; // file filter, you could specify an extension using *.ext
$files = explode("\n", trim(`find -L $path`)); // -L follows symlinks

print_r($files);

Output:

Array (
       [0] => /etc/php5/apache2
       [1] => /etc/php5/apache2/php.ini
       [2] => /etc/php5/apache2/conf.d
       [3] => /etc/php5/apache2/conf.d/gd.ini
       [4] => /etc/php5/apache2/conf.d/curl.ini
       [5] => /etc/php5/apache2/conf.d/mcrypt.ini
       etc...
      )
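
If the path ever comes from outside your script, it should at least be escaped before it reaches the shell. Here is a hedged sketch of that idea, assuming a Unix-like system with find available. The function name findFiles() is invented for this example. Note that an escaped argument is no longer expanded by the shell, so this version takes a directory rather than a wildcard pattern, and it would still mis-split file names that contain newlines:

```php
<?php
// Sketch only: findFiles() is a hypothetical helper, not part of PHP.
// Assumes a Unix-like system where the find command is installed.
function findFiles($directory)
{
    // escapeshellarg() quotes the value so the shell treats it as one argument;
    // -type f limits the result to regular files, -L follows symlinks
    $command = 'find -L ' . escapeshellarg($directory) . ' -type f';

    $output = shell_exec($command);
    if (!is_string($output) || trim($output) === '') {
        return array(); // command failed or nothing was found
    }

    $files = explode("\n", trim($output));
    sort($files);
    return $files;
}
```

Escaping lowers the risk but does not make shelling out free of it, so the pure PHP variants are the safer default on a public server.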

The next shortest choice using only PHP is glob, but it doesn't scan subdirectories like you want. (You'd have to loop through the results, use is_dir(), and then call your function again.)

http://us3.php.net/glob

$files = dir_scan('/etc/php5/*'); 
print_r($files);

function dir_scan($folder) {
    $files = glob($folder);
    foreach ($files as $f) {
        if (is_dir($f)) {
            $files = array_merge($files, dir_scan($f .'/*')); // scan subfolder
        }
    }
    return $files;
}
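
One thing to watch with this approach: a pattern such as *.txt is also applied to directory names, so subfolders that do not match are never entered. A possible variant, sketched here under the assumption that directory names contain no glob special characters, matches the pattern and looks for subdirectories in two separate glob calls. The name globRecursive() is made up for this example:

```php
<?php
// Sketch only: globRecursive() is a hypothetical helper, not part of PHP.
function globRecursive($directory, $pattern)
{
    // files in this directory that match the pattern, e.g. "*.txt"
    $matches = glob($directory . '/' . $pattern);
    $files = ($matches === false) ? array() : array_filter($matches, 'is_file');

    // every subdirectory is scanned, whatever its name
    $subdirs = glob($directory . '/*', GLOB_ONLYDIR);
    if ($subdirs === false) {
        $subdirs = array();
    }
    foreach ($subdirs as $subdir) {
        $files = array_merge($files, globRecursive($subdir, $pattern));
    }

    return array_values($files); // reindex from 0
}
```

Like the original, this skips hidden entries, because * in glob does not match names that start with a dot.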

Every other way requires way more code than should be necessary to do something so simple.

msEmmaMays
  • 1
    :-) No problem. another cool thing about glob is you can specify a different filter (*.txt for example) and do all your file filtering at the same time (not only do you avoid having to parse every file name to check the extensions, you don't even have to loop over them at all since they are already filtered) – msEmmaMays Sep 02 '12 at 07:17
  • Is there a way to get it minus the ./ before in each array? Also can I get an array of just the file names with no path and ho extension? This is brilliant btw. – Papa De Beau Sep 02 '12 at 08:03
  • And when `$path` contains `./ && rm -r /` ? Perhaps I'm just a little paranoid about shelling out on a server but running any command with a variable dropped in is asking for problems - You _know_ that at some point, some new developer is going to come along and think "If only we could make that path user-supplied, we'd get [Blah Benefit]". No -1 as it's still a valid answer but I'd never do it – Basic Sep 05 '12 at 16:07
  • If you want to strip the './' you'd have to loop through the array and clean it up (or use array_walk) - or you could pass the full path to glob/find (it's giving './' because you are searching a relative path) @Basic - use escapeshellarg() to pass user supplied paths to shell and it's not so bad - http://php.net/manual/en/function.escapeshellarg.php – msEmmaMays Sep 05 '12 at 16:18
  • @RobertMaysJr [Sometimes](http://www.sektioneins.com/en/advisories/advisory-032008-php-multibyte-shell-command-escaping-bypass-vulnerability/index.html) [it is](http://www.net-security.org/vuln.php?id=3492) – Basic Sep 05 '12 at 16:22
2

The steps are as follows.

opendir opens the directory and returns a handle:

$dh = opendir($dir);

Next, you read the entries in $dh one at a time:

$file = readdir($dh);

You can find all the details in the PHP manual entry for opendir.

Googling for reading the directory structure returned this:

http://www.codingforums.com/showthread.php?t=71882
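
Put together, the two steps above could look roughly like the following sketch. The function name readDirRecursive() is an assumption for this example, and it follows symlinked directories without any loop protection, so treat it as a starting point only:

```php
<?php
// Sketch only: readDirRecursive() is a hypothetical helper, not part of PHP.
function readDirRecursive($dir)
{
    $files = array();

    $dh = opendir($dir);
    if ($dh === false) {
        return $files; // directory could not be opened
    }

    // compare with false explicitly: an entry named "0" would otherwise end the loop
    while (($entry = readdir($dh)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue; // skip the current and parent directory entries
        }

        $path = $dir . '/' . $entry;
        if (is_dir($path)) {
            // descend into the subdirectory and append what it contains
            $files = array_merge($files, readDirRecursive($path));
        } elseif (is_file($path)) {
            $files[] = $path;
        }
    }

    closedir($dh);
    return $files;
}
```

Calling readDirRecursive('.') returns a flat array of file paths for the current directory and everything below it.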

Tanmay
  • Awesome. Thank you. looks great. Testing it now :) Will get back to you. – Papa De Beau Sep 02 '12 at 06:32
  • 1
    overly complicated for something so simple, you can do this in 5 lines of code (7 with brackets for readability) see my answer below for the exact code you need – msEmmaMays Sep 02 '12 at 07:03