
I have a server with a lot of files inside various folders, sub-folders, and sub-sub-folders.

I'm trying to make a search.php page that searches the whole server for a specific file. If the file is found, it should return its path so I can display a download link.

Here's what I have so far:

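$root = $_SERVER['DOCUMENT_ROOT'];
$search = "test.zip";
// Matches files exactly one folder level below $root
$found_files = glob("$root/*/$search");
// Check the glob() result before reading index 0 to avoid an undefined-index warning
if (!empty($found_files)) {
    $downloadlink = str_replace("$root/", "", $found_files[0]);
    echo "<a href=\"http://www.example.com/$downloadlink\">$search</a>";
}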

The script works perfectly if the file is inside the root of my domain. Now I'm trying to make it also scan sub-folders and sub-sub-folders, but I'm stuck.

Marcio Mazzucato
Winston Smith
  • http://stackoverflow.com/questions/8870731/scan-files-in-a-directory-and-sub-directory-and-store-their-path-in-array-using – open source guy Jun 18 '13 at 04:48
  • You might have better luck using the `file_exists()` function. http://php.net/manual/en/function.file-exists.php **(or a mix of both).** – Funk Forty Niner Jun 18 '13 at 04:58
  • That doesn't tell me how to scan all sub-folders and sub-sub-folders for the file... – Winston Smith Jun 18 '13 at 05:04
  • True. Have you had a look at the link `messi fan` put up? Seems promising. I'm dabbling with it now, and it's showing me all files in the starting folder and sub-folders, but not working the way you want it to. Plus, I've got both eyes in the same socket right now; I need some sleep, very soon. – Funk Forty Niner Jun 18 '13 at 05:13

4 Answers


There are two ways.

Use glob to do a recursive search:

<?php

// Does not support the GLOB_BRACE flag
function rglob($pattern, $flags = 0) {
    // glob() can return false on error, so fall back to an empty array
    $files = glob($pattern, $flags) ?: [];
    foreach (glob(dirname($pattern) . '/*', GLOB_ONLYDIR | GLOB_NOSORT) ?: [] as $dir) {
        // Repeat the same file-name pattern inside each subdirectory
        $files = array_merge($files, rglob($dir . '/' . basename($pattern), $flags));
    }
    return $files;
}

// usage: to find the test.zip file recursively
$result = rglob($_SERVER['DOCUMENT_ROOT'] . '/test.zip');
var_dump($result);
// to find all files whose names end with test.zip
$result = rglob($_SERVER['DOCUMENT_ROOT'] . '/*test.zip');
?>

Use RecursiveDirectoryIterator:

<?php
// $regPattern must be a complete regular expression, including delimiters
function rsearch($folder, $regPattern) {
    // SKIP_DOTS keeps "." and ".." entries out of the results
    $dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    $ite = new RecursiveIteratorIterator($dir);
    // MATCH mode tests the pattern against each full path and keeps the whole entry
    $files = new RegexIterator($ite, $regPattern, RegexIterator::MATCH);
    $fileList = array();
    foreach ($files as $file) {
        $fileList[] = $file->getPathname();
    }
    return $fileList;
}

// usage: to find the test.zip file recursively
$result = rsearch($_SERVER['DOCUMENT_ROOT'], '/.*\/test\.zip$/');
var_dump($result);
?>

RecursiveDirectoryIterator comes with PHP 5, while glob dates back to PHP 4. Both can do the job; which one to use is up to you.

Martin.
Tony Chen
  • OK, but how can I use it to search for a specific file within folders/subfolders/sub-subfolders and return the file's path? – Winston Smith Jun 18 '13 at 05:29
  • rsearch: `var_dump(rsearch('/folder/.../', '/.*zip/'));` rglob: `var_dump(rglob('/folder/*/test.zip'));` Each returns an array of matched files. – Tony Chen Jun 18 '13 at 05:45
  • I can't get it to work... I tried `var_dump(rsearch('/', 'test.zip'));` and also `var_dump(rsearch('$root', 'test.zip'));`. Could you update your post with code that works with my example in the question? I want to search all folders and sub-folders for test.zip. – Winston Smith Jun 18 '13 at 06:40
  • @WinstonSmith It does work. If you use `rsearch`, the $pattern parameter is a regular expression, which is why it is wrapped in two slashes in my example. Alternatively, use rglob, which accepts a wildcard pattern. – Tony Chen Jun 18 '13 at 06:45
  • I tried `var_dump(rsearch('$root', '/test.zip/'));` and it doesn't work either. Your example searches for all zip files; I want to search for a specific file (test.zip in the example, but it could also be somefile.rar or whatever.mp3). – Winston Smith Jun 18 '13 at 07:14
  • Use `var_dump(rsearch($root, '/.*\/test.zip/'));`. DON'T put single quotes around $root. – Tony Chen Jun 18 '13 at 07:37
  • Great! I got it to work now... but the script takes 30 seconds to run :( Is it normal for it to be this slow? I have around 2000 files on the server. It is a dedicated server (single-core 1.2 GHz Atom with 2 GB of memory). The server is not open to the public yet, so there is no traffic or load, and it doesn't host any other sites. I guess I will need some sort of caching. – Winston Smith Jun 18 '13 at 08:03
  • OK, I fixed the problem with the server and it now runs much faster. But I have another problem: after replacing the function call with `rsearch($root, '/.*\/'.$search.'/')`, where $search is a GET value, searching for a file name with parentheses (e.g. test(test)test[test].zip) returns no result even though the file is on the server. – Winston Smith Jun 18 '13 at 08:52
  • [] and () are special characters in regular expressions, see here: http://www.regular-expressions.info/reference.html _A backslash escapes special characters to suppress their special meaning._ Special characters include: [\^$.|?*+(){}, e.g. test\\[test\\]\\(test\\) – Tony Chen Jun 18 '13 at 08:59
  • I did some anecdotal tests on a deep directory structure and the rsearch function was an order of magnitude faster... – JasonRDalton May 28 '14 at 00:37
  • Not a correct answer: for example, `rsearch('/folder/', '/.*mp3/')` will also match a file named `folder/mp3/album/file.mp3`, but returns 'folder/mp3' as the file name... – Alex Oct 20 '15 at 11:35
  • I wish I could say the OO solution is cleaner, but it seems excessively verbose and more difficult to understand. What the heck is a `RecursiveIteratorIterator`? – Lincoln Bergeson Jul 20 '17 at 21:26
  • `rglob` is not the best function... it can't find files with a given extension, as it applies the pattern to directories as well. There is a better function in the `glob()` comments on `php.net`, called `glob_recursive` AFAIR. – Flash Thunder Nov 03 '17 at 13:53
  • @JasonRDalton, retested, with PHP 7.1 ("anecdotally", too :) ), on a 60MB project tree (with two mid-size git worktrees and lots of other small files etc.), and got the exact opposite. Had a priming run for both right before the measurement, and I pretty consistently got numbers like: `rglob`: 0.02864, `rsearch`: 0.12413. Which is a lot more plausible, actually, than the other way around, I'd say. – Sz. Feb 13 '18 at 17:58
  • Thanks. I used the one with RecursiveDirectoryIterator. I needed to prepend `\` to those class names. And I renamed `$pattern` to `$regexPattern = '/.*/'`. – Ryan Feb 20 '20 at 16:10
  • I suggest you edit your answer to include `rsearch($root, '/.*\/test.zip/');` to save newbies a lot of time wasted scanning through comments that are not visible by default. Apart from that, this is a nice answer. – Wonko the Sane May 22 '21 at 11:57
  • Thank you. I use it with the pattern `/.+\.[a-z]+/i` to get all files with an ending. Sidenote: using array_merge inside a loop is resource greedy. – aProgger Jan 15 '22 at 11:41
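Pulling the thread above together for the question's example: the working call needs an unquoted `$root`, and names containing `(`, `)` or `[` must be escaped before they go into the regex. The following is one possible sketch, not the answer author's code. `findFile()` and `downloadLink()` are hypothetical helper names, `preg_quote()` is PHP's built-in regex escaper, and the sketch assumes Unix-style `/` path separators.

```php
<?php
// Hypothetical helper: every path under $root whose file name is exactly $name.
// Assumes Unix-style "/" separators in the paths the iterator produces.
function findFile(string $root, string $name): array
{
    // preg_quote() escapes ( ) [ ] . and friends, plus the "/" delimiter,
    // so a name like "test(test)test[test].zip" is matched literally.
    $regex = '/\/' . preg_quote($name, '/') . '$/';
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $matches = [];
    foreach (new RegexIterator($files, $regex, RegexIterator::MATCH) as $file) {
        $matches[] = $file->getPathname();
    }
    return $matches;
}

// Hypothetical helper: an <a> tag whose URL is $path relative to $root.
function downloadLink(string $root, string $path, string $baseUrl): string
{
    $relative = ltrim(substr($path, strlen(rtrim($root, '/'))), '/');
    // Percent-encode each path segment, then HTML-escape the URL and the label.
    $url = $baseUrl . '/' . implode('/', array_map('rawurlencode', explode('/', $relative)));
    return '<a href="' . htmlspecialchars($url) . '">' . htmlspecialchars(basename($path)) . '</a>';
}
```

With the question's variables, `$found = findFile($root, $search);` followed by `if ($found) echo downloadLink($root, $found[0], 'http://www.example.com');` would print a link to the first match.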

I want to provide another simple alternative for cases where you can predict a max depth. You can use a pattern with braces listing all possible subfolder depths.

This example allows 0-3 arbitrary subfolders:

glob("$root/{,*/,*/*/,*/*/*/}test_*.zip", GLOB_BRACE);

Of course the braced pattern could be procedurally generated.
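One way to generate the braced pattern is sketched below, assuming you want every depth from 0 up to a fixed maximum. `depthBracePattern()` and `globToDepth()` are hypothetical names; the `defined('GLOB_BRACE')` check reflects the caveat in the comments that the flag is not available on all platforms.

```php
<?php
// Hypothetical helper: build "$root/{,*/,*/*/,...}$filePattern" covering depths 0..$maxDepth.
function depthBracePattern(string $root, string $filePattern, int $maxDepth): string
{
    $alternatives = [];
    for ($depth = 0; $depth <= $maxDepth; $depth++) {
        // depth 0 is the empty alternative, depth 1 is "*/", depth 2 is "*/*/", ...
        $alternatives[] = str_repeat('*/', $depth);
    }
    return rtrim($root, '/') . '/{' . implode(',', $alternatives) . '}' . $filePattern;
}

// Hypothetical wrapper: GLOB_BRACE is not defined on every platform, so check first.
function globToDepth(string $root, string $filePattern, int $maxDepth): array
{
    if (!defined('GLOB_BRACE')) {
        throw new RuntimeException('GLOB_BRACE is not supported on this platform');
    }
    return glob(depthBracePattern($root, $filePattern, $maxDepth), GLOB_BRACE) ?: [];
}
```

For a maximum depth of 3 this reproduces the hand-written pattern above; files deeper than the maximum are simply not matched.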

Pinke Helga
  • Just be aware that GLOB_BRACE isn't available on all platforms. I only discovered that when my code failed in an automated pipeline. – coatesap Jun 24 '21 at 15:31
  • Or for multiple file types (for example .pdf, .mp4 and .mp3): glob("$root/{*.pdf,*/*.pdf,*/*/*.pdf,*/*/*/*.pdf,*.mp4,*/*.mp4,*/*/*.mp4,*/*/*/*.mp4,*.mp3,*/*.mp3,*/*/*.mp3,*/*/*/*.mp3}", GLOB_BRACE) – HosseinNedaee Jul 06 '21 at 06:11
  • @HosseinNedaee Multiple types would be expressed with a brace pattern as well: `"$root/{,*/,*/*/,*/*/*/}test_*.{zip,gz,tgz}"` – Pinke Helga Jul 05 '22 at 12:50

This returns the full path to the file:

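function rsearch($folder, $pattern) {
    $iti = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    foreach (new RecursiveIteratorIterator($iti) as $file) {
        // $file is an SplFileInfo; in string context it is the full path
        if (strpos($file, $pattern) !== false) {
            return $file->getPathname();
        }
    }
    return false;
}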

Call the function:

$filepath = rsearch('/home/directory/thisdir/', "/findthisfile.jpg");

This returns something like:

/home/directory/thisdir/subdir/findthisfile.jpg
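`strpos()` accepts any path that merely contains the string, so `/findthisfile.jpg` would also match `/findthisfile.jpg.bak`. If you need an exact file name, a stricter sketch compares `getFilename()` instead. `rsearchExact()` is a hypothetical name; it returns the path as a string, or false when nothing matches.

```php
<?php
// Hypothetical variant: full path of the first file whose name is exactly $filename,
// or false if nothing matches. "findthisfile.jpg.bak" does not match "findthisfile.jpg".
function rsearchExact(string $folder, string $filename)
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && $file->getFilename() === $filename) {
            return $file->getPathname();
        }
    }
    return false;
}
```

Note that the argument is now a bare file name, without the leading slash used in the `strpos()` call above.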

You can extend this function to find several files, for example all JPEG files:

function rsearch($folder, $pattern_array) {
    $return = array();
    $iti = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    foreach (new RecursiveIteratorIterator($iti) as $file) {
        // getExtension() avoids the notice from passing explode()'s result to array_pop()
        if (in_array(strtolower($file->getExtension()), $pattern_array)) {
            $return[] = $file->getPathname();
        }
    }
    return $return;
}

Call it as:

$filepaths = rsearch('/home/directory/thisdir/', array('jpeg', 'jpg') );

Ref: https://stackoverflow.com/a/1860417/219112

Community
Sadee
  • Probably should use `$file->getExtension()` rather than `array_pop(explode('.', $file))` to avoid "PHP Notice: Only variables should be passed by reference in ...". – Simon Nuttall Dec 11 '16 at 23:54
  • @Sadee Thanks for the function, it's working well for my project. The only thing I would add is a die() in case the folder path doesn't exist, so it doesn't bother moving forward. – Michael Rogers Oct 30 '17 at 12:35
  • You may want to use `yield` instead of building a complete `$return` array. This will produce a [generator](https://www.php.net/manual/en/language.generators.syntax.php) and improve performance a lot. – alexandre-rousseau May 22 '19 at 09:42
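Folding the two suggestions above into the extension search gives something like the following sketch (not the answer author's code): `getExtension()` replaces `array_pop(explode(...))`, and the function yields paths as a generator instead of collecting them in an array. `rsearchExtensions()` is a hypothetical name, and extensions are compared case-insensitively.

```php
<?php
// Hypothetical generator: yield the full path of every file whose extension
// is in $extensions (case-insensitive), walking $folder recursively.
function rsearchExtensions(string $folder, array $extensions): Generator
{
    $wanted = array_map('strtolower', $extensions);
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // getExtension() returns the part after the last dot, without the dot
        if (in_array(strtolower($file->getExtension()), $wanted, true)) {
            yield $file->getPathname();
        }
    }
}
```

Use `foreach (rsearchExtensions($dir, ['jpeg', 'jpg']) as $path)` to process matches as they are found, or `iterator_to_array(..., false)` if you really need an array.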

As a full solution for your problem (this was also my problem):

<?php
// $pattern must be a regular expression, including delimiters
function rsearch($folder, $pattern) {
    $dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $pattern, RegexIterator::MATCH);

    foreach ($files as $file) {
        yield $file->getPathname();
    }
}

This yields the full path of each item you want to find.

Edit: Thanks to Rousseau Alexandre for pointing out that $pattern must be a regular expression.
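Because this version is a generator, the caller can stop at the first hit and the rest of the tree is never scanned, which may help with the slow searches mentioned under the first answer. A usage sketch: the generator is repeated here (with `FilesystemIterator::SKIP_DOTS` added) so the snippet runs on its own, and `firstMatch()` is a hypothetical helper.

```php
<?php
// The generator from this answer, repeated with SKIP_DOTS so "." and ".." are skipped.
function rsearch($folder, $pattern) {
    $dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
    $ite = new RecursiveIteratorIterator($dir);
    $files = new RegexIterator($ite, $pattern, RegexIterator::MATCH);
    foreach ($files as $file) {
        yield $file->getPathname();
    }
}

// Hypothetical helper: return the first matching path, or null if there is none.
// Returning from inside the foreach abandons the generator, so scanning ends there.
function firstMatch(string $folder, string $pattern): ?string
{
    foreach (rsearch($folder, $pattern) as $path) {
        return $path;
    }
    return null;
}
```

For the question's example that would be `firstMatch($root, '/\/test\.zip$/')`.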

metzelder