I am trying to get a random file from all subfolders in a folder.

First I get an iterator of all files, using this code:

$path = "/path/to/folder";

$folder = new RecursiveDirectoryIterator($path);
$iterator = new RecursiveIteratorIterator($folder);

$files = new RegexIterator($iterator,
                           '/^.+\.(jpg|jpeg|png|gif)$/i',
                           RecursiveRegexIterator::GET_MATCH);

This appears to work (and finishes in a split second). Now I want to get a random item from the resulting iterator. I use this code (this is line 14):

$image = array_keys(iterator_to_array($files))[mt_rand(0,
                                           count(iterator_to_array($files)) - 1)];

The folder contains 334327 objects, and, after executing for a couple of seconds, iterator_to_array() dies with the following error:

Fatal error: Allowed memory size of 134217728 bytes exhausted
             (tried to allocate 1232 bytes) in /script.php on line 14

How do I need to change my code to avoid PHP running out of memory? Or is there a better way to grab a random item from such a huge array? (Or maybe it is even possible to grab a random file from all subfolders, directly?)

I do not want to override the memory limit!

The number of files changes constantly.

Community
  • Are you sure you need to convert the iterator into an array? If all you want is one random file from the iterator, and you know beforehand the total number of files, can't you just compute a random index, iterate through all of the files (without storing them in an array), and return once you reach the index? – T. Silver Mar 19 '16 at 20:01
  • @T.Silver The number of files is not constant. Also, I don't know how to get an item from the iterator without converting it into an array. –  Mar 19 '16 at 20:38

3 Answers

1

Okay, so what I'm doing now (and it works) is not converting the iterator to an array, but rather counting the items in the iterator, picking a random number, and then looping over the iterator until I reach the item with that number:

$path = "/path/to/folder";

$folder = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
$iterator = new RecursiveIteratorIterator($folder);

$files = new RegexIterator($iterator, '/^.+\.(jpg|jpeg|png|gif)$/i', RecursiveRegexIterator::GET_MATCH);

$i = mt_rand(0, iterator_count($files) - 1);

$c = 0;
foreach($files as $file) {
    if ($i == $c) {
        $image = $file[0];
        break;
    }
    $c++;
}

As I said, this works now, but:

  1. the counting takes about 10 seconds, so I would be happy to shorten this somehow; and

  2. the foreach loop takes a couple of seconds, too, so I would be very happy if I could directly retrieve an element from an iterator by number, but I couldn't find any examples of how to do that.

So, if you have an idea about how to solve 1 or 2, I would be grateful.
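
Regarding point 1, one way to avoid the separate counting pass might be single-pass reservoir sampling: keep one candidate and replace it with the n-th item with probability 1/n. The sketch below is only an illustration. `pick_random_item` is a hypothetical helper (not a PHP built-in), and the `ArrayIterator` merely stands in for the `$files` iterator so the snippet runs without a folder on disk.

```php
<?php
/**
 * Hypothetical helper: returns one element chosen uniformly at random
 * from an array or Traversable, reading it exactly once and storing
 * only a single element at a time. Returns null for empty input.
 */
function pick_random_item($items)
{
    $chosen = null;
    $seen = 0;

    foreach ($items as $item) {
        $seen++;
        // Keep the n-th item with probability 1/n; after the full pass
        // every item has had the same overall chance of being kept.
        if (mt_rand(1, $seen) === 1) {
            $chosen = $item;
        }
    }

    return $chosen;
}

// Stand-in for the RegexIterator from the answer (assumption for the demo).
$demo = new ArrayIterator(['a.jpg', 'b.png', 'c.gif']);
$picked = pick_random_item($demo);
```

With the `$files` iterator from above, `pick_random_item($files)` would return the match array, so the path would be its element `[0]`. The directory tree still has to be walked once, so at best this saves the counting pass and the partial second loop; I have not measured it on a folder of this size.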

  • I think that is the best solution you can do with an iterator. The only thing you could change/improve is when you want multiple random elements, to divide the amount of elements with the amount of random elements you want and then always get one random number between 0 and the result of the division. – Rizier123 Mar 20 '16 at 13:28
0

Sounds like you are looking for a function that picks a random element from an array. If the array dimensions are known/constant, you can use PHP's array_rand() function.
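
For reference, a minimal sketch of array_rand() on a small array. The file names are made up for illustration, and the array must already be in memory for this to work.

```php
<?php
// Made-up sample data; array_rand() needs a real, non-empty array.
$names = ['a.jpg', 'b.png', 'c.gif'];

// With the default second argument, array_rand() returns one random KEY.
$key = array_rand($names);
$picked = $names[$key];
```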

However, if your array dimensions are not known and change depending on the input (for example, a different location in the file tree), then things get more complicated. Luckily, somebody answered this question 10 years ago and posted a code snippet in the comments of the PHP manual for array_rand():

<?php
/**
* Returns a number of random elements from an array.
*
* It returns the number (specified in $limit) of elements from
* $array. The elements are returned in a random order, exactly
* as it was passed to the function. (So, it's safe for multi-
* dimensional arrays, aswell as array's where you need to keep
* the keys)
*
* @author Brendan Caffrey  <bjcffnet at gmail dot com>
* @param  array  $array  The array to return the elements from
* @param  int    $limit  The number of elements to return from
*                            the array
* @return array  The randomized array
*/
function array_rand_keys($array, $limit = 1) {
    $count = @count($array)-1;

    // Sanity checks
    if ($limit == 0 || !is_array($array) || $limit > $count) return array();
    if ($count == 1) return $array;

    // Loop through and get the random numbers
    for ($x = 0; $x < $limit; $x++) {
        $rand = rand(0, $count);

        // Can't have double randoms, right?
        while (isset($rands[$rand])) $rand = rand(0, $count);

        $rands[$rand] = $rand;
    }

    $return = array();
    $curr = current($rands);

    // I think it's better to return the elements in a random
    // order, which is why I'm not just using a foreach loop to
    // loop through the random numbers
    while (count($return) != $limit) {
        $cur = 0;

        foreach ($array as $key => $val) {
            if ($cur == $curr) {
                $return[$key] = $val;

                // Next...
                $curr = next($rands);
                continue 2;
            } else {
                $cur++;
            }
        }
    }

    return $return;
}
?>
Yosyp Schwab
  • If I understand correctly, for this code to work I need to convert my iterator to an array before I can apply it. And it is this conversion that runs out of memory. –  Mar 19 '16 at 20:44
0

As I understand it, your script searches all subfolders. If the overall count of 334327 includes both files and folders, you could split your script into two steps: first pick a random path, then pick a random file in that path. Alternatively, you can build a graph of all subfolders and files and choose a random node in the graph.
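
A rough sketch of one variant of the first idea, a random walk down the tree: at each level pick a random entry, and descend if it is a directory. `random_descent` and its default pattern are hypothetical names chosen for illustration. Note that this is not uniform over files (files in small or shallow folders are favoured), and it can hit a dead end in a folder without images, in which case it returns null and the caller may simply retry.

```php
<?php
/**
 * Hypothetical helper: walks down from $dir, choosing a random entry at
 * each level, until it reaches a file whose name matches $pattern.
 * Returns the file path, or null on a dead end or unreadable directory.
 * Symlinked directories are skipped to avoid cycles.
 */
function random_descent($dir, $pattern = '/\.(jpg|jpeg|png|gif)$/i')
{
    $entries = @scandir($dir);
    if ($entries === false) {
        return null;
    }

    $candidates = [];
    foreach ($entries as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $full = $dir . DIRECTORY_SEPARATOR . $name;
        $isRealDir = is_dir($full) && !is_link($full);
        if ($isRealDir || (is_file($full) && preg_match($pattern, $name))) {
            $candidates[] = $full;
        }
    }

    if (count($candidates) === 0) {
        return null; // dead end: nothing usable in this folder
    }

    $pick = $candidates[array_rand($candidates)];

    // Only one folder's listing is held in memory at any time.
    return is_dir($pick) ? random_descent($pick, $pattern) : $pick;
}
```

Because only one directory listing is read per level, memory use stays small and the whole tree is never enumerated, which also copes with the number of files changing between calls.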

Nick Nikolaev
  • I understand what you mean, but the folder structure and the number of files in them changes. –  Mar 20 '16 at 07:04