
Let's say we have a loop like this:

foreach($entries as $entry){ // let's say this loops 1000 times
   if (file_exists('/some/dir/'.$entry.'.jpg')){
      echo 'file exists';
   }
}

I assume this has to access the HDD 1000 times and check if each file exists.

What about doing this instead?

$files = scandir('/some/dir/');
foreach($entries as $entry){ // let's say this loops 1000 times
   if (in_array($entry.'.jpg', $files)){
      echo 'file exists';
   }
}

Question 1: If this accesses the HDD once, then I believe it should be a lot faster. Am I right on this one?

However, what if I have to check sub-directories for a file, like this:

foreach($entries as $entry){ // let's say this loops 1000 times
   if (file_exists('/some/dir/'.$entry['id'].'/'.$entry['name'].'.jpg')){
      echo 'file exists';
   }
}

Question 2: If I want to apply the above technique (files in array) to check if the entries exist, how can I scandir() sub-directories into the array, so that I can compare the file existence using this method?

Frantisek
  • *I assume this has to access the HDD 1000 times and check if each file exists.* -> If you don't have a cache... – Veger Jan 25 '13 at 07:52
  • file_exists() is known to be rather slow. But what are you trying to develop exactly? Most of the time it's not a speed issue, but coding errors. – Alexandru Calin Jan 25 '13 at 07:56
  • 1
    You can speed up `in_array()` by doing `array_flip()` first and then use `isset()` instead. – Ja͢ck Jan 25 '13 at 08:20
  • Your question doesn't have a definitive answer; do you need to scan the folder many times? does it still matter when you cache it? how is `$entries` populated? etc. – Ja͢ck Jan 25 '13 at 08:30
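
For reference, a minimal sketch of the `array_flip()`/`isset()` trick Ja͢ck suggests above, reusing the paths from the question:

$files = array_flip(scandir('/some/dir/')); // filenames become array keys
foreach($entries as $entry){ // let's say this loops 1000 times
   if (isset($files[$entry.'.jpg'])){ // O(1) key lookup instead of an O(n) in_array() scan
      echo 'file exists';
   }
}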

2 Answers


In my opinion, scandir() will be faster, because it reads the directory only once; in addition, file_exists() is known to be quite slow when called repeatedly.

Furthermore, you could use glob(), which lists all files in a directory that match a particular pattern; see the PHP manual for details.
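
As a rough sketch, assuming the same flat layout as the first snippet in the question (note that glob() returns paths including the directory prefix as given):

$files = glob('/some/dir/*.jpg'); // one directory read, matching files only
foreach($entries as $entry){ // let's say this loops 1000 times
   if (in_array('/some/dir/'.$entry.'.jpg', $files)){
      echo 'file exists';
   }
}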

Regardless of my opinion, you can run a simple script like so to test the speed:

<?php

// Get the start time
$time_start = microtime(true);

// Do the glob() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'glob()\' finished in ' . $time . ' seconds';

// Reset the start time for the next test
$time_start = microtime(true);

// Do the file_exists() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'file_exists()\' finished in ' . $time . ' seconds';

// Reset the start time again
$time_start = microtime(true);

// Do the scandir() method here

// Get the finish time
$time_end = microtime(true);
$time = $time_end - $time_start;

echo '\'scandir()\' finished in ' . $time . ' seconds';

?>

I am not sure how the above script will behave with the filesystem cache; you may have to separate the tests into separate files and run them individually.

Update 1

You could also call memory_get_usage(), which returns the amount of memory currently allocated to the PHP script; you may find this useful alongside the timings. See the PHP manual for more details.
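
For instance, dropped in after each test:

echo 'Memory in use: ' . memory_get_usage() . ' bytes';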

Update 2

As for your second question, there are several ways you can list all files in a directory, including sub-directories. See the answers to this question:

Scan files in a directory and sub-directory and store their path in array using php
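
As a minimal sketch for the layout in your question (one sub-directory per id, each holding name.jpg files), using PHP's standard SPL iterators to collect relative paths into an array:

$dir = '/some/dir/';
$files = array();
$iterator = new RecursiveIteratorIterator(
   new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
);
foreach($iterator as $file){
   // store each path relative to the base directory, e.g. "123/photo.jpg"
   $files[] = substr($file->getPathname(), strlen($dir));
}

foreach($entries as $entry){ // let's say this loops 1000 times
   if (in_array($entry['id'].'/'.$entry['name'].'.jpg', $files)){
      echo 'file exists';
   }
}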

Ben Carey

You can have a look here.

I modified the code from the question so that you can do a quick timing check:

<?php
    $start = microtime(true);
    // Your code
    $end = microtime(true);
    $result = $end - $start;
    echo $result . ' seconds';
?>

Personally, I think the scandir() approach will be faster than calling file_exists() repeatedly.

Vishnu R