14

For a file containing only a few bytes under Linux, I need to process it only when it has changed since the last time it was processed. I check whether the file has changed by calling PHP's `clearstatcache(); filemtime();` periodically. Since the file will always be tiny, would it be a performance improvement to drop the call to `filemtime` and detect a change by comparing the contents with the previous contents? Or what is the best method for this, in terms of performance?

doc_id
  • 2
    I very much doubt it. `filemtime()` accesses low-level system functions that are always going to beat actually opening the file. Interested to hear what the filesystem/OS experts say – Pekka May 01 '11 at 17:49
  • probably depends on the OS & the filesystem type. profile both and see which one works better in your specific setting. – Mat May 01 '11 at 17:49
  • 3
    @Mat - I can't think of a filesystem that returns the contents of a file faster than the metadata ... and if there is one, I don't think I'd want to use it. – Brian Roach May 01 '11 at 17:52
  • @Brian: both the data & metadata will be cached if the file is accessed often - the time difference between just copying the metadata to user-space vs copying a few bytes of the file data is probably hard to measure on a modern system. If it really is tiny, the comparison could be just as cheap as comparing two timestamps - even potentially cheaper if the timestamps are 64-bit values on a 32-bit system. (but syscall overhead could dominate, so...) – Mat May 01 '11 at 17:57
  • @Mat - oh I agree, the question in and of itself is silly given the immeasurable differences. However, if the file changes, then the cache would need to be updated. And I'd wager the seek time on the drive cancels out any issue dealing with a 64 bit number. This conversation has now used more time than any "optimization" here would save over years :-D – Brian Roach May 01 '11 at 18:02
  • @Brian: if your OS doesn't keep the updated page in cache, I don't want to use it :-) – Mat May 01 '11 at 18:03
  • If the file is small, on most filesystems the metadata will be kept beside the content. If it's not in the cache, neither is the metadata, and the seek time will apply anyway. But seriously, this should be optimized when need be. – Emil Vikström May 01 '11 at 18:23
  • In fact, what led me to ask is that PHP caches the stat of files upon checking, which implies that querying the file system even for so few bytes has a cost. This discussion leads me to ask what possible difference this internal cache might make. The filesystem in question is Linux Ext2/3 – doc_id May 02 '11 at 22:45

4 Answers

18

Use filemtime + clearstatcache

To enhance @Ben_D's test:

<?php

$file = 'small_file.html';
$loops = 1000000;

// filesize (fast; repeat calls are answered from PHP's stat cache)
$start_time = microtime(1);
for ($i = 0; $i < $loops; $i++) {
    $file_size = filesize($file);
}
$end_time = microtime(1);
$time_for_file_size = $end_time - $start_time;

// filemtime (fastest; repeat calls are answered from PHP's stat cache)
$start_time = microtime(1);
for ($i = 0; $i < $loops; $i++) {
    $file_mtime = filemtime($file);
}
$end_time = microtime(1);
$time_for_filemtime = $end_time - $start_time;

// filemtime + no cache (fast and reliable)
$start_time = microtime(1);
for ($i = 0; $i < $loops; $i++) {
    clearstatcache();
    $file_mtime_nc = filemtime($file);
}
$end_time = microtime(1);
$time_for_filemtime_nc = $end_time - $start_time;

// file_get_contents  (slow and reliable)
$start_time = microtime(1);
for ($i = 0; $i < $loops; $i++) {
    $file_contents = file_get_contents($file);
}
$end_time = microtime(1);
$time_for_file_get_contents = $end_time - $start_time;

// output
echo "
<p>Working on file '$file'</p>
<p>Size: $file_size B</p>
<p>last modified timestamp: $file_mtime</p>
<p>file contents: $file_contents</p>

<h1>Profile</h1>
<p>filesize: $time_for_file_size</p>
<p>filemtime: $time_for_filemtime</p>
<p>filemtime + no cache: $time_for_filemtime_nc</p>
<p>file_get_contents: $time_for_file_get_contents</p>";

/* End of file */

Results

Geo
  • 12
    Please note that you only have to call `clearstatcache();` if you need fresh `filemtime` information from the same file multiple times **during the same request** (and if the possibility is given that the file may be modified during the request). `filemtime` cache gets lost after a request has been completed. – TiMESPLiNTER Oct 27 '14 at 08:26
8

I know I'm late to the party, but a little benchmarking never hurt a discussion. Brian Roach's intuition proves sound, even before you take into account the comparison step:

The Test:

$file = "small_file.html";
$file_size = filesize($file);

//get the filemtime 1,000,000 times
//(without clearstatcache(), repeat calls are answered from PHP's stat cache)
$start_time = microtime(true);
for($i=0;$i<1000000;$i++){
    $set_time = filemtime($file);
}
$end_time = microtime(true);

$time_for_filemtime = ($end_time-$start_time);

//get the time for file_get_contents 1,000,000 times
$start_time = microtime(true);
for($i=0;$i<1000000;$i++){
    $set_time = file_get_contents($file);
}
$end_time = microtime(true);

$time_for_file_get_contents = ($end_time-$start_time);

echo "<p>Working on a file that is $file_size B long</p>
<p>filemtime: $time_for_filemtime vs file_get_contents: $time_for_file_get_contents</p>";

The Results

Working on a file that is 41 B long

filemtime: 0.36287999153137 vs file_get_contents: 16.191468000412

No shocker: "asking the file system for some metadata" is faster than "opening the file, reading it in, and comparing the contents."
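In practice the two calls can be combined, so the file is only read when the timestamp has moved. A minimal sketch, assuming a hypothetical helper named `readIfChanged()` (not part of PHP) and a caller that keeps the last seen mtime between polls:

```php
<?php
// Hypothetical helper, for illustration only: returns the file contents
// when the modification time differs from the one seen on the previous
// call, and null when nothing changed or the file cannot be read.
function readIfChanged(string $path, ?int &$lastMtime): ?string
{
    // Drop PHP's cached stat entry for this path only, so that
    // filemtime() asks the filesystem again.
    clearstatcache(true, $path);

    $mtime = @filemtime($path);
    if ($mtime === false || $mtime === $lastMtime) {
        return null; // missing file or unchanged timestamp
    }

    // The timestamp moved, so now it is worth paying for the read.
    $contents = file_get_contents($path);
    if ($contents === false) {
        return null;
    }

    $lastMtime = $mtime;
    return $contents;
}
```

Note that `filemtime()` has one-second resolution, so two writes within the same second may look like a single change. If that matters, comparing the contents after a cheap mtime check is one possible fallback.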

Ben D
4

To stat the file, you're simply asking the file system for some metadata.

Your second approach involves opening the file, reading it in, and comparing the contents.

Which do you think would be faster? ;)

Brian Roach
  • 1
    That makes sense, but there is more to ask in this regard, such as whether the filesystem is optimized for file reads rather than metadata. Add to this the cost of PHP's own stat caching. – doc_id May 02 '11 at 22:50
  • Another factor to consider. Using the 1st approach, I will eventually have to read the content when a modification is detected by comparing modification time. I remember something called FileSystemWatch somewhere but don't really remember that. – doc_id May 03 '11 at 00:03
3

I think the best method to be notified about changes to a file is inotify, which is designed for exactly this purpose.

See the inotify extension.
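A rough sketch of what that could look like, assuming the PECL inotify extension is installed. The `watchFile()` wrapper and its `$maxEvents` parameter are hypothetical names used for illustration; the `inotify_*` functions and the `IN_CLOSE_WRITE` constant come from the extension:

```php
<?php
// Sketch only: requires the PECL inotify extension (Linux).
// watchFile() is a hypothetical wrapper, not part of the extension.
function watchFile(string $path, callable $onChange, int $maxEvents = 1): void
{
    if (!function_exists('inotify_init')) {
        throw new RuntimeException('inotify extension is not loaded');
    }

    $inotify = inotify_init();
    // IN_CLOSE_WRITE fires once a writer has closed the file, so the
    // callback does not see a half-written file.
    $watch = inotify_add_watch($inotify, $path, IN_CLOSE_WRITE);

    $seen = 0;
    while ($seen < $maxEvents) {
        // Blocks until at least one event is available; no polling.
        $events = inotify_read($inotify);
        if ($events === false) {
            continue;
        }
        foreach ($events as $event) {
            $onChange(file_get_contents($path));
            $seen++;
        }
    }

    inotify_rm_watch($inotify, $watch);
    fclose($inotify);
}
```

Usage would be something like `watchFile('small_file.html', function ($contents) { /* process */ });`. The watch is tied to the inode, so a writer that replaces the file by renaming a new one over it would probably require re-adding the watch.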

Borealid