I want to synchronize two directories, and I use
file_get_contents($source) === file_get_contents($dest)
to compare two files. Is there any problem with doing this?
I would rather do something like this:
function files_are_equal($a, $b)
{
    // Quick rejection: files of different sizes can never be equal
    if (filesize($a) !== filesize($b)) {
        return false;
    }

    // Same size: compare the contents in 8 KB chunks
    $ah = fopen($a, 'rb');
    $bh = fopen($b, 'rb');

    $result = true;
    while (!feof($ah)) {
        // Strict comparison (!==) avoids PHP's loose numeric-string coercion
        if (fread($ah, 8192) !== fread($bh, 8192)) {
            $result = false;
            break;
        }
    }

    fclose($ah);
    fclose($bh);

    return $result;
}
This first checks whether the file sizes differ, and only when they match does it compare the contents chunk by chunk, so it can stop at the first difference.
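Usage is then straightforward (the paths here are just placeholders):

var_dump(files_are_equal('dir1/report.pdf', 'dir2/report.pdf')); // bool(true) when the files match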
Use sha1_file() instead. It's faster and works fine if you just need to see whether the files differ. If the files are large, comparing the whole strings to each other can be very heavy. As sha1_file() returns a 40-character representation of the file, comparing the hashes will be very fast.
You can also consider other methods like comparing filemtime() or filesize(), but sha1_file() gives you guaranteed results even if just one bit has changed.
This will work, but it is inherently less efficient than calculating a checksum for both files and comparing those. Good candidates for checksum algorithms are SHA1 and MD5.
// === avoids PHP's loose comparison pitfall for numeric-looking hash strings
if (sha1_file($source) === sha1_file($dest)) {
    /* ... */
}
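If you prefer MD5, md5_file() works the same way; a minimal sketch:

// md5_file() returns a 32-character hex digest of the file's contents
if (md5_file($source) === md5_file($dest)) {
    /* ... */
}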
Check first for the obvious: if the sizes differ, the files differ (add comparison of date, file name and other metadata to this obvious list if those are also not supposed to differ).
When comparing content, hashing doesn't sound very efficient, as @Oli says in his comment. If the files are different, they will most likely already differ near the beginning. Calculating a hash of two 50 MB files and then comparing the hashes sounds like a waste of time if the second bit is already different...
Check this post on php.net. It looks very similar to that of @Svish, but it also compares the file mime-type. A smart addition if you ask me.
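A rough sketch of that idea, checking cheap properties before the content (the finfo class is part of PHP's standard fileinfo extension; the function name is my own):

function files_identical($a, $b)
{
    // Different sizes can never be equal
    if (filesize($a) !== filesize($b)) {
        return false;
    }

    // Compare mime-types before touching the contents
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    if ($finfo->file($a) !== $finfo->file($b)) {
        return false;
    }

    // Fall back to a chunked content comparison, as in @Svish's answer
    $ah = fopen($a, 'rb');
    $bh = fopen($b, 'rb');
    $equal = true;
    while (!feof($ah)) {
        if (fread($ah, 8192) !== fread($bh, 8192)) {
            $equal = false;
            break;
        }
    }
    fclose($ah);
    fclose($bh);
    return $equal;
}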
Seems a bit heavy. This will load both files completely into memory as strings and then compare them.
I think you might be better off opening both files manually and stepping through them, perhaps after a quick filesize check first.
There isn't anything wrong with what you are doing here, except that it is a little inefficient. Getting the full contents of each file and comparing them, especially with larger files or binary data, may run you into problems.
I would take a look at filemtime() (last modified) and filesize(), and run some tests to see if that works for you. It should be all you need, at a fraction of the computation power.
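A minimal sketch of that metadata-only check (note it can miss edits that preserve both size and timestamp, so test it against your data):

// Treat files as unchanged when both timestamp and size match
function probably_unchanged($source, $dest)
{
    return filemtime($source) === filemtime($dest)
        && filesize($source) === filesize($dest);
}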
Something I noticed is that the other answers ignore the pairwise (O(N²)) factor. In other words, to do the filesize() approach you would first have to check every file against all of the other files. Why? What if the first file and the second file are different sizes, but the third file is the same size as the first?
So first you need to get a list of all of the files you are going to work with. If you want to do the filesize type of thing, then use the complete path/name string as the key of an array and store the filesize() information as the value. Then sort the array so all files which are the same size are lined up. THEN you can check file sizes. However, this does not mean they really are the same, only that they are the same size.
You need to do something like the sha1_file() command and, like above, make an array where the path/names are the keys and the returned hashes are the values. Sort those, and then just do a simple walk through the array, storing the sha1_file() value to test against. So is A == B? Yes. Do any additional tests, then get rid of the SECOND file and continue.
Why am I commenting? I'm working on this same problem and I just found out my program did not work correctly. So now I'm going to go correct it using the sha1_file() function. :-)
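A rough sketch of that multi-file approach (the find_duplicates() name and structure are my own; it groups by size first and only hashes files within the same size group):

function find_duplicates(array $paths)
{
    // Bucket paths by file size: different sizes can never match
    $bySize = [];
    foreach ($paths as $path) {
        $bySize[filesize($path)][] = $path;
    }

    $duplicates = [];
    foreach ($bySize as $group) {
        if (count($group) < 2) {
            continue; // unique size, nothing to compare against
        }
        // Within a size group, bucket by content hash
        $byHash = [];
        foreach ($group as $path) {
            $byHash[sha1_file($path)][] = $path;
        }
        foreach ($byHash as $same) {
            if (count($same) > 1) {
                $duplicates[] = $same; // each entry is a set of identical files
            }
        }
    }
    return $duplicates;
}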