If I do:
<?php echo md5(file_get_contents("/path/to/file")) ?>
...will this always produce the same hash as:
<?php echo md5_file("/path/to/file") ?>
Yes, they return the same hash:
var_dump(md5(file_get_contents(__FILE__)));
var_dump(md5_file(__FILE__));
which returns this in my case:
string(32) "4d2aec3ae83694513cb9bde0617deeea"
string(32) "4d2aec3ae83694513cb9bde0617deeea"
Edit:
Take a look at the source code of both functions: https://github.com/php/php-src/blob/master/ext/standard/md5.c (lines 47 and 76). Both use the same functions to generate the hash; the only difference is that md5_file() opens the file first.
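To make the "opens the file first, then hashes" idea concrete, here is a minimal sketch in userland PHP. It is not the C implementation linked above; md5_streamed() is a hypothetical helper name, and it relies on hash_init(), hash_update_stream() and hash_final() from PHP's hash extension, which the original answer does not mention.

```php
<?php
// Hypothetical helper: open the file, feed its bytes to an MD5 context,
// and return the hex digest. Only an illustration of the principle.
function md5_streamed(string $path): string
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $ctx = hash_init('md5');
    hash_update_stream($ctx, $fh); // reads from the handle until EOF
    fclose($fh);
    return hash_final($ctx);
}

// Throwaway test file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'md5');
file_put_contents($path, str_repeat("foobar\n", 1000));

$a = md5(file_get_contents($path)); // hash of the contents held in memory
$b = md5_file($path);               // hash computed from the file directly
$c = md5_streamed($path);           // hash from the sketch above

var_dump($a === $b, $b === $c);
unlink($path);
```

All three values are computed over the same bytes, so the comparisons should both be true.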
2nd Edit:
Basically, md5_file() generates the hash from the file contents, not from file metadata such as the filename. This is the same way md5sum works on Linux systems.
See this example:
pr@testumgebung:~# echo foobar > foo.txt
pr@testumgebung:~# md5sum foo.txt
14758f1afd44c09b7992073ccf00b43d foo.txt
pr@testumgebung:~# mv foo.txt bar.txt
pr@testumgebung:~# md5sum bar.txt
14758f1afd44c09b7992073ccf00b43d bar.txt
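The same rename experiment can be sketched in PHP. The file names and the temporary directory below are assumptions made only so the example can run anywhere.

```php
<?php
// Create a file, hash it, rename it, hash it again.
$dir = sys_get_temp_dir();
$foo = $dir . '/foo_' . uniqid() . '.txt'; // hypothetical file names
$bar = $dir . '/bar_' . uniqid() . '.txt';

file_put_contents($foo, "foobar\n"); // same bytes that `echo foobar` writes
$before = md5_file($foo);

rename($foo, $bar);
$after = md5_file($bar);

// The name changed, the contents did not, so the hash is unchanged.
var_dump($before === $after);
unlink($bar);
```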
md5_file simply hashes the content of a file with MD5.
If you refer to the old PHP implementation of md5_file (the principle is still the same), the source reads:
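function php_compat_md5_file($filename, $raw_output = false)
{
    // ...
    // removed protections
    // ($fh is assumed to be the file handle opened in the elided part)

    // Read the whole file into $data
    if ($fsize = @filesize($filename)) {
        $data = fread($fh, $fsize);
    } else {
        // Size unknown or zero: read in 8 KB chunks until EOF
        $data = '';
        while (!feof($fh)) {
            $data .= fread($fh, 8192);
        }
    }
    fclose($fh);

    // Return the hash of the contents, as raw bytes if requested
    $data = md5($data);
    if ($raw_output === true) {
        $data = pack('H*', $data);
    }
    return $data;
}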
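The last step of that function, converting the hex digest with pack('H*', ...) when $raw_output is true, corresponds to the second argument of today's md5_file(). A small sketch, assuming a throwaway temporary file:

```php
<?php
$path = tempnam(sys_get_temp_dir(), 'md5');
file_put_contents($path, 'stackoverflow');

$hex = md5_file($path);       // 32 hexadecimal characters
$raw = md5_file($path, true); // the same digest as 16 raw bytes

var_dump(strlen($hex), strlen($raw));
var_dump($raw === pack('H*', $hex)); // hex digest packed to bytes
var_dump(bin2hex($raw) === $hex);    // and back again
unlink($path);
```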
So if you hash any string or content with md5, you will always get the same result as md5_file gives for a file holding that content (same encoding, same bytes).
In other words, whether you hash a file's content with md5(file_get_contents(...)), use md5_file, or run the md5 command on the same content, you will always get the same result.
For example, you can rename a file and its hash stays the same, and two different files with the same content produce the same MD5 hash.
Consider two files named 1.txt and 2.txt, each containing "stackoverflow" (without the quotes):
md5_file("1.txt");
md5_file("2.txt");
would both output
73868cb1848a216984dca1b6b0ee37bc
You will get the exact same result from md5("stackoverflow"), md5(file_get_contents("1.txt")), or md5(file_get_contents("2.txt")).
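The 1.txt / 2.txt comparison can be reproduced with a short script. This is only a sketch: the temporary directory is an assumption so that the example does not touch existing files.

```php
<?php
// Two files with identical content in a scratch directory.
$dir = sys_get_temp_dir() . '/md5demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/1.txt", 'stackoverflow');
file_put_contents("$dir/2.txt", 'stackoverflow');

// Five ways of hashing the same bytes.
$hashes = [
    md5_file("$dir/1.txt"),
    md5_file("$dir/2.txt"),
    md5('stackoverflow'),
    md5(file_get_contents("$dir/1.txt")),
    md5(file_get_contents("$dir/2.txt")),
];

// One distinct value means all five agree.
$distinct = count(array_unique($hashes));
var_dump($distinct === 1);

unlink("$dir/1.txt");
unlink("$dir/2.txt");
rmdir($dir);
```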
"based on the file contents, not on the file metadata like the BOM or filename"
That is not correct about the BOM. The BOM is part of the file content; you can see its three bytes in any file editor that is not Unicode-aware.
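A quick way to check the BOM point is to write the same text with and without a UTF-8 BOM (the bytes EF BB BF) and compare hashes. A minimal sketch, with the temporary file as an assumption:

```php
<?php
$plain   = 'stackoverflow';
$withBom = "\xEF\xBB\xBF" . $plain; // UTF-8 BOM followed by the same text

$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, $withBom);

// The BOM adds three bytes to the file content...
$extra = strlen(file_get_contents($path)) - strlen($plain);
var_dump($extra);

// ...so md5_file() matches the hash of the string including the BOM,
// and differs from the hash of the bare text.
$fileHash = md5_file($path);
var_dump($fileHash === md5($withBom));
var_dump($fileHash === md5($plain));
unlink($path);
```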
Yes, I tried it several times. In my case, the result of:
<?php echo md5(file_get_contents("1.php")) ?>
<br/>
<?php echo md5_file("1.php") ?>
was this output:
660d4e394937c10cd1c16a98f44457c2
660d4e394937c10cd1c16a98f44457c2
The two lines are identical.