How can I detect in C# whether two files are absolutely identical (size, content, etc.)?
2 Answers
Here's a simple solution that just reads both files and compares the data. It should be no slower than the hash method, since both methods have to read the entire file.
EDIT: As noted by others, this implementation is actually somewhat slower than the hash method, because of its simplicity. See below for a faster method.
    static bool FilesAreEqual( string f1, string f2 )
    {
        // get file length and make sure lengths are identical
        long length = new FileInfo( f1 ).Length;
        if( length != new FileInfo( f2 ).Length )
            return false;

        // open both for reading
        using( FileStream stream1 = File.OpenRead( f1 ) )
        using( FileStream stream2 = File.OpenRead( f2 ) )
        {
            // compare content for equality, one byte at a time
            int b1, b2;
            while( length-- > 0 )
            {
                b1 = stream1.ReadByte();
                b2 = stream2.ReadByte();
                if( b1 != b2 )
                    return false;
            }
        }

        return true;
    }
You could modify it to read more than one byte at a time, but the internal file stream should already be buffering the data, so even this simple code should be relatively fast.
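For reference, the hash method this answer is being compared against is not shown here. A minimal sketch of what it typically looks like follows, assuming MD5 via `System.Security.Cryptography`; the names `HashCompareSketch`, `HashFile` and `FilesHaveEqualHash` are made up for illustration and are not from the original discussion.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashCompareSketch
{
    // Hypothetical helper: computes the MD5 hash of an entire file.
    static byte[] HashFile( string path )
    {
        using( MD5 md5 = MD5.Create() )
        using( FileStream stream = File.OpenRead( path ) )
        {
            return md5.ComputeHash( stream );
        }
    }

    // Hypothetical name: compares two files by length, then by MD5 hash.
    public static bool FilesHaveEqualHash( string f1, string f2 )
    {
        // a length mismatch settles it without reading any content
        if( new FileInfo( f1 ).Length != new FileInfo( f2 ).Length )
            return false;

        // both files are always read to the end, even if they differ in the first byte
        return HashFile( f1 ).SequenceEqual( HashFile( f2 ) );
    }
}
```

Note that equal hashes only show the contents match up to a hash collision, whereas a byte-by-byte comparison is exact. Hashing mainly pays off when the same file is compared many times and its hash can be stored and reused.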
EDIT Thanks for the feedback on speed here. I still maintain that the compare-all-bytes method can be just as fast as the MD5 method, since both methods have to read the entire file. I would suspect (but don't know for sure) that once the files have been read, the compare-all-bytes method requires less actual computation. In any case, I reproduced your performance observations with my initial implementation, but once I added some simple buffering, the compare-all-bytes method was just as fast. The buffered implementation is below; feel free to comment further!
EDIT Jon B makes another good point: in the case where the files actually are different, this method can stop as soon as it finds the first different byte, whereas the hash method has to read the entirety of both files in every case.
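    // requires: using System; using System.IO;
    static bool FilesAreEqualFaster( string f1, string f2 )
    {
        // get file length and make sure lengths are identical
        long length = new FileInfo( f1 ).Length;
        if( length != new FileInfo( f2 ).Length )
            return false;

        byte[] buf1 = new byte[4096];
        byte[] buf2 = new byte[4096];

        // open both for reading
        using( FileStream stream1 = File.OpenRead( f1 ) )
        using( FileStream stream2 = File.OpenRead( f2 ) )
        {
            // compare content for equality
            while( length > 0 )
            {
                // figure out how much to read
                int toRead = (int)Math.Min( buf1.Length, length );
                length -= toRead;

                // read a full chunk from each; a file that ends early cannot be equal
                if( !ReadFully( stream1, buf1, toRead ) || !ReadFully( stream2, buf2, toRead ) )
                    return false;

                // compare the two chunks
                for( int i = 0; i < toRead; ++i )
                    if( buf1[i] != buf2[i] )
                        return false;
            }
        }

        return true;
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until count bytes have arrived; returns false if the stream ends first.
    static bool ReadFully( Stream stream, byte[] buf, int count )
    {
        int offset = 0;
        while( offset < count )
        {
            int n = stream.Read( buf, offset, count - offset );
            if( n == 0 )
                return false;
            offset += n;
        }
        return true;
    }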
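The per-byte `for` loop is probably where most of the remaining computation goes. On newer runtimes, one option is to compare each chunk with `Span<byte>.SequenceEqual` instead. The following is only a sketch under assumptions: it needs .NET Core 2.1 or later (or the System.Memory package), the 64 KB buffer size is an arbitrary choice rather than a measured optimum, and the names `SpanCompareSketch`, `Fill` and `FilesAreEqualSpan` are hypothetical.

```csharp
using System;
using System.IO;

static class SpanCompareSketch
{
    // Hypothetical helper: reads until buf is full or the stream ends;
    // returns the number of bytes actually read.
    static int Fill( Stream stream, byte[] buf )
    {
        int total = 0;
        while( total < buf.Length )
        {
            int n = stream.Read( buf, total, buf.Length - total );
            if( n == 0 )
                break;
            total += n;
        }
        return total;
    }

    public static bool FilesAreEqualSpan( string f1, string f2 )
    {
        // different lengths mean different files, no need to read anything
        if( new FileInfo( f1 ).Length != new FileInfo( f2 ).Length )
            return false;

        // buffer size is an assumption, not a tuned value
        byte[] buf1 = new byte[64 * 1024];
        byte[] buf2 = new byte[64 * 1024];

        using( FileStream stream1 = File.OpenRead( f1 ) )
        using( FileStream stream2 = File.OpenRead( f2 ) )
        {
            while( true )
            {
                int n1 = Fill( stream1, buf1 );
                int n2 = Fill( stream2, buf2 );

                if( n1 != n2 )
                    return false;   // one file ended before the other
                if( n1 == 0 )
                    return true;    // both ended with no difference found

                // compare the whole chunk in one call; stops at the first mismatching chunk
                if( !buf1.AsSpan( 0, n1 ).SequenceEqual( buf2.AsSpan( 0, n2 ) ) )
                    return false;
            }
        }
    }
}
```

Like the loop version, this returns as soon as a differing chunk is found, so files that differ near the start are rejected without reading the rest. Whether it is actually faster depends on the runtime and disk, so it is worth measuring before relying on it.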

What I particularly like about this is that you'll catch a binary difference early on when comparing large files of the same length. – Jon B Oct 17 '08 at 17:58
@Tessaract got it! – Charlie Jul 08 '22 at 01:44