How can I detect in C# whether two files are absolutely identical (size, content, etc.)?

Stephen Booher

2 Answers

Here's a simple solution that reads both files and compares the data byte by byte. In principle it should be no slower than the hash method, since both methods have to read the entire file.

EDIT: As others have noted, this implementation is actually somewhat slower than the hash method, because it reads only one byte per call. See below for a faster method.

// requires: using System.IO;
static bool FilesAreEqual( string f1, string f2 )
{
    // get file length and make sure lengths are identical
    long length = new FileInfo( f1 ).Length;
    if( length != new FileInfo( f2 ).Length )
        return false;

    // open both for reading
    using( FileStream stream1 = File.OpenRead( f1 ) )
    using( FileStream stream2 = File.OpenRead( f2 ) )
    {
        // compare content for equality
        int b1, b2;
        while( length-- > 0 )
        {
            b1 = stream1.ReadByte();
            b2 = stream2.ReadByte();
            if( b1 != b2 )
                return false;
        }
    }

    return true;
}

You could modify it to read more than one byte at a time, but the internal file stream should already be buffering the data, so even this simple code should be relatively fast.

EDIT: Thanks for the feedback on speed. I still maintain that the compare-all-bytes method can be just as fast as the MD5 method, since both have to read the entire file. I suspect (but don't know for sure) that once the files have been read, comparing bytes requires less computation than hashing. I reproduced your performance observations with my initial implementation, but once I added some simple buffering, the compare-all-bytes method was just as fast. The buffered implementation is below; feel free to comment further!

EDIT: Jon B makes another good point: when the files actually are different, this method can stop as soon as it finds the first differing byte, whereas the hash method has to read both files in their entirety every time.

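// requires: using System; using System.IO;
static bool FilesAreEqualFaster( string f1, string f2 )
{
    // get file length and make sure lengths are identical
    long length = new FileInfo( f1 ).Length;
    if( length != new FileInfo( f2 ).Length )
        return false;

    byte[] buf1 = new byte[4096];
    byte[] buf2 = new byte[4096];

    // open both for reading
    using( FileStream stream1 = File.OpenRead( f1 ) )
    using( FileStream stream2 = File.OpenRead( f2 ) )
    {
        // compare content for equality
        while( length > 0 )
        {
            // figure out how much to read
            int toRead = (int)Math.Min( buf1.Length, length );

            // Read may return fewer bytes than requested, so fill each buffer completely
            if( !ReadFully( stream1, buf1, toRead ) || !ReadFully( stream2, buf2, toRead ) )
                return false; // a file ended early (it changed while being read)
            length -= toRead;

            // compare the two chunks
            for( int i = 0; i < toRead; ++i )
                if( buf1[i] != buf2[i] )
                    return false;
        }
    }

    return true;
}

// reads exactly count bytes into buf; returns false if the stream ends first
static bool ReadFully( Stream stream, byte[] buf, int count )
{
    int offset = 0;
    while( offset < count )
    {
        int n = stream.Read( buf, offset, count - offset );
        if( n == 0 )
            return false;
        offset += n;
    }
    return true;
}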
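
For comparison, the hash method discussed above might look roughly like the sketch below. It is not part of the original measurements; the class and method names (`FileHashCompare`, `FilesAreEqualByHash`) are made up for illustration. It assumes MD5 is acceptable for a plain equality check, and note that matching hashes make identical content extremely likely but do not strictly prove it.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class FileHashCompare
{
    // Hypothetical helper: compares two files by their MD5 digests.
    public static bool FilesAreEqualByHash(string f1, string f2)
    {
        // Files of different length can never be identical, so skip hashing.
        if (new FileInfo(f1).Length != new FileInfo(f2).Length)
            return false;

        using (MD5 md5 = MD5.Create())
        using (FileStream stream1 = File.OpenRead(f1))
        using (FileStream stream2 = File.OpenRead(f2))
        {
            // ComputeHash reads each stream to the end, even if the
            // files already differ in the first byte.
            byte[] hash1 = md5.ComputeHash(stream1);
            byte[] hash2 = md5.ComputeHash(stream2);
            return hash1.SequenceEqual(hash2);
        }
    }
}
```

Unlike the byte-comparison loop, this version always reads both files completely, which is the cost Jon B's point refers to. Hashing pays off mainly when the digests can be stored and reused across many comparisons.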
Charlie

Or you can compare the two files byte for byte.
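
A minimal sketch of that idea, assuming both files are small enough to load into memory at once (the names `SmallFileCompare` and `FilesAreEqualInMemory` are hypothetical, chosen only for this example):

```csharp
using System.IO;
using System.Linq;

static class SmallFileCompare
{
    // Hypothetical helper: loads both files fully, then compares the bytes.
    public static bool FilesAreEqualInMemory(string f1, string f2)
    {
        // Cheap early exit: different sizes mean different files.
        if (new FileInfo(f1).Length != new FileInfo(f2).Length)
            return false;

        byte[] bytes1 = File.ReadAllBytes(f1);
        byte[] bytes2 = File.ReadAllBytes(f2);

        // SequenceEqual stops at the first mismatching element.
        return bytes1.SequenceEqual(bytes2);
    }
}
```

For large files this holds two full copies in memory, so a streaming comparison is probably the better choice there.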

yfeldblum