I've built a file-copying routine into a common library shared by several WinForms applications I'm currently working on. It wraps the commonly used CopyFileEx method to perform the copy while displaying progress, and that part seems to be working great.
The only real issue I'm encountering is that, because most of the file copying I'm doing is for archival purposes, once the file is copied, I would like to "verify" the new copy of the file. I have the following methods in place to do the comparison/verification. I'm sure many of you will quickly see where the "problem" is:

    Public Shared Function CompareFiles(ByVal File1 As IO.FileInfo, ByVal File2 As IO.FileInfo) As Boolean
        Dim Match As Boolean = False
        If File1.FullName = File2.FullName Then
            Match = True
        Else
            If File.Exists(File1.FullName) AndAlso File.Exists(File2.FullName) Then
                If File1.Length = File2.Length Then
                    If File1.LastWriteTime = File2.LastWriteTime Then
                        Try
                            Dim File1Hash As String = HashFileForComparison(File1)
                            Dim File2Hash As String = HashFileForComparison(File2)
                            If File1Hash = File2Hash Then
                                Match = True
                            End If
                        Catch ex As Exception
                            Dim CompareError As New ErrorHandler(ex)
                            CompareError.LogException()
                        End Try
                    End If
                End If
            End If
        End If
        Return Match
    End Function

    Private Shared Function HashFileForComparison(ByVal OriginalFile As IO.FileInfo) As String
        ' Read the file through a ~1.2 MB buffer and hash the whole stream.
        Using BufferedFileReader As New IO.BufferedStream(File.OpenRead(OriginalFile.FullName), 1200000)
            Using MD5 As New System.Security.Cryptography.MD5CryptoServiceProvider
                Dim FileHash As Byte() = MD5.ComputeHash(BufferedFileReader)
                ' Base64 keeps every hash byte intact. Decoding raw hash bytes with
                ' Encoding.Unicode can replace invalid sequences and lose information.
                Return Convert.ToBase64String(FileHash)
            End Using
        End Using
    End Function

This CompareFiles() method checks a few of the "simple" elements first:
- Is it trying to compare a file to itself? (If so, always return True.)
- Do both files actually exist?
- Are the two files the same size?
- Do they both have the same modification date?
If all of those checks pass, it hashes both files and compares the results. But, you guessed it, that's where performance takes the hit. Especially for large files, the MD5.ComputeHash call inside HashFileForComparison() can take a while: about 1.25 minutes for a 500MB file, so roughly 2.5 minutes in total to compute both hashes for one comparison. Does anyone have a better suggestion for verifying the new copy of the file more efficiently?