I need to calculate MD5 checksums for many large files. The code for this is pretty simple:
System.IO.FileStream fsFile = new System.IO.FileStream(strFullPath, FileMode.Open);
fsFile.Seek(1000, SeekOrigin.Begin); // skip some bytes at the start if need be
System.Security.Cryptography.MD5 md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
byte[] arrBtMd5 = md5.ComputeHash(fsFile);
The problem starts if I want to do one of the following:
- Calculate several hashes of the same file (MD5, SHA1, CRC32, and what not).
- Calculate an MD5 for the entire file and another MD5 for the same file with some header rows skipped (roughly how I do both one by one today is sketched below).
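To make the second bullet concrete, this is roughly what the one-by-one version looks like today. Each call opens and reads the file again; headerLength is just a placeholder for the number of bytes to skip:

using System;
using System.IO;
using System.Security.Cryptography;

class OneByOneHashing
{
    // Whole-file MD5 -- first full read of the file.
    static byte[] HashWholeFile(string strFullPath)
    {
        using (var md5 = MD5.Create())
        using (var fsFile = new FileStream(strFullPath, FileMode.Open, FileAccess.Read))
        {
            return md5.ComputeHash(fsFile);
        }
    }

    // MD5 with the header skipped -- opens and reads the same file a second time.
    static byte[] HashWithoutHeader(string strFullPath, long headerLength)
    {
        using (var md5 = MD5.Create())
        using (var fsFile = new FileStream(strFullPath, FileMode.Open, FileAccess.Read))
        {
            fsFile.Seek(headerLength, SeekOrigin.Begin); // skip the header bytes
            return md5.ComputeHash(fsFile);
        }
    }
}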
Done this way, the same file gets read multiple times. Disk I/O is the bottleneck of the system, so my questions are:
- Can the .NET framework (or the OS underneath it) recognize that I read the same file multiple times and optimize the operation? (I'm pretty sure something happens, because when I added the second MD5 calculation without headers, the impact was not that great.)
- What technique can I use to share the same FileStream between multiple "consumers"? I'd like to read the file only once with a FileStream and split the data between the hashing functions working in parallel, something like the sketch below.
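This is only a sketch of what I'm imagining, not something I've verified as the right approach: a single read loop that pushes each chunk into several HashAlgorithm instances via TransformBlock/TransformFinalBlock. headerLength and the 64 KB buffer size are placeholders, and I've used SHA1 in place of CRC32 since .NET has no built-in CRC32:

using System;
using System.IO;
using System.Security.Cryptography;

class SinglePassHashing
{
    static void HashInOnePass(string strFullPath, long headerLength)
    {
        using (var md5Full = MD5.Create())
        using (var md5NoHeader = MD5.Create())
        using (var sha1 = SHA1.Create())
        using (var fsFile = new FileStream(strFullPath, FileMode.Open, FileAccess.Read))
        {
            var buffer = new byte[64 * 1024]; // arbitrary chunk size
            long position = 0;                // file offset of the current chunk
            int read;
            while ((read = fsFile.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Hashes that want the whole file get the whole chunk.
                md5Full.TransformBlock(buffer, 0, read, null, 0);
                sha1.TransformBlock(buffer, 0, read, null, 0);

                // The "no header" MD5 only gets the bytes past headerLength.
                long pastHeader = position + read - headerLength;
                if (pastHeader > 0)
                {
                    int count = (int)Math.Min(pastHeader, read);
                    md5NoHeader.TransformBlock(buffer, read - count, count, null, 0);
                }
                position += read;
            }

            // Finalize each hash with an empty block; results are in the Hash property.
            md5Full.TransformFinalBlock(buffer, 0, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);
            md5NoHeader.TransformFinalBlock(buffer, 0, 0);

            Console.WriteLine("MD5 (full):      " + BitConverter.ToString(md5Full.Hash));
            Console.WriteLine("MD5 (no header): " + BitConverter.ToString(md5NoHeader.Hash));
            Console.WriteLine("SHA1:            " + BitConverter.ToString(sha1.Hash));
        }
    }
}

Note that in this sketch the hashes are updated sequentially for each chunk rather than on separate threads, so it would only solve the single-read part, not the "working in parallel" part.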