
I have a program which is going to be used on very large files (current test data is 250GB). I need to be able to calculate both MD5 and SHA1 hashes for these files. Currently my code drops the stream into MD5.Create().ComputeHash(Stream stream), and then the same for SHA1. These, as far as I can tell, read the file in 4096-byte blocks to a buffer internal to the hashing function, until the end of the stream.

The problem is, doing this one after the other takes a VERY long time! Is there any way I can take data into a buffer and provide the buffer to BOTH algorithms before reading a new block into the buffer?

Please explain thoroughly as I'm not an experienced coder.

Soner Gönül
  • Read it blockwise and feed the data to your own digest algorithms in tandem – sehe Feb 15 '13 at 22:41
  • Check : http://stackoverflow.com/questions/14610850/how-to-get-file-both-md5-and-sha1-checksum-at-the-same-time-when-upload-a-new-fi (java) – punkeel Feb 15 '13 at 22:42
  • possible duplicate of http://stackoverflow.com/questions/7832440/is-hashalgorithm-computehash-stateful – Marius Bancila Feb 15 '13 at 22:43

1 Answer

Sure. You can call TransformBlock repeatedly, and then TransformFinalBlock at the end and then use Hash to get the final hash. So something like:

// Requires: using System.IO; using System.Security.Cryptography;
using (var md5 = MD5.Create()) // Or MD5Cng.Create
using (var sha1 = SHA1.Create()) // Or SHA1Cng.Create
using (var input = File.OpenRead("file.data"))
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    // Read one block, then feed that same block to both hash algorithms.
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
        sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
    }
    // We have to call TransformFinalBlock, but we don't have any
    // more data - just provide 0 bytes.
    md5.TransformFinalBlock(buffer, 0, 0);
    sha1.TransformFinalBlock(buffer, 0, 0);

    byte[] md5Hash = md5.Hash;
    byte[] sha1Hash = sha1.Hash;
}

The MD5Cng.Create and SHA1Cng.Create calls will create wrappers around native implementations which are likely to be faster than the implementations returned by MD5.Create and SHA1.Create, but which will be a bit less portable (e.g. for PCLs).
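To compare the results against a reference tool, the two byte arrays usually need to be shown as hex strings. The following is a minimal sketch of the same single-pass loop wrapped in a reusable method. The DualHash class, the Compute and ToHex method names, and the 1 MiB default buffer size are illustrative assumptions and are not part of the original answer; the buffer size in particular is untested against large files and may need tuning.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper for illustration only.
public static class DualHash
{
    // Reads the stream once and returns (MD5, SHA1) as uppercase hex strings.
    // bufferSize is an assumed default, not a measured optimum.
    public static Tuple<string, string> Compute(Stream input, int bufferSize = 1 << 20)
    {
        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed the same block to both algorithms before reading the next one.
                md5.TransformBlock(buffer, 0, bytesRead, buffer, 0);
                sha1.TransformBlock(buffer, 0, bytesRead, buffer, 0);
            }
            // Finish both hashes with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            sha1.TransformFinalBlock(buffer, 0, 0);
            return Tuple.Create(ToHex(md5.Hash), ToHex(sha1.Hash));
        }
    }

    // BitConverter.ToString yields "AA-BB-CC"; strip the dashes.
    private static string ToHex(byte[] hash)
    {
        return BitConverter.ToString(hash).Replace("-", "");
    }
}
```

It could be called as `var hashes = DualHash.Compute(File.OpenRead("file.data"));`, after which `hashes.Item1` holds the MD5 string and `hashes.Item2` the SHA1 string. The caller remains responsible for disposing the stream.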

Jon Skeet
  • Perfect :) I just tested it and the output agrees with the hash produced by my reference tool (FTK Imager) so I'm all happy! – Joash Lewis Feb 16 '13 at 16:45
  • If you're computing two hashes at once, it's because you want the code to run fast. As such, you likely should use MD5Cng.Create() and SHA1Cng.Create(), assuming you don't support Windows XP. See: http://stackoverflow.com/questions/5341874/which-one-to-use-managed-vs-nonmanaged-hashing-algorithms – 0xdabbad00 Oct 15 '14 at 02:22
  • @0xdabbad00: Also assuming you're not interested in portable class libraries, store apps etc... but I'll add a note. – Jon Skeet Oct 15 '14 at 05:45