Possible Duplicate:
Possible to calculate MD5 (or other) hash with buffered reads?

I am computing an MD5 checksum over the entire raw contents of a USB flash drive.

I am reading the drive in 1 MB chunks. I can't keep every chunk in memory, as I would run out of memory very quickly.

I would like to feed each 1 MB chunk to the MD5 algorithm as it is read and, when I'm done reading, get the final MD5 value.

Is there any C# code that can easily accomplish this?

user1090205

2 Answers

This method computes the hash of input, reading up to BufferSize bytes at a time:

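// Requires: using System.IO; using System.Security.Cryptography;
// BufferSize is not defined in the original snippet; 1 MB is an assumption taken from the question.
const int BufferSize = 1024 * 1024;

static byte[] CalculateHash(Stream input, HashAlgorithm algorithm)
{
    byte[] buffer = new byte[BufferSize];
    int readCount;

    // Feed each chunk into the running hash state as it is read.
    while ((readCount = input.Read(buffer, 0, BufferSize)) > 0)
    {
        algorithm.TransformBlock(buffer, 0, readCount, buffer, 0);
    }

    // readCount is 0 once the loop ends, so this only finalizes the hash.
    algorithm.TransformFinalBlock(buffer, 0, 0);

    return algorithm.Hash;
}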

Note that it takes a parameter of type HashAlgorithm, so you can calculate hashes other than just MD5. Call it like this:

// Requires: using System.IO; using System.Linq; using System.Security.Cryptography;
using (FileStream inputStream = new FileStream(InputPath, FileMode.Open))
using (MD5 algorithm = MD5.Create())
{
    byte[] md5Hash = CalculateHash(inputStream, algorithm);
    string md5HashHex = string.Join(string.Empty, md5Hash.Select(b => b.ToString("x2")));

    // Process hash array or hex string...
}
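
As a sanity check, here is a minimal self-contained sketch that compares the chunked result against a one-shot ComputeHash call on the same bytes, then flips one byte to confirm the hash changes. The class name ChunkedHashDemo, the 1 MB BufferSize, and the random in-memory test data are assumptions made for illustration; they are not part of the code above.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class ChunkedHashDemo
{
    // Assumed chunk size: the question reads the drive 1 MB at a time.
    const int BufferSize = 1024 * 1024;

    public static byte[] CalculateHash(Stream input, HashAlgorithm algorithm)
    {
        byte[] buffer = new byte[BufferSize];
        int readCount;

        // Feed each chunk into the running hash state as it is read.
        while ((readCount = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            algorithm.TransformBlock(buffer, 0, readCount, buffer, 0);
        }

        // Finalize with zero additional bytes.
        algorithm.TransformFinalBlock(buffer, 0, 0);
        return algorithm.Hash;
    }

    static byte[] HashInChunks(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        using (var stream = new MemoryStream(data))
        {
            return CalculateHash(stream, md5);
        }
    }

    static void Main()
    {
        // Hypothetical test data spanning two and a half chunks.
        byte[] data = new byte[BufferSize * 2 + BufferSize / 2];
        new Random(42).NextBytes(data);

        byte[] chunked = HashInChunks(data);

        byte[] oneShot;
        using (MD5 md5 = MD5.Create())
        {
            oneShot = md5.ComputeHash(data);
        }

        // Change a single byte and hash again.
        data[0] ^= 0xFF;
        byte[] changed = HashInChunks(data);

        Console.WriteLine("Chunked matches one-shot: " + chunked.SequenceEqual(oneShot));
        Console.WriteLine("Hash unchanged after edit: " + chunked.SequenceEqual(changed));
    }
}
```

If the first line reports a match and the second does not, the chunked hashing itself is behaving, and an unchanged hash more likely points at how the input stream is being opened or read.
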
Lance U. Matthews
  • Your example doesn't appear to work for me; I'm getting the same MD5 hash even if I change some of the data in the "Stream input". – user1090205 Feb 24 '12 at 18:44
  • I tried it with a handful of files (even a zero-byte file) and it works for me, with the hashes confirmed by a third-party utility. Are you hashing each file on your drive, or ignoring the filesystem and somehow reading the device as one big stream of bytes? – Lance U. Matthews Feb 24 '12 at 19:01

This seems to be a little different from what you are attempting, but I have done something similar when transferring large files across the wire.

First I build a manifest file, which lists each chunk of the file to be sent, its order, and its MD5 hash. I send that file across to the receiving process.

Then I start sending chunks of the file. The receiver gets them, validates their hashes, and queues them up to be reassembled. If a chunk doesn't match its hash, it is discarded, and a request is made for that chunk to be sent again.

Once all the chunks are received and validated, the file is reassembled, and the complete file is checked against the complete file hash in the manifest. If everything looks good, we send a successful response back to the sender.

This lets me deal with and validate just pieces of a file at a time, but it does not let me build a hash from the hash values of smaller sources. I don't believe that is possible, just by the nature of the hashing algorithms.
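
The manifest idea could be sketched roughly as follows. The names ChunkEntry, ManifestDemo, BuildManifest and ChunkIsValid are hypothetical and chosen for illustration; this is not the actual code from that transfer system. Note that the whole-file hash is computed from the data itself in the same pass, not derived from the per-chunk hashes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical manifest entry: the chunk's position and its MD5 as hex.
sealed class ChunkEntry
{
    public int Index;
    public string Md5Hex;
}

static class ManifestDemo
{
    static string ToHex(byte[] bytes)
    {
        return string.Concat(bytes.Select(b => b.ToString("x2")));
    }

    // Reads the stream once, recording a hash per chunk and a hash of the whole stream.
    // Assumes Read fills the buffer except at the end, as MemoryStream does;
    // other stream types may return short reads and yield uneven chunks.
    public static List<ChunkEntry> BuildManifest(Stream input, int chunkSize, out string fileMd5Hex)
    {
        var entries = new List<ChunkEntry>();
        byte[] buffer = new byte[chunkSize];
        int readCount;

        using (MD5 whole = MD5.Create())
        using (MD5 perChunk = MD5.Create())
        {
            while ((readCount = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Running hash over the entire stream.
                whole.TransformBlock(buffer, 0, readCount, buffer, 0);

                // Independent hash of just this chunk.
                string chunkHex = ToHex(perChunk.ComputeHash(buffer, 0, readCount));
                entries.Add(new ChunkEntry { Index = entries.Count, Md5Hex = chunkHex });
            }

            whole.TransformFinalBlock(buffer, 0, 0);
            fileMd5Hex = ToHex(whole.Hash);
        }

        return entries;
    }

    // Receiver side: check a received chunk against its manifest entry.
    public static bool ChunkIsValid(byte[] chunk, ChunkEntry expected)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(chunk)) == expected.Md5Hex;
        }
    }

    static void Main()
    {
        // Hypothetical 10-byte "file" split into 4-byte chunks (4 + 4 + 2).
        byte[] data = new byte[10];
        new Random(1).NextBytes(data);

        string fileMd5;
        List<ChunkEntry> manifest = BuildManifest(new MemoryStream(data), 4, out fileMd5);
        Console.WriteLine("Chunks: " + manifest.Count);
        Console.WriteLine("Whole-file MD5: " + fileMd5);

        byte[] first = data.Take(4).ToArray();
        Console.WriteLine("First chunk valid: " + ChunkIsValid(first, manifest[0]));

        first[0] ^= 0xFF; // simulate corruption in transit
        Console.WriteLine("Corrupted chunk valid: " + ChunkIsValid(first, manifest[0]));
    }
}
```
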

Edit

And +1 for @ken2k's comment on your question. md5.TransformBlock() and md5.TransformFinalBlock() are probably exactly what you are looking for. I had no idea such a thing existed.
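
For what it's worth, newer .NET versions also ship the IncrementalHash class, which is a different API from TransformBlock() and TransformFinalBlock() but follows the same feed-then-finalize pattern. A minimal sketch, assuming a runtime where IncrementalHash is available; the class name IncrementalHashDemo, the HashStream helper, and the buffer size are made up for illustration:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class IncrementalHashDemo
{
    // Hypothetical helper: hashes a stream chunk by chunk with IncrementalHash.
    public static byte[] HashStream(Stream input, HashAlgorithmName name, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        int readCount;

        using (IncrementalHash hasher = IncrementalHash.CreateHash(name))
        {
            while ((readCount = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Append only the bytes actually read in this pass.
                hasher.AppendData(buffer, 0, readCount);
            }

            // Finalize and return the digest.
            return hasher.GetHashAndReset();
        }
    }

    static void Main()
    {
        // Hypothetical in-memory data standing in for the drive contents.
        byte[] data = new byte[5000];
        new Random(3).NextBytes(data);

        byte[] hash = HashStream(new MemoryStream(data), HashAlgorithmName.MD5, 1024);
        Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
    }
}
```
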

Matthew Vines