How to calculate hash using streams while saving file to disk?

I don't want to:

  • Save the file first and then load it from disk just to calculate the hash.
  • Load the entire file into memory at any point.
  • Use the non-async version of a method that has an async counterpart.
  • Use an API that is marked obsolete in .NET 6 or higher.

This is what I have so far; it always prints "err":

  public async Task HashOnTheFly()
  {
    var path = "/tmp/a.txt";
    await SaveAsync(File.OpenRead(path), "/tmp/b.txt", default);

    async Task SaveAsync(Stream stream, string path, CancellationToken ct)
    {
      var sha512 = SHA512.Create();
      var fileName = Path.GetFileName(path);
      var destinationPath = Path.Combine("/tmp", fileName);
      await using var fileStream = File.Create(destinationPath);

      await using var cryptoStream = new CryptoStream(fileStream, sha512, CryptoStreamMode.Read);
      await stream.CopyToAsync(fileStream, ct);

      if (sha512?.Hash is { } computedHash)
        Console.WriteLine(computedHash);
      else
        Console.WriteLine("err");
    }
  }
dbc
Hnus
  • You can try computing hash block after block: `sha256.TransformBlock(...)`, `sha256.TransformFinalBlock(...)`, `result = sha256.Hash` – Dmitry Bychenko Aug 08 '22 at 22:24
  • For ultimate efficiency, you could probably implement a bifurcating `DoubleWriteStream` to which you give two streams to write to. It will then write to both streams at the same time. Your example is a bit contrived: you are reading and writing to the same file – Charlieface Aug 08 '22 at 22:33
  • @Charlieface I'll take a look. Writing to the same file was a mistake I made when I turned my actual code into an example. – Hnus Aug 08 '22 at 22:38
  • I edited my question. In my actual code the data doesn't come from a file stream at all but from a file uploaded from a browser to the server; a FileStream was just an easy stream to create quickly to confirm the intended behavior. – Hnus Aug 08 '22 at 22:42
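
The block-by-block approach suggested in the first comment could look roughly like the sketch below. The class name `BlockwiseHashing`, the method name `CopyAndHashAsync`, and the 80 KB buffer size are assumptions made for illustration; only `TransformBlock`, `TransformFinalBlock`, and `Hash` come from the comment. The stream reads and writes stay asynchronous, while `TransformBlock` itself is a CPU-bound call with no async counterpart.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

public static class BlockwiseHashing
{
    // Hypothetical helper: copies source to destination chunk by chunk,
    // feeding each chunk to the hash before writing it out.
    public static async Task<byte[]> CopyAndHashAsync(
        Stream source, Stream destination, CancellationToken ct = default)
    {
        using var sha512 = SHA512.Create();
        var buffer = new byte[81920]; // assumed chunk size; never holds the whole file

        int read;
        while ((read = await source.ReadAsync(buffer.AsMemory(), ct)) > 0)
        {
            // Update the running hash with the bytes just read.
            sha512.TransformBlock(buffer, 0, read, null, 0);
            // Write the same bytes to the destination.
            await destination.WriteAsync(buffer.AsMemory(0, read), ct);
        }

        // Finalize with an empty block; Hash is only populated after this call.
        sha512.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return sha512.Hash ?? throw new InvalidOperationException("Hash was not finalized.");
    }
}
```

Because the helper takes plain `Stream` arguments, the source could just as well be an uploaded request body and the destination a `FileStream` opened with `FileOptions.Asynchronous`.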

1 Answer

You have a few bugs in your code:

  1. You never copy to the CryptoStream, you only copy to the underlying fileStream. So naturally no hash is ever calculated.

  2. You do not close the CryptoStream before attempting to read the hash. The stream must be closed (disposed) first so that the final block is processed and the hash is finalized.

  3. Since you are computing your hash as you write, you must use CryptoStreamMode.Write not CryptoStreamMode.Read.

  4. SHA512 implements IDisposable and so should be disposed of via a using statement.

Thus your SaveAsync should be modified as follows:

async Task<byte[]> SaveAsync(Stream stream, string path, CancellationToken ct)
{
    var fileName = Path.GetFileName(path);
    var destinationPath = Path.Combine("/tmp", fileName);

    using var sha512 = SHA512.Create();
    // Open the destination for asynchronous I/O (on .NET 6 a bufferSize of 0 disables FileStream buffering).
    await using (var fileStream = File.Create(destinationPath, 0, FileOptions.Asynchronous))
    await using (var cryptoStream = new CryptoStream(fileStream, sha512, CryptoStreamMode.Write))
    {
        // Copy through the CryptoStream so every byte is hashed on its way to the file.
        await stream.CopyToAsync(cryptoStream, ct);
    }

    // The hash is only available once the CryptoStream has been disposed.
    if (sha512.Hash is { } computedHash)
        Console.WriteLine(Convert.ToBase64String(computedHash));
    else
        Console.WriteLine("err");

    return sha512.Hash;
}

Notes:

  • As you are doing asynchronous I/O you may want to open the fileStream with FileOptions.Asynchronous to indicate that the file can be used for asynchronous reading and writing.

  • I modified your code to return the computed hash, as it seems logical to do so.
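
One way to sanity-check this approach is to stream a small in-memory payload through a `CryptoStream` into a temporary file and compare the result with a one-shot hash of the same bytes. This is only a sketch: the names `HashWhileSaving`, `SaveAndHashAsync`, and `RoundTripMatchesAsync` are made up for the example, and it assumes .NET 5 or later for `SHA512.HashData`. The file is read back here purely to verify the copy, which real code would not need to do.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class HashWhileSaving
{
    // Hypothetical variant of SaveAsync that takes the destination path directly.
    public static async Task<byte[]> SaveAndHashAsync(
        Stream source, string destinationPath, CancellationToken ct = default)
    {
        using var sha512 = SHA512.Create();
        await using (var fileStream = new FileStream(destinationPath, FileMode.Create,
            FileAccess.Write, FileShare.None, 4096, FileOptions.Asynchronous))
        await using (var cryptoStream = new CryptoStream(fileStream, sha512, CryptoStreamMode.Write))
        {
            // Every byte passes through the hash on its way to the file.
            await source.CopyToAsync(cryptoStream, ct);
        }
        // Disposing the CryptoStream above finalized the hash.
        return sha512.Hash ?? throw new InvalidOperationException("Hash was not finalized.");
    }

    // Returns true when the streamed hash equals a one-shot hash of the same
    // bytes and the file on disk contains exactly the payload.
    public static async Task<bool> RoundTripMatchesAsync()
    {
        byte[] payload = Encoding.UTF8.GetBytes("hello, hash");
        string tempPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            using var source = new MemoryStream(payload);
            byte[] streamed = await SaveAndHashAsync(source, tempPath);
            byte[] expected = SHA512.HashData(payload);
            byte[] written = await File.ReadAllBytesAsync(tempPath);
            return streamed.AsSpan().SequenceEqual(expected)
                && written.AsSpan().SequenceEqual(payload);
        }
        finally
        {
            File.Delete(tempPath); // clean up the temporary file
        }
    }
}
```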

Demo fiddle here.

dbc