I am on a mission to eliminate as many allocations to the Large Object Heap (LOH) as I can in my applications. One of the biggest offenders is our code that computes the MD5 hash of a large string.
    public static string MD5Hash(this string s)
    {
        using (MD5CryptoServiceProvider csp = new MD5CryptoServiceProvider())
        {
            byte[] bytesToHash = Encoding.UTF8.GetBytes(s);
            byte[] hashBytes = csp.ComputeHash(bytesToHash);
            return Convert.ToBase64String(hashBytes);
        }
    }
Leave aside, for the sake of the example, that the string itself is probably already in the LOH. Our goal is to prevent any further allocations to that heap.
Also, the current implementation assumes UTF-8 encoding (a big assumption), but really the goal is to generate a byte[] from a string.
MD5CryptoServiceProvider.ComputeHash also has an overload that takes a Stream as input, so we can create a method:
    public static string MD5Hash(this Stream stream)
    {
        using (MD5CryptoServiceProvider csp = new MD5CryptoServiceProvider())
        {
            return Convert.ToBase64String(csp.ComputeHash(stream));
        }
    }
This is promising because we don't need a byte[] for ComputeHash to work. We need a stream object that will read bytes from a string as bytes are requested by ComputeHash.
This rather controversial question provides a method for creating a byte array from a string regardless of encoding. However, we want to avoid creating a large byte array at all.
This question provides a method of creating a stream from a string by reading the string into a MemoryStream, but internally that just allocates a large byte[] as well.
Neither really does the trick.
So how can you avoid the allocation of a large byte[]? Is there a Stream class that will read from another stream (or reader) as bytes are read?
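For reference, here is one direction that sidesteps the Stream entirely, offered as a minimal sketch rather than a definitive answer: encode the string in small chunks with a stateful Encoder and feed each chunk to the hash via TransformBlock. The names StringHashExtensions, MD5HashChunked and charsPerChunk are hypothetical, UTF-8 is still assumed as in the original method, and MD5.Create() is swapped in for MD5CryptoServiceProvider. With the default chunk size, both working buffers should stay well under the roughly 85,000-byte LOH threshold.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StringHashExtensions
{
    // Hypothetical helper: hashes a string chunk by chunk so that no
    // string-sized byte[] is ever allocated. Assumes UTF-8, like the original.
    public static string MD5HashChunked(this string s, int charsPerChunk = 4096)
    {
        // A stateful encoder carries a surrogate pair that is split
        // across two chunks over to the next call.
        Encoder encoder = Encoding.UTF8.GetEncoder();
        char[] chars = new char[charsPerChunk];
        byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(charsPerChunk)];

        using (MD5 md5 = MD5.Create())
        {
            for (int offset = 0; offset < s.Length; offset += charsPerChunk)
            {
                int count = Math.Min(charsPerChunk, s.Length - offset);
                s.CopyTo(offset, chars, 0, count);

                // Flush the encoder only on the final chunk.
                bool isLast = offset + count >= s.Length;
                int byteCount = encoder.GetBytes(chars, 0, count, bytes, 0, isLast);

                // Feed this chunk into the running hash; no output buffer is needed.
                md5.TransformBlock(bytes, 0, byteCount, null, 0);
            }

            // Finalize the hash with an empty last block.
            md5.TransformFinalBlock(bytes, 0, 0);
            return Convert.ToBase64String(md5.Hash);
        }
    }
}
```

Called as `someLargeString.MD5HashChunked()`, this should produce the same Base64 string as the original `MD5Hash(this string s)` for the same input, since the same bytes reach the hash in the same order. It does not answer whether a ready-made Stream class exists for this.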