I need to hash files that are large enough to cause too much memory pressure if they are loaded in full, so I'm trying to find out whether we can compute the hash on the fly from a file stream.
While researching the options, I wrote a quick test to check that MD5's ComputeHash returns the same hash whether it is given the bytes of a string or a stream. These are the two functions:
    open System
    open System.IO
    open System.Security.Cryptography
    open System.Text
    let CreateMD5HashFromString (value: string) =
        Convert.ToBase64String(MD5.Create().ComputeHash(Encoding.ASCII.GetBytes(value)))
    let CreateMD5HashFromStream (value: Stream) =
        Convert.ToBase64String(MD5.Create().ComputeHash(value))
I'm testing the calls with the following unit test:
    [<TestMethod>]
    member this.``CreateMD5Hash is the same between a string and a file stream`` () =
        let sampleText = File.ReadAllText("Sample.txt")
        let textMD5 = Security.CreateMD5HashFromString(sampleText)
        // Dispose the file stream once the hash has been computed.
        use stream = File.OpenRead("Sample.txt")
        let streamMD5 = Security.CreateMD5HashFromStream(stream)
        Assert.AreEqual(textMD5, streamMD5)
The test reads a small sample file. It fails because the two generated hashes are different. That seems wrong to me, but I'm not sure. Does anyone know whether these two hashes should be the same?
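To narrow this down, here is a minimal sketch of a check that compares the raw bytes of the file with the bytes the string path actually hashes. My unconfirmed assumption is that File.ReadAllText followed by Encoding.ASCII.GetBytes may not reproduce the file's bytes exactly, for example if the file starts with a byte order mark or contains non-ASCII characters. I have not verified that this is what happens with Sample.txt, and `describeDifference` is just a hypothetical helper name.

```fsharp
open System.IO
open System.Text

// Hypothetical helper: reports whether the string path sees the same bytes
// as the file on disk.
let describeDifference (path: string) =
    // Bytes exactly as stored on disk; this is what the stream overload reads.
    let rawBytes = File.ReadAllBytes(path)
    // Bytes after decoding to a string and re-encoding as ASCII;
    // this is what CreateMD5HashFromString ends up hashing.
    let roundTripped = Encoding.ASCII.GetBytes(File.ReadAllText(path))
    if rawBytes = roundTripped then
        "identical bytes"
    else
        sprintf "bytes differ: %d on disk vs %d after round trip" rawBytes.Length roundTripped.Length
```

If this reports a difference for Sample.txt, the two hashes would be expected to differ, since they are computed over different input.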
As a secondary question: am I actually avoiding memory issues by using the stream overload of ComputeHash, or does it load the entire stream before hashing? I tried to disassemble the related .NET assembly, but got lost trying to track down what HashCore does under the hood.
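For reference, this is roughly what I mean by hashing on the fly: a sketch that feeds the stream through the hash in fixed-size chunks using HashAlgorithm.TransformBlock and TransformFinalBlock, so only one buffer is held in memory at a time. The name `hashInChunks` and the 64 KB buffer size are my own choices for illustration, and I don't know whether the stream overload of ComputeHash already does the equivalent internally.

```fsharp
open System
open System.IO
open System.Security.Cryptography

// Hypothetical helper: hashes a stream chunk by chunk and returns the MD5 as Base64.
let hashInChunks (stream: Stream) =
    use md5 = MD5.Create()
    // Arbitrary chunk size; only this buffer is held in memory at once.
    let buffer = Array.zeroCreate<byte> (64 * 1024)
    let mutable read = stream.Read(buffer, 0, buffer.Length)
    while read > 0 do
        // Feed this chunk into the running hash state (no output copy needed).
        md5.TransformBlock(buffer, 0, read, null, 0) |> ignore
        read <- stream.Read(buffer, 0, buffer.Length)
    // Finish with an empty final block, then read the digest.
    md5.TransformFinalBlock(Array.empty<byte>, 0, 0) |> ignore
    Convert.ToBase64String(md5.Hash)
```

It would be called with an open file stream, e.g. `use fs = File.OpenRead("Sample.txt")` followed by `hashInChunks fs`.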