I'm looking for a solution for hashing the content of large files (files may be over 2 GB on a 32-bit OS). Is there an easy solution for that? Or should I just read the file in parts and load each part into a buffer?
3 Answers
Driis's solution sounds more flexible, but `HashAlgorithm.ComputeHash` will also accept `Stream`s as parameters.
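As a minimal sketch of the stream overload mentioned above: the class and method names (`StreamHashExample`, `HashFile`) and the choice of SHA-256 are assumptions made for illustration, not part of the original answer. Because the hash is computed from an open stream, the file should not need to fit in memory.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashExample
{
    // Hashes a file by passing an open FileStream to ComputeHash(Stream).
    // The stream is read to its end; the file is never loaded as one array.
    // SHA-256 is an arbitrary choice here; any HashAlgorithm would do.
    public static byte[] HashFile(string path)
    {
        using (HashAlgorithm sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }
}
```

Calling `BitConverter.ToString(StreamHashExample.HashFile(somePath))` would give the hash as a hex string, where `somePath` is a placeholder for the file to hash.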

-
+1, if one does not care about block size etc, this overload of `ComputeHash` certainly is the simplest one to use. I'll leave my answer below for the sake of completeness. – driis May 23 '11 at 08:08
-
Use this one if you don't care about progress reporting. Otherwise, use a CryptoStream with a SHA-1 instance.... – Henning Krause Aug 19 '15 at 19:31
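The `CryptoStream` approach from the last comment might look roughly like the sketch below. The names `ProgressHashExample`, `HashWithProgress`, and the `onProgress` callback are hypothetical, and the 80 KB buffer size is an arbitrary assumption. The idea is that a `HashAlgorithm` can act as the transform of a write-mode `CryptoStream`, so the caller controls the read loop and can report progress between chunks.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ProgressHashExample
{
    // onProgress is a hypothetical callback that receives the total number
    // of bytes hashed so far; pass null to skip progress reporting.
    public static byte[] HashWithProgress(Stream input, Action<long> onProgress)
    {
        using (HashAlgorithm sha = SHA1.Create())
        // The transformed output is not needed, so it is sent to Stream.Null.
        using (var crypto = new CryptoStream(Stream.Null, sha, CryptoStreamMode.Write))
        {
            byte[] buffer = new byte[81920]; // arbitrary chunk size
            long total = 0;
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Each write feeds the chunk into the hash algorithm.
                crypto.Write(buffer, 0, read);
                total += read;
                if (onProgress != null) onProgress(total);
            }
            // Finalizes the hash; sha.Hash is only valid after this call.
            crypto.FlushFinalBlock();
            return sha.Hash;
        }
    }
}
```

Dividing the reported byte count by the stream's `Length` (when the stream supports it) would give a completion fraction.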
Use `TransformBlock` and `TransformFinalBlock` to calculate the hash block by block, so you won't need to read the entire file into memory. (There is a nice example in the first link, and another one in this previous question.)
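A rough sketch of the block-by-block approach follows. `BlockHashExample`, `HashStream`, the 64 KB default buffer, and the use of SHA-256 are assumptions for illustration; this is not the code from the linked example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class BlockHashExample
{
    // Reads the stream in fixed-size chunks and feeds each chunk to the
    // hash algorithm, so only one buffer's worth of data is in memory.
    public static byte[] HashStream(Stream stream, int bufferSize = 64 * 1024)
    {
        using (HashAlgorithm sha = SHA256.Create())
        {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Hash only the bytes actually read in this iteration.
                sha.TransformBlock(buffer, 0, read, buffer, 0);
            }
            // An empty final block finishes the computation.
            sha.TransformFinalBlock(buffer, 0, 0);
            return sha.Hash;
        }
    }
}
```

For a file, this could be called as `BlockHashExample.HashStream(File.OpenRead(path))` inside a `using` block that disposes the stream.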
-
+1; sounds like the right approach, but wow, that is a truly useless MSDN page. Perhaps there's a better link? – Eamon Nerbonne May 23 '11 at 07:57
-
Reading the code example on that page should be enough to get the idea. What do you think is missing ? – driis May 23 '11 at 08:00
-
Well, the example contradicts the parameter descriptions for one - inputCount is described as "The number of bytes in the input byte array to use as data.", but used more like an endOffset. – Eamon Nerbonne May 23 '11 at 08:02
-
Hmm, maybe that's intentional - it *is* a count, they're just illustrating corner cases? – Eamon Nerbonne May 23 '11 at 08:03
-
It's not clear what the `TransformBlock` function is for, but it seems clear that `ComputeHash` is the proper function for this purpose. – Gabe May 23 '11 at 08:06
-
Another great one is `outputBuffer`: *A copy of the part of the input array used to compute the hash code.* Which part? This is used in the example, but there it's not a copy but an aliased reference. What exactly does this do? (The unclear documentation isn't your fault, and I'll bet TransformBlock *will* do the job, it's just that I'd like to understand a method's spec before using it). – Eamon Nerbonne May 23 '11 at 08:40
-
@Eamon, you are right, that documentation is quite unclear (I might have been biased from having used the method before). – driis May 23 '11 at 08:42
-
So, confirmation that the spec is worthless; in the example on MSDN, the line `offset += sha.TransformBlock(input, offset, size, input, offset);` may be safely replaced by `offset += sha.TransformBlock(input.Skip(offset).Take(size).ToArray(), 0, size, null, -24124512);` - the last two arguments are apparently *ignored*. – Eamon Nerbonne May 23 '11 at 08:46
-
Last comment (sorry for the outburst!) I've had these MSDN experiences before, opened connect issues; got little to no feedback except "we'll fix it", and then years (no joke!) later, the docs are updated in a way that *still* doesn't quite cover it. Several times. It's just frustrating. – Eamon Nerbonne May 23 '11 at 08:49
If you choose to use `TransformBlock`, then you can safely ignore the last parameter and set the `outputBuffer` to `null`. `TransformBlock` will copy from the input to the output array - but why would you want to simply copy bits for no good reason?
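The behaviour described above can be checked with a small sketch like the following. `OutputBufferExample` and its method names are hypothetical, and SHA-256 is just one algorithm picked for the check; other implementations are assumed, not shown, to behave the same way.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class OutputBufferExample
{
    // Hashes data in a single TransformBlock call, using the given
    // output buffer (which may be null).
    public static byte[] HashWithOutput(byte[] data, byte[] outputBuffer)
    {
        using (HashAlgorithm sha = SHA256.Create())
        {
            sha.TransformBlock(data, 0, data.Length, outputBuffer, 0);
            sha.TransformFinalBlock(data, 0, 0);
            return sha.Hash;
        }
    }

    // Returns true when a null output buffer yields the same hash as a
    // real one, and the real one ends up holding a copy of the input.
    public static bool NullOutputIsEquivalent(byte[] data)
    {
        byte[] copyTarget = new byte[data.Length];
        byte[] withCopy = HashWithOutput(data, copyTarget);
        byte[] withoutCopy = HashWithOutput(data, null);
        return withCopy.SequenceEqual(withoutCopy)
            && copyTarget.SequenceEqual(data);
    }
}
```

If the claim holds for the algorithm under test, `NullOutputIsEquivalent` returns `true` for any input array.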
Furthermore, all mscorlib `HashAlgorithm`s work as you might expect, i.e. the block size doesn't seem to affect the hash output; and whether you pass the data in one array and then hash in chunks by changing the `inputOffset` or you hash by passing smaller, separate arrays doesn't matter. I verified this using the following code:
(This is slightly long; it is just here so people can verify for themselves that `HashAlgorithm` implementations are sane.)
    using System;
    using System.Linq;
    using System.Reflection;
    using System.Security.Cryptography;
    using System.Text;

    public class Program {
        public static void Main() {
            // Use a small random input so every chunk size below is exercised.
            RandomNumberGenerator rnd = RandomNumberGenerator.Create();
            byte[] input = new byte[20];
            rnd.GetBytes(input);
            Console.WriteLine("Input Data: " + BytesToStr(input));
            // Find every concrete HashAlgorithm in the assembly that defines HashAlgorithm.
            var hashAlgoTypes = Assembly.GetAssembly(typeof(HashAlgorithm)).GetTypes()
                .Where(t => typeof(HashAlgorithm).IsAssignableFrom(t) && !t.IsAbstract);
            foreach (var hashType in hashAlgoTypes)
                new AlgoTester(hashType).AssertOkFor(input.ToArray());
        }

        public static string BytesToStr(byte[] bytes) {
            StringBuilder str = new StringBuilder();
            for (int i = 0; i < bytes.Length; i++)
                str.AppendFormat("{0:X2}", bytes[i]);
            return str.ToString();
        }

        public class AlgoTester {
            readonly byte[] key;
            readonly Type type;

            public AlgoTester(Type type) {
                this.type = type;
                // Keyed algorithms generate a random key; remember one so that
                // every instance created by MakeAlgo uses the same key.
                if (typeof(KeyedHashAlgorithm).IsAssignableFrom(type))
                    using (var algo = (KeyedHashAlgorithm)Activator.CreateInstance(type))
                        key = algo.Key.ToArray();
            }

            public HashAlgorithm MakeAlgo() {
                HashAlgorithm algo = (HashAlgorithm)Activator.CreateInstance(type);
                if (key != null)
                    ((KeyedHashAlgorithm)algo).Key = key;
                return algo;
            }

            // Reference result: hash the whole array in one call.
            public byte[] GetHash(byte[] input) {
                using (HashAlgorithm sha = MakeAlgo())
                    return sha.ComputeHash(input);
            }

            public byte[] GetHashOneBlock(byte[] input) {
                using (HashAlgorithm sha = MakeAlgo()) {
                    sha.TransformFinalBlock(input, 0, input.Length);
                    return sha.Hash;
                }
            }

            // Same array throughout, advancing inputOffset (as in the MSDN example).
            public byte[] GetHashMultiBlock(byte[] input, int size) {
                using (HashAlgorithm sha = MakeAlgo()) {
                    int offset = 0;
                    while (input.Length - offset >= size)
                        offset += sha.TransformBlock(input, offset, size, input, offset);
                    sha.TransformFinalBlock(input, offset, input.Length - offset);
                    return sha.Hash;
                }
            }

            // Separate small arrays, with a null outputBuffer and a nonsense outputOffset.
            public byte[] GetHashMultiBlockInChunks(byte[] input, int size) {
                using (HashAlgorithm sha = MakeAlgo()) {
                    int offset = 0;
                    while (input.Length - offset >= size)
                        offset += sha.TransformBlock(input.Skip(offset).Take(size).ToArray(),
                            0, size, null, -24124512);
                    sha.TransformFinalBlock(input.Skip(offset).ToArray(), 0,
                        input.Length - offset);
                    return sha.Hash;
                }
            }

            public void AssertOkFor(byte[] data) {
                var direct = GetHash(data);
                var indirect = GetHashOneBlock(data);
                // Collect every variant whose hash differs from the reference result.
                var outcomes =
                    new[] { 1, 2, 3, 5, 10, 11, 19, 20, 21 }.SelectMany(i =>
                        new[] {
                            new { Hash = GetHashMultiBlock(data, i), Name = "ByMSDN" + i },
                            new { Hash = GetHashMultiBlockInChunks(data, i), Name = "InChunks" + i }
                        }).Concat(new[] { new { Hash = indirect, Name = "OneBlock" } })
                    .Where(result => !result.Hash.SequenceEqual(direct)).ToArray();
                Console.Write("Testing: " + type);
                if (outcomes.Any()) {
                    Console.WriteLine(" not OK.");
                    Console.WriteLine(type.Name + " direct was: " + BytesToStr(direct));
                } else Console.WriteLine(" OK.");
                foreach (var outcome in outcomes)
                    Console.WriteLine(type.Name + " differs with: " + outcome.Name + " "
                        + BytesToStr(outcome.Hash));
            }
        }
    }
