I'm trying to compute hashes for 200 files of widely varying sizes (from 100 bytes to 10 GB).

The main problem I'm facing is that MD5 does not work for files larger than 3 GB; it just throws an OutOfMemoryException.

What I'm trying to do is hash one file, then another, then another (something like: if private bool GenerateHash(String Path) is busy, wait; otherwise continue). I also want to be able to hash files larger than 4 GB (my system has a 4930K CPU and 32 GB of RAM).

I've done it on Linux via the terminal and got all the hashes, but I'm unable to do the same thing on Windows.

I'm currently moving all my stuff from a server to my home PC, and I don't want to download the same files again, or files which are bigger (I'm checking hash and size).

Any suggestions?

Update: Here is the code that hashes a file (compiled as 32-bit and running on an x64 box):

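    // Requires: using System; using System.IO; using System.Security.Cryptography;
    public void HashFile(String FPath)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(FPath))
        {
            // Hash the stream and hex-encode the digest as lowercase without separators.
            String ComputedHash = BitConverter.ToString(md5.ComputeHash(stream))
                .Replace("-", String.Empty)
                .ToLowerInvariant();

            // WriteToFile is the asker's own helper (not shown).
            WriteToFile(FPath + "  " + ComputedHash);
        }
    }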
Ivan Zhivolupov
  • How are you hashing your file? Please show some code. There's no reason for `System.Security.Cryptography.MD5` to throw an `OutOfMemoryException` if you use it properly (i.e. streaming the blocks). – user703016 Apr 09 '14 at 17:02
  • Please tell us you're compiling the application as x64 and running it on an x64 box. –  Apr 09 '14 at 17:06
  • The thing is that I need to be able to run it on 32-bit as well. – Ivan Zhivolupov Apr 09 '14 at 17:07
  • The hash function may be replaced with anything, but this 'anything' should help me to verify if files are the same – Ivan Zhivolupov Apr 09 '14 at 17:11
  • I managed to successfully hash a 5GB file using the code given above in 32 bit mode and the process memory usage remained constant during the process. Either the code above isn't the code you're actually using, or the `OutOfMemoryException` isn't being thrown where you think it is. – Iridium Apr 09 '14 at 17:24

1 Answer

You should be using `TransformBlock` and `TransformFinalBlock`, so that you aren't reading the entire file into memory.

Source: here and here (nice example here too)
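
As a rough illustration of that approach, here is a minimal sketch of hashing a file chunk by chunk with `TransformBlock` and `TransformFinalBlock`. The class name `ChunkedHasher`, the method name `ComputeMd5Hex`, and the 1 MB default buffer size are assumptions made for this example; they are not taken from the linked sources.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ChunkedHasher
{
    // Hypothetical helper: hashes a file in fixed-size chunks so that only
    // one buffer's worth of data is held in memory at any time.
    public static string ComputeMd5Hex(string path, int bufferSize = 1024 * 1024)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[bufferSize];
            int bytesRead;

            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed this chunk into the running hash. The output buffer
                // is not needed when only the digest is wanted.
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            }

            // Finish the hash with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);

            return BitConverter.ToString(md5.Hash)
                .Replace("-", string.Empty)
                .ToLowerInvariant();
        }
    }
}
```

Calling `ChunkedHasher.ComputeMd5Hex(path)` once per file, in a plain loop, hashes the files one after another. An explicit read loop like this also gives a natural place to report progress on very large files.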

Icemanind
  • You could add a usage example to your answer, in case the referenced answer becomes unavailable. – Anatoliy Nikolaev Apr 09 '14 at 17:17
  • The overload OP is using `ComputeHash(Stream)` already does that. – user703016 Apr 09 '14 at 17:17
  • @presiuslitelsnoflek -- `ComputeHash(Stream)` does NOT break it into chunks. It loads the entire stream into memory and then does its work. You can force it to work in the way you are describing by using memory-mapped files though. – Icemanind Apr 09 '14 at 17:23
  • Feel free to look at the source. – user703016 Apr 09 '14 at 17:25
  • No, it does not load the whole stream into memory, it hashes in 4KB chunks (at least in the .NET version I'm using). – Iridium Apr 09 '14 at 17:25
  • Looking [here](http://referencesource.microsoft.com/#mscorlib/system/security/cryptography/hashalgorithm.cs#5b02a2a217146fcf), I can see you are correct. I stand corrected. – Icemanind Apr 09 '14 at 17:31
  • An easier way to use this stuff is to create an instance of a CryptoStream with an MD5 instance and just copy your data (FileStream or whatever) into the CryptoStream. – Henning Krause Aug 19 '15 at 19:33
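
The `CryptoStream` suggestion in the last comment could look roughly like the sketch below. The names `CryptoStreamHasher` and `ComputeMd5Hex` are hypothetical, and writing to `Stream.Null` is one assumed way to discard the pass-through output so that only the hash state is kept.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamHasher
{
    // Hypothetical helper: copies the file through a CryptoStream that wraps
    // the hash algorithm, so the data is hashed as it is copied.
    public static string ComputeMd5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var input = File.OpenRead(path))
        using (var crypto = new CryptoStream(Stream.Null, md5, CryptoStreamMode.Write))
        {
            // CopyTo reads the input in buffered chunks and writes them
            // into the CryptoStream, which updates the hash.
            input.CopyTo(crypto);

            // Finalize the hash before reading md5.Hash.
            crypto.FlushFinalBlock();

            return BitConverter.ToString(md5.Hash)
                .Replace("-", string.Empty)
                .ToLowerInvariant();
        }
    }
}
```

This keeps the chunking inside `Stream.CopyTo`, at the cost of less control over buffer size and progress reporting than an explicit read loop.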