
I need to calculate an MD5 hash for each file in a UNC folder (\\192.168.1.3\ABC). The problem is that this folder contains a large number of files (~2000), and the code below takes 2.5 hours to finish.

    foreach (var file in filesInFolder)
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(file))
            {
                var md5Check = BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLower();
                dicMD5[file] = md5Check;
            }
        }
    }

If ABC is a local folder, the same code completes in about 5 minutes, so I think I need a better approach. Any help is appreciated, thanks a lot.

  • Possible duplicate of [What is the fastest way to create a checksum for large files in C#](https://stackoverflow.com/questions/1177607/what-is-the-fastest-way-to-create-a-checksum-for-large-files-in-c-sharp) – Richardissimo Oct 12 '18 at 04:27

2 Answers


Since it runs locally in about 5 minutes, the issue is almost certainly the cost of reading the files over the network. The best way to speed it up would be to run the program locally on the machine where the files live and have it send the results to another machine if need be. I realize there are times when that won't work, so that limits what you can do.

One thing you can do though is multi-thread that call to read all the files.

    var maxThreads = 8;

    Parallel.ForEach(filesInFolder, new ParallelOptions { MaxDegreeOfParallelism = maxThreads }, file =>
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(file))
            {
                var md5Check = BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLower();
                dicMD5[file] = md5Check; // dicMD5 should be a ConcurrentDictionary (see note below)
            }
        }
    });

This will run everything in parallel, limited by the maxThreads variable.

I used this code to compute the hash of 17k files in just over 2 minutes. So while it may still be slower over the network, it should be much faster than what you are currently doing. Just make sure you set maxThreads to a value appropriate for your machine.

NOTE: You will probably want to make dicMD5 a ConcurrentDictionary, which lives in the System.Collections.Concurrent namespace, since Parallel.ForEach writes to it from multiple threads.
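As a rough sketch of that change (assuming dicMD5 was previously declared as a plain Dictionary<string, string>):

    using System.Collections.Concurrent;

    // Thread-safe dictionary, so concurrent writes from Parallel.ForEach are safe.
    var dicMD5 = new ConcurrentDictionary<string, string>();

    // The indexer assignment inside the loop body stays exactly the same:
    // dicMD5[file] = md5Check;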

Michael Sharp

How long does it take to copy the files to your local machine? If the copy alone takes 2.5 hours, then the network transfer itself is the bottleneck and there's nothing you can do except run the code on the remote machine.

If the copy takes less than 2.5 hours, then you know there is inefficiency somewhere in the process--for example, buffers are too small or data is being fetched repeatedly. In that case, the easiest solution is to copy each file to a local temp directory and then compute the checksum there. If you want this to run as quickly as possible, use one thread that copies the files and one or more threads that compute the checksums, so the copier never has to wait for a checksum to finish before fetching the next file.
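A minimal sketch of that pipeline, assuming the filesInFolder list and dicMD5 dictionary from the question and a hypothetical local staging folder: one copier task feeds a bounded BlockingCollection while a couple of hashing tasks drain it.

    using System;
    using System.Collections.Concurrent;
    using System.IO;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Threading.Tasks;

    var filesInFolder = Directory.EnumerateFiles(@"\\192.168.1.3\ABC").ToList();
    var localTemp = Path.Combine(Path.GetTempPath(), "md5work"); // hypothetical staging folder
    Directory.CreateDirectory(localTemp);

    var dicMD5 = new ConcurrentDictionary<string, string>();
    // Bounded queue so the staging folder never holds more than a few files at once.
    var queue = new BlockingCollection<(string Original, string Local)>(boundedCapacity: 4);

    // Copier: pulls files over the network one at a time and hands them to the hashers.
    var copier = Task.Run(() =>
    {
        foreach (var file in filesInFolder)
        {
            var local = Path.Combine(localTemp, Path.GetFileName(file));
            File.Copy(file, local, overwrite: true);
            queue.Add((file, local));
        }
        queue.CompleteAdding();
    });

    // Hashers: checksum the local copies while the next file is still being copied.
    var hashers = Enumerable.Range(0, 2).Select(_ => Task.Run(() =>
    {
        foreach (var (original, local) in queue.GetConsumingEnumerable())
        {
            using (var md5 = MD5.Create())
            using (var stream = File.OpenRead(local))
            {
                dicMD5[original] = BitConverter.ToString(md5.ComputeHash(stream))
                                               .Replace("-", "").ToLower();
            }
            File.Delete(local); // free the staging space immediately
        }
    })).ToArray();

    Task.WaitAll(hashers.Concat(new[] { copier }).ToArray());

Whether this beats the simple Parallel.ForEach over the UNC path depends on how the file server handles concurrent reads, so it's worth timing both approaches on your data.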

piojo