
I am trying to compare files and directories. I want to first ensure that all of the same files are present, and then do some type of comparison without having to hash all of the files. Both folders have the same structure and should be very similar. I am, however, afraid that someone has modified only one directory or folder. Any help is greatly appreciated.

$FirstPath = "\\Server001\c$\Files"
$SecondPath = "\\Server002\c$\Files"


$firstcomp = Get-ChildItem -Recurse -Path $FirstPath
$Secondcomp = Get-ChildItem -Recurse -Path $SecondPath

Compare-Object -ReferenceObject $firstcomp -DifferenceObject $Secondcomp

I am comparing approximately 10 GB of files.

I do not even know the best way to accomplish the second part of this, because I keep receiving errors for files that do not exist in the first path.
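One way to handle the first part (checking that both sides contain the same files) without hashing is to compare paths relative to each root, plus file sizes. This is a sketch, not tested against your servers; it reuses the `$FirstPath`/`$SecondPath` variables from above, and the `RelPath` property name is just illustrative:

```powershell
# Build comparable keys: the path relative to each root, plus the file length.
# Comparing relative paths avoids mismatches caused by the differing roots.
$firstcomp  = Get-ChildItem -Recurse -File -Path $FirstPath |
    Select-Object @{ n = 'RelPath'; e = { $_.FullName.Substring($FirstPath.Length) } }, Length
$Secondcomp = Get-ChildItem -Recurse -File -Path $SecondPath |
    Select-Object @{ n = 'RelPath'; e = { $_.FullName.Substring($SecondPath.Length) } }, Length

# Files missing from one side, or present on both sides with different sizes.
# SideIndicator tells you which root the odd file came from.
Compare-Object -ReferenceObject $firstcomp -DifferenceObject $Secondcomp -Property RelPath, Length |
    Sort-Object RelPath
```

Because `Compare-Object` is given explicit `-Property` values, it no longer trips over files that exist on only one side; they simply show up in the output with a `SideIndicator` of `<=` or `=>`.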

  • 1
    hashing the files is the robust way of doing this. – Santiago Squarzon Jun 28 '22 at 16:10
  • 1
    Only other thing I can think of besides hashing the files, is probably comparing length; wouldn't be as accurate though. – Abraham Zinala Jun 28 '22 at 16:18
  • 2
    See also: [Powershell Speed: How to speed up ForEach-Object MD5/hash check](https://stackoverflow.com/a/59916692/1701026) – iRon Jun 28 '22 at 16:25
  • 1
    Your question needs more information: must both paths have the same files, or only pathB the same files as pathA? Do the paths contain only files? If not, must the paths have the same file/folder structure? – Santiago Squarzon Jun 28 '22 at 17:01
  • 1
    If you don't need to detect moved/renamed files, you don't need to calculate expensive hash values. Handle the easy cases first - files that have different size don't need to be processed any further. For files of equal size, compare them byte by byte and stop when the first difference is found. Looping over the bytes in a PowerShell loop would be _very_ slow though (slower than calculating hash values). Instead use the [`FileStream`](https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream) class to read files in chunks and use `Enumerable.SequenceEqual()` to compare chunks. – zett42 Jun 28 '22 at 17:27
  • 1
    I did a quick proof-of-concept and it turns out that this method is already 3x as fast as calculating MD5 hash values using `Get-HashCode`, when full files have to be compared (files are equal). It will be even faster when the differences are closer to the beginning of the files. My current code relies on a small part of inline C#, as my initial attempt in pure PowerShell code proved to be even slower than `Get-HashCode`. – zett42 Jun 29 '22 at 07:44
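For the second part (comparing the contents of equal-sized file pairs), the approach zett42 describes above can be sketched roughly as follows. This is an illustrative, hedged implementation, not the commenter's actual code: the function name is made up, and it assumes `FileStream.Read` fills the buffer except at end-of-file, which holds for regular files but is not guaranteed by the `Stream` contract in general:

```powershell
# Compare two files chunk by chunk, stopping at the first difference.
# Cheap checks (size) come first; no hashing is performed.
function Test-FileContentEqual {
    param([string]$PathA, [string]$PathB)

    $fileA = [System.IO.File]::OpenRead($PathA)
    $fileB = [System.IO.File]::OpenRead($PathB)
    try {
        # Different sizes can never be equal; skip the expensive read entirely.
        if ($fileA.Length -ne $fileB.Length) { return $false }

        $bufA = [byte[]]::new(1MB)
        $bufB = [byte[]]::new(1MB)
        while ($true) {
            $readA = $fileA.Read($bufA, 0, $bufA.Length)
            $readB = $fileB.Read($bufB, 0, $bufB.Length)
            if ($readA -eq 0 -and $readB -eq 0) { return $true }   # both at EOF, no difference found

            # The last chunk may be partial; compare only the bytes actually read.
            $chunkA = if ($readA -eq $bufA.Length) { $bufA } else { [byte[]]$bufA[0..($readA - 1)] }
            $chunkB = if ($readB -eq $bufB.Length) { $bufB } else { [byte[]]$bufB[0..($readB - 1)] }

            if (-not [System.Linq.Enumerable]::SequenceEqual($chunkA, $chunkB)) {
                return $false                                      # stop at the first differing chunk
            }
        }
    }
    finally {
        $fileA.Dispose()
        $fileB.Dispose()
    }
}
```

A pure PowerShell byte loop would be far slower, as zett42 notes; delegating the inner comparison to `Enumerable.SequenceEqual()` keeps the hot path in .NET. You would call this only for the file pairs that survived the relative-path/size comparison.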
