
I need to compare all the files contained within a directory and its subdirectories against all the other files contained within that same directory and its subdirectories, and log the paths of matching files to a text file or a CSV.

I realize there are software tools to do this, but unless it is available out-of-the-box with Windows, I won't be allowed to use it on my network.

This topic discusses using the binary flag on the file comparison tool in the command prompt. The problem with that script is that it uses matching filenames to execute the binary comparisons; i.e. it looks for "File 1" in both directories that are being compared. If "File 1" is not present in both directories, the comparison does not occur.

I need the comparison to ignore filenames and just brute-force compare the current file against all other files, then move on to the next file.

I am not quite experienced enough with either PowerShell or command-line scripting to get this working recursively, and could not find an example.

I understand that comparing even 300 files amongst themselves could take a considerable amount of time, but I plan on letting this thing run in the background over the course of a weekend.
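For reference, the brute-force pairwise comparison described above can be sketched in PowerShell (which ships with Windows), driving `fc.exe` with its binary flag. The root path `c:\Test` follows the example used elsewhere on this page; the output filename `matches.csv` is a placeholder. This is a sketch, not a tested script:

```powershell
# Collect every file under the root, including subdirectories
# (-File requires PowerShell 3.0; on 2.0 use: | Where-Object { -not $_.PSIsContainer })
$files = Get-ChildItem -Path 'c:\Test' -Recurse -File

# Compare each file against every file after it, so each pair is checked once
for ($i = 0; $i -lt $files.Count; $i++) {
    for ($j = $i + 1; $j -lt $files.Count; $j++) {
        # fc.exe /b does a binary comparison; exit code 0 means the contents match
        # (use fc.exe, not fc, which PowerShell aliases to Format-Custom)
        fc.exe /b $files[$i].FullName $files[$j].FullName > $null
        if ($LASTEXITCODE -eq 0) {
            "$($files[$i].FullName),$($files[$j].FullName)" |
                Out-File -Append 'c:\Test\matches.csv'
        }
    }
}
```

Note that 300 files means roughly 45,000 pairwise comparisons, each reading both files in full, which is why this is slow enough to need a weekend.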

  • I'd like to comment that the solution given is great! The syntax I used looks like `fciv c:\Test -r -XML c:\users\shrout1\desktop\md5.xml`. This works recursively and creates an XML file that can be opened by Excel. Then just use `Conditional Formatting > Highlight Cells Rules > Duplicate Values`. – Shrout1 Nov 27 '13 at 19:03
  • Oh and the line given in the comment above needs to be entered into the command prompt. – Shrout1 Nov 27 '13 at 19:11

1 Answer


Just a hint, really... rather than comparing every file against every other file, I would generate the MD5 checksum of each file once and compare the checksums...

This may help: FCIV
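The checksum idea can also be done without FCIV, using the .NET MD5 class directly from PowerShell so nothing beyond a stock Windows install is needed. This is a sketch under assumptions: the root path `c:\Test` follows the example used elsewhere on this page, the output filename `duplicates.csv` is a placeholder, and `[PSCustomObject]` requires PowerShell 3.0:

```powershell
$md5 = [System.Security.Cryptography.MD5]::Create()

# Hash every file once, then group by hash;
# any group with more than one member is a set of identical files
Get-ChildItem -Path 'c:\Test' -Recurse -File |
    ForEach-Object {
        $stream = [System.IO.File]::OpenRead($_.FullName)
        $hash = [System.BitConverter]::ToString($md5.ComputeHash($stream)) -replace '-', ''
        $stream.Close()
        [PSCustomObject]@{ Hash = $hash; Path = $_.FullName }
    } |
    Group-Object Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    Export-Csv 'c:\Test\duplicates.csv' -NoTypeInformation
```

Each file is read exactly once, so this is linear in total data size rather than quadratic in the number of files.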

Mark Setchell
  • 191,897
  • 31
  • 273
  • 432
  • 4
    Taking this a step further, inspect the file sizes first. If file sizes don't match, don't even bother with a checksum. That should speed up execution significantly. When you do have to compute the checksum, do it [with .NET methods](http://stackoverflow.com/questions/10520048/calculate-md5-checksum-for-a-file), not an external program. – alroc Nov 27 '13 at 16:48
  • Both valid points; can I use FCIV to generate checksums for all files in a directory *and* its subdirectories? I'm searching for the answer but figured it might be faster to ask :) – Shrout1 Nov 27 '13 at 17:04
  • @alroc FYI v4 has Get-FileHash built-in and http://pscx.codeplex.com has had Get-Hash for years. – Keith Hill Nov 27 '13 at 17:57
  • @KeithHill Noted, but OP stated that he can't use anything that doesn't ship with Windows (ruling out PSCX). So unless they've upgraded their environment to v4 (unlikely), he'll have to use a method like what I linked. – alroc Nov 27 '13 at 17:59
  • @KeithHill Thank you for the suggestion but it's OOTB on this one only! Anyone know if FCIV is already on the Server 2008 disk? I'll go look it up.... – Shrout1 Nov 27 '13 at 18:58
  • That tool seems to have issues with _MAX_PATH. For long paths it causes buffer overruns. – Alois Kraus Feb 26 '18 at 10:36
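The size-first filter suggested in the comments above can be sketched like this: group files by size, discard sizes that occur only once (a file with a unique size cannot have a duplicate), and hash only what remains. Again the root path `c:\Test` follows the example used elsewhere on this page and `duplicates.csv` is a placeholder; `[PSCustomObject]` and `-File` require PowerShell 3.0:

```powershell
$md5 = [System.Security.Cryptography.MD5]::Create()

Get-ChildItem -Path 'c:\Test' -Recurse -File |
    Group-Object Length |                  # bucket by file size first
    Where-Object { $_.Count -gt 1 } |      # a unique size can't be a duplicate
    ForEach-Object { $_.Group } |          # flatten the surviving buckets
    ForEach-Object {
        # only these candidates get read and hashed
        $stream = [System.IO.File]::OpenRead($_.FullName)
        $hash = [System.BitConverter]::ToString($md5.ComputeHash($stream)) -replace '-', ''
        $stream.Close()
        [PSCustomObject]@{ Hash = $hash; Size = $_.Length; Path = $_.FullName }
    } |
    Group-Object Hash |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object { $_.Group } |
    Export-Csv 'c:\Test\duplicates.csv' -NoTypeInformation
```

Since reading file contents is the expensive step, skipping every uniquely-sized file usually eliminates the bulk of the work.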