I would like to run a PowerShell script that takes a directory name from the user and then checks that directory, its subdirectories, and the contents of every file to determine whether they are identical across servers. There are 8 servers that should all have identical files and contents. The code below does not appear to do what I intended. I have seen Compare-Object, Get-ChildItem, and Get-FileHash used for this, but I have not found a combination that I am certain actually accomplishes the task. Any and all help is appreciated!

$35 = "\\server1\"
$36 = "\\server2\"
$37 = "\\server3\"
$38 = "\\server4\"
$45 = "\\server5\"
$46 = "\\server6\"
$47 = "\\server7\"
$48 = "\\server8\"
do{
Write-Host "|1 : New   |"
Write-Host "|2 : Repeat|"
Write-Host "|3 : Exit  |"
$choice = Read-Host -Prompt "Please make a selection"
    switch ($choice){
        1{
            $App = Read-Host -Prompt "Input Directory Application"
        }
        2{
            #rerun
        }
        3{
            exit
        }
    }

$c35 = $35 + "$App" +"\*"
$c36 = $36 + "$App" +"\*"
$c37 = $37 + "$App" +"\*"
$c38 = $38 + "$App" +"\*"
$c45 = $45 + "$App" +"\*"
$c46 = $46 + "$App" +"\*"
$c47 = $47 + "$App" +"\*"
$c48 = $48 + "$App" +"\*"

Write-Host "Comparing Server1 -> Server2"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c36 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server3"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c37 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server4"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c38 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server5"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c45 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server6"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c46 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server7"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c47 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

Write-Host "Comparing Server1 -> Server8"
if((Get-ChildItem $c35 -Recurse | Get-FileHash | Select-Object Hash,Path).hash -eq (Get-ChildItem $c48 -Recurse | Get-FileHash | Select-Object Hash,Path).hash){"Identical"}else{"NOT Identical"}

} until ($choice -eq 3)

TFore
  • `-eq` doesn't compare arrays one-by-one as you intend. Instead it filters LHS array operand by RHS operand. Use `Compare-Object` to compare arrays. BTW, calculating all file hashes before comparison will be very very slow. A faster approach would be to calculate all file hashes only for first directory. For subsequent directories calculate one hash and then immediately compare with file having same relative path in first directory. If different, you don't need to calculate remaining hashes of directory. – zett42 Nov 08 '22 at 19:31
  • I'm betting robocopy could do this faster. – TheMadTechnician Nov 08 '22 at 19:48
  • You should put all your servers in one array, no reason for having them in separated variables – Santiago Squarzon Nov 08 '22 at 19:52
  • You're wanting to compare the folders/content of the path input everytime on each server? Are there any changes to the other folders during any period of time? – Metzli_Tonaltzintli Nov 08 '22 at 21:14
  • @zett42, could you provide some sample code of what you mean? – TFore Nov 09 '22 at 11:58
  • @Metzli_Tonaltzintli Yes, each server has the files deployed separately and not always at the same time. The intent of this script is to check if they are all identical. This would be the desired final state of the servers once other processes and testing has occurred. – TFore Nov 09 '22 at 12:00
  • @SantiagoSquarzon I’m newer to powershell so if you can please give an example I would be happy to incorporate it. – TFore Nov 09 '22 at 12:02
  • `compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime` – js2010 Nov 12 '22 at 14:55
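
The two points raised in the comments above (`-eq` filtering an array instead of comparing element by element, and keeping all servers in one array) can be illustrated with a minimal sketch. The sample strings standing in for file hashes and the directory name `MyApp` are made up for illustration; the server names follow the question.

```powershell
# With an array on the left-hand side, -eq acts as a filter:
# it returns the elements of the left array that equal the right operand.
$left  = 'AAA', 'BBB', 'CCC'
$right = 'AAA', 'BBB', 'XXX'
$filtered = @($left -eq 'BBB')          # contains only 'BBB'

# Compare-Object compares the two collections and reports what differs.
$diff = @(Compare-Object -ReferenceObject $left -DifferenceObject $right)
$identical = ($diff.Count -eq 0)        # $false here: 'CCC' and 'XXX' differ

# All servers in one array instead of eight separate variables.
$App     = 'MyApp'                      # hypothetical directory name
$servers = 1..8 | ForEach-Object { "\\server$_" }
$paths   = $servers | ForEach-Object { "$_\$App" }
```

With the paths in an array, the seven near-identical comparison lines in the question could be collapsed into a single loop over `$paths`.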

2 Answers

Here is an example function that tries to compare one reference directory against multiple difference directories efficiently. It does so by comparing the most easily available information first and stopping at the first difference.

  • Get all relevant information about the files in the reference directory once, including hashes (this could be optimized further by computing hashes only when necessary).
  • For each difference directory, compare in this order:
    • file count - if different, the directories are obviously different
    • relative file paths - if any path from the difference directory cannot be found in the reference directory, the directories are different
    • file sizes - if a file's size differs from its counterpart in the reference directory, the content must differ
    • file hashes - hashes only need to be calculated if the files have equal size
Function Compare-MultipleDirectories {
    param(
        [Parameter(Mandatory)] [string] $ReferencePath,
        [Parameter(Mandatory)] [string[]] $DifferencePath
    )

    # Get basic file information recursively by calling Get-ChildItem with the addition of the relative file path
    Function Get-ChildItemRelative {
        param( [Parameter(Mandatory)] [string] $Path )

        Push-Location -LiteralPath $Path  # Base path for Get-ChildItem and Resolve-Path; -LiteralPath avoids wildcard interpretation
        try { 
            Get-ChildItem -File -Recurse | 
                Select-Object FullName, Length, @{ n = 'RelativePath'; e = { Resolve-Path -LiteralPath $_.FullName -Relative } }
        } finally { 
            Pop-Location 
        }
    }

    Write-Verbose "Reading reference directory '$ReferencePath'"

    # Create hashtable with all infos of reference directory
    $refFiles = @{}
    Get-ChildItemRelative $ReferencePath |
        Select-Object *, @{ n = 'Hash'; e = { (Get-FileHash -LiteralPath $_.FullName -Algorithm MD5).Hash } } | 
        ForEach-Object { $refFiles[ $_.RelativePath ] = $_ }

    # Compare content of each directory of $DifferencePath with $ReferencePath
    foreach( $diffPath in $DifferencePath ) {
        Write-Verbose "Comparing directory '$diffPath' with '$ReferencePath'"
        
        $areDirectoriesEqual = $false
        $differenceType = $null

        # @() guarantees an array, so .Count is reliable even for zero or one file
        $diffFiles = @( Get-ChildItemRelative $diffPath )

        # Directories must have same number of files
        if( $diffFiles.Count -eq $refFiles.Count ) {

            # Find first different path (if any)
            $firstDifferentPath = $diffFiles | Where-Object { -not $refFiles.ContainsKey( $_.RelativePath ) } | 
                                               Select-Object -First 1

            if( -not $firstDifferentPath ) {

                # Find first different content (if any) by file size comparison
                $firstDifferentFileSize = $diffFiles |
                    Where-Object { $refFiles[ $_.RelativePath ].Length -ne $_.Length } |
                    Select-Object -First 1

                if( -not $firstDifferentFileSize ) {

                    # Find first different content (if any) by hash comparison
                    $firstDifferentContent = $diffFiles | 
                        Where-Object { $refFiles[ $_.RelativePath ].Hash -ne (Get-FileHash -LiteralPath $_.FullName -Algorithm MD5).Hash } | 
                        Select-Object -First 1
                
                    if( -not $firstDifferentContent ) {
                        $areDirectoriesEqual = $true
                    }
                    else {
                        $differenceType = 'Content'
                    } 
                }
                else {
                    $differenceType = 'FileSize'
                }
            }
            else {
                $differenceType = 'Path'
            }
        }
        else {
            $differenceType = 'FileCount'
        }

        # Output comparison result
        [PSCustomObject]@{ 
            ReferencePath = $ReferencePath  
            DifferencePath = $diffPath  
            Equal = $areDirectoriesEqual  
            DiffCause = $differenceType 
        }
    }
}

Usage example:

# compare each of directories B, C, D, E, F against A
Compare-MultipleDirectories -ReferencePath 'A' -DifferencePath 'B', 'C', 'D', 'E', 'F' -Verbose

Output example:

ReferencePath DifferencePath Equal DiffCause
------------- -------------- ----- ---------
A             B               True 
A             C              False FileCount
A             D              False Path     
A             E              False FileSize 
A             F              False Content 

The DiffCause column tells you why the function considers the directories different.

Note:

  • Select-Object -First 1 is a neat trick to stop searching once the first result has been found. It is efficient because it doesn't process all input and then drop everything except the first item; instead it actually cancels the pipeline after the first item has been found.
  • The $refFiles hashtable is keyed by RelativePath, so the information for a reference file can be looked up quickly by its relative path.
  • Empty subdirectories are ignored, because the function only looks at files. E.g. if the reference path contains some empty directories but the difference path does not, and the files in all other directories are equal, the function treats the directories as equal.
  • I've chosen the MD5 algorithm because it is faster than the default SHA-256 algorithm used by Get-FileHash, but it is insecure: MD5 is not collision-resistant, so someone could deliberately craft a different file that has the same MD5 hash. In a trusted environment this won't matter, though. Remove -Algorithm MD5 if you need a more secure comparison.
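
The early-exit behavior of Select-Object -First 1 can be observed with a small standalone sketch. It is unrelated to the function above and uses a made-up counter variable, `$processed`, to record how many items actually travel through the pipeline.

```powershell
# Count how many items flow through the pipeline before it is stopped.
$processed = 0

$firstMatch = 1..1000 |
    ForEach-Object { $processed++; $_ } |   # runs in the caller's scope, so the counter is updated
    Where-Object { $_ -gt 3 } |
    Select-Object -First 1

"First match: $firstMatch; items processed: $processed"
```

In PowerShell 3.0 and later, only the first four numbers pass through ForEach-Object before the pipeline is cancelled, rather than all 1000.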
zett42
  • That is an amazing response! I will look into that first thing tomorrow and let you know how it goes. Thank you so much! – TFore Nov 09 '22 at 22:58
  • Hey @zett42, I tried running this and it looks like it is not working correctly. I created two identical folders in two different locations and it looks like the function is failing on PATH. These are the two folders I had it point at for the test. "\\networklocation\userdir$\TFore\Test C\Test A" and "\\networklocation\userdir$\TFore\Test A". The folder Test A is a direct copy of the first one. Please let me know if I am doing something incorrectly. Thank you! – TFore Nov 11 '22 at 14:32
  • @TFore To be honest, I haven't tested the function with UNC paths, not even with absolute paths. So there might actually be a bug. Let me have a closer look in a few hours, when I can afford some time. – zett42 Nov 11 '22 at 15:01
  • @TFore I can reproduce the problem with PowerShell 5.1. It works for me only on PowerShell 7.3. I haven't figured out the cause of the problem on PS 5.1 yet (I'm a bit tired, will look closer tomorrow). – zett42 Nov 11 '22 at 23:14
  • @TFore I've found the problem and fixed the code for PS 5.1! Apparently `Group-Object -AsHashtable` works differently in PS 5.1 compared to PS 7.3. The code couldn't find any key in the `hashtable`, so it thought the paths were different. I still don't know what's wrong with `Group-Object` in PS 5.1, so I just removed it and create the hashtable directly. This works in both PS 5.1 and 7.3 now. I've also tested it with full paths, but not with UNC paths, though I don't expect any difficulties with UNC paths. – zett42 Nov 12 '22 at 11:43

A simple place to start:

compare (dir -r dir1) (dir -r dir2) -Property name,length,lastwritetime

You can also add -PassThru to see the original objects, or -IncludeEqual to see the equal elements. The order of each array doesn't matter unless you restrict -SyncWindow. I'm assuming all the LastWriteTime values are in sync, to the millisecond. Don't assume you can skip specifying the properties to compare: without -Property, the file objects are compared by their string representation rather than by size or timestamp. See also: Comparing folders and content with PowerShell
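
To make the effect of these switches concrete, here is a small sketch that uses hand-made objects as hypothetical stand-ins for the file objects returned by `dir -r`; the names and lengths are invented.

```powershell
# Hypothetical stand-ins for the file objects of two directories
$dir1 = @(
    [pscustomobject]@{ Name = 'a.txt'; Length = 10 }
    [pscustomobject]@{ Name = 'b.txt'; Length = 20 }
)
$dir2 = @(
    [pscustomobject]@{ Name = 'a.txt'; Length = 10 }
    [pscustomobject]@{ Name = 'b.txt'; Length = 25 }
)

# Default: only differing entries, marked <= (only in $dir1) or => (only in $dir2)
$differences = @(Compare-Object $dir1 $dir2 -Property Name, Length)

# -IncludeEqual additionally lists matching entries, marked ==
$all = @(Compare-Object $dir1 $dir2 -Property Name, Length -IncludeEqual)

# -PassThru emits the original objects, each with a SideIndicator property added
$originals = @(Compare-Object $dir1 $dir2 -Property Name, Length -PassThru)
```

Here `b.txt` shows up twice in `$differences`, once per side, because its length differs; `a.txt` only appears when -IncludeEqual is used.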

I looked into using a calculated property for the relative path, but it looks like you can't name calculated properties in Compare-Object, even in PowerShell 7. Here I'm chopping off the first four path elements (indices 0..3).

compare (dir -r foo1) (dir -r foo2) -Property length,lastwritetime,@{e={($_.fullname -split '\\')[4..$_.fullname.length] -join '\'}}

length lastwritetime          ($_.fullname -split '\\')[4..$_.fullname.length] -join '\' SideIndicator
------ -------------          ---------------------------------------------------------- -------------
    16 11/12/2022 11:30:20 AM foo2\file2                                                 =>
    18 11/12/2022 11:30:20 AM foo1\file2                                                 <=
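
An alternative to chopping off a fixed number of path elements is to build objects with a named relative path first and compare those. The following is only a sketch under assumptions: `Get-RelativeFileInfo` is a hypothetical helper, not a built-in cmdlet, and the sample folders are created in the temp directory purely for demonstration.

```powershell
# Hypothetical helper: list files under $Root with a path relative to $Root
function Get-RelativeFileInfo {
    param( [Parameter(Mandatory)] [string] $Root )

    # Resolve to a plain file system path and drop any trailing separator
    $rootFull = (Resolve-Path -LiteralPath $Root).ProviderPath.TrimEnd([char[]]'\/')
    Get-ChildItem -LiteralPath $rootFull -File -Recurse | ForEach-Object {
        [pscustomobject]@{
            RelativePath = $_.FullName.Substring($rootFull.Length + 1)
            Length       = $_.Length
        }
    }
}

# Throwaway sample data (for demonstration only)
$base = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
$foo1 = New-Item -ItemType Directory -Path (Join-Path $base 'foo1')
$foo2 = New-Item -ItemType Directory -Path (Join-Path $base 'foo2')
Set-Content -Path (Join-Path $foo1.FullName 'file1') -Value 'same'
Set-Content -Path (Join-Path $foo2.FullName 'file1') -Value 'same'
Set-Content -Path (Join-Path $foo1.FullName 'file2') -Value 'short'
Set-Content -Path (Join-Path $foo2.FullName 'file2') -Value 'much longer'

# Compare by relative path and size; only file2 should be reported
$result = @(Compare-Object (Get-RelativeFileInfo $foo1.FullName) (Get-RelativeFileInfo $foo2.FullName) -Property RelativePath, Length)
$result

Remove-Item -LiteralPath $base -Recurse -Force
```

Because the relative path is an ordinary named property here, the output column is labeled RelativePath, and the comparison no longer depends on how deep the two root folders sit in the directory tree.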
js2010