I have a large folder structure which I am copying out, and I then want to run a comparison to ensure every file has copied into the destination folder.

The source structure I have would look something like this. There is a chance the same filename could exist in multiple folders.

  • C:\Temp\Source\Folder1\abc.txt
  • C:\Temp\Source\Folder2\abc.txt
  • C:\Temp\Source\Folder2\File1.txt

As a test, I deliberately did not copy abc.txt from Folder1 into the destination folder C:\Temp\Destination\Folder1.

I then ran Compare-Object in PowerShell:

$Folder1List = Get-ChildItem 'C:\Temp\Source' -Recurse
$Folder2List = Get-ChildItem 'C:\Temp\Destination' -Recurse

Compare-Object $Folder1List $Folder2List

However, this reports no differences, because abc.txt exists somewhere in the destination, just not necessarily in the folder(s) I expect it to be in.

Is there anything I can add to Compare-Object so that it compares each folder rather than just the root of Folder1 and Folder2? Or would I need to compare each sub-directory individually? That would be a pain, as there can be thousands.

2 Answers

I once faced a similar problem. This is the method I ended up with:

function Assert-FolderMatches {
    <#
    .SYNOPSIS
        Assert folder matches reference
    .DESCRIPTION
        Recursively compare folders ensuring that all files and folders have been copied correctly.
        If the file length doesn't match it will also raise an error.
    .PARAMETER Path
        Path to compare.
    .PARAMETER ReferencePath
        Path to compare with (the source folder that shows how it should look).
    .PARAMETER AllowNewFiles
        Allow files to be present in Path which are not present in ReferencePath (like log files).
    #>
    Param(
        [Parameter(Mandatory = $true)]
        [string] $Path,

        [Parameter(Mandatory = $true)]
        [string] $ReferencePath,

        [Parameter()]
        [switch] $AllowNewFiles
    )

    $Path = Resolve-Path $Path
    $ReferencePath = Resolve-Path $ReferencePath

    # Identify all files and folders.
    function Get-ItemsForFolder($FolderPath) {
        Get-ChildItem -Path $FolderPath -Recurse | ForEach-Object { 
            $relativePath = $_.FullName.Replace($FolderPath, "")

            if ($relativePath.StartsWith('\') -or $relativePath.StartsWith('/')) {
                $relativePath = $relativePath.Substring(1)
            }

            if ($_ -is [System.IO.DirectoryInfo]) {
                [pscustomobject] @{
                    'RelativePath' = $relativePath;
                    'Length'       = $null;
                }
            } else {
                [pscustomobject] @{
                    'RelativePath' = $relativePath;
                    'Length'       = $_.Length;
                }
            }
        }
    }

    $files = Get-ItemsForFolder $Path
    $matchFiles = Get-ItemsForFolder $ReferencePath

    # Compare items from both sides.
    $diff = Compare-Object -ReferenceObject $matchFiles `
        -DifferenceObject $files `
        -Property RelativePath,Length
    
    # Filter out additions from 'Path' side if requested.
    if ($AllowNewFiles) {
        $diff = $diff | Where-Object { $_.SideIndicator -ne '=>' }
    }

    return $diff
}

It's not the simplest solution; maybe someone has a more elegant way to do this. Essentially, I compare relative paths instead of just file names.

Here's an example output. "b.txt" exists in both the target and the source folder, but in a different location, so it is reported as a difference:

$> Assert-FolderMatches -ReferencePath C:\Temp\a -Path C:\Temp\b 

RelativePath Length SideIndicator
------------ ------ -------------
f2                  =>
f2\b.txt     0      =>
f1                  <=
f1\b.txt     0      <=

Note that I'm not comparing actual file content. I tried hashing, but it was too slow for my use case, so I ended up comparing only file size. To me that seems a reasonable compromise for verifying a copy, unless you're really worried about file corruption.
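If content verification does matter, one possible middle ground is to hash only the files whose relative path already exists on both sides. The following is a minimal sketch, not part of the function above: Get-ContentMismatch and its parameter names are hypothetical, and it assumes every relative path passed in exists under both roots. Get-FileHash is the built-in cmdlet; the temporary folders exist only for the demonstration.

```powershell
# Hypothetical helper: emits the relative paths whose content differs between two trees.
# Assumes each relative path was already confirmed to exist on both sides.
function Get-ContentMismatch {
    param(
        [Parameter(Mandatory = $true)] [string] $ReferencePath,
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string[]] $RelativePath
    )

    foreach ($rel in $RelativePath) {
        $refFile = Join-Path $ReferencePath $rel
        $dstFile = Join-Path $Path $rel

        # SHA256 is the default algorithm of Get-FileHash; it is named here for clarity.
        $refHash = (Get-FileHash -LiteralPath $refFile -Algorithm SHA256).Hash
        $dstHash = (Get-FileHash -LiteralPath $dstFile -Algorithm SHA256).Hash

        if ($refHash -ne $dstHash) {
            $rel   # emit the relative path of the mismatching file
        }
    }
}

# Demonstration on two throwaway trees under the temp directory.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$src  = Join-Path $root 'src'
$dst  = Join-Path $root 'dst'
$null = New-Item -ItemType Directory -Force -Path (Join-Path $src 'f1'), (Join-Path $dst 'f1')

$rel = Join-Path 'f1' 'b.txt'
Set-Content -LiteralPath (Join-Path $src $rel) -Value 'abc'
Set-Content -LiteralPath (Join-Path $dst $rel) -Value 'abd'   # same length, different content

$mismatches = @(Get-ContentMismatch -ReferencePath $src -Path $dst -RelativePath $rel)
$mismatches

Remove-Item -LiteralPath $root -Recurse -Force
```

A length comparison alone would treat these two files as identical, which is the trade-off described above: hashing catches such cases but has to read every byte of every file.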

Thomas Glaser

To compare two directory trees in terms of corresponding subdirectory and file paths (only), you need to compare them by their relative paths, which is what Get-ChildItem's -Name switch outputs:

# -Name ensures that *relative paths* are returned.
$Folder1List = Get-ChildItem -Name 'C:\Temp\Source' -Recurse
$Folder2List = Get-ChildItem -Name 'C:\Temp\Destination' -Recurse

Compare-Object $Folder1List $Folder2List

Note that -Name causes just the path strings to be returned, not the usual [System.IO.FileInfo] and [System.IO.DirectoryInfo] instances, so the .InputObject property of Compare-Object's output objects will contain only these path strings. To get the aforementioned types, join each string to the root directory it is relative to and pass the result to Get-Item -LiteralPath.
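As an illustration of turning those path strings back into file-system objects, here is a minimal sketch. The temporary folder layout is an assumption that mirrors the question's example, and the variable names are made up for the demonstration.

```powershell
# Build two small throwaway trees (hypothetical layout mirroring the question).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$src  = Join-Path $root 'Source'
$dst  = Join-Path $root 'Destination'
$dirs = foreach ($base in $src, $dst) {
    'Folder1', 'Folder2' | ForEach-Object { Join-Path $base $_ }
}
$null = New-Item -ItemType Directory -Force -Path $dirs

# abc.txt is deliberately NOT copied into Destination\Folder1.
Set-Content -LiteralPath (Join-Path (Join-Path $src 'Folder1') 'abc.txt') -Value 'a'
Set-Content -LiteralPath (Join-Path (Join-Path $src 'Folder2') 'abc.txt') -Value 'a'
Set-Content -LiteralPath (Join-Path (Join-Path $dst 'Folder2') 'abc.txt') -Value 'a'

# -Name returns paths relative to the respective input directory.
$srcList = Get-ChildItem -Name -LiteralPath $src -Recurse
$dstList = Get-ChildItem -Name -LiteralPath $dst -Recurse

# '<=' marks paths present only in the source tree. Each relative path is joined
# back to the source root before Get-Item resolves it to a FileInfo/DirectoryInfo.
$missing = @(
    Compare-Object $srcList $dstList |
        Where-Object SideIndicator -eq '<=' |
        ForEach-Object { Get-Item -LiteralPath (Join-Path $src $_.InputObject) }
)
$missing | ForEach-Object FullName

Remove-Item -LiteralPath $root -Recurse -Force
```

The Join-Path step matters because the strings are relative to the input directory, not to the current location, so passing them to Get-Item unchanged would only work when the current location happens to be that directory.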


As for what you tried:

  • When Compare-Object compares objects whose types do not implement the [System.IComparable] interface, which is true of the objects that Get-ChildItem outputs, it compares them by their .ToString() values.

  • In Windows PowerShell, FileInfo and DirectoryInfo instances situationally - such as with your commands - stringify by their file name only, as detailed in this answer.

    • Therefore, even files and directories located in different subdirectories compare the same if their names match, as you've experienced.
  • By contrast, in PowerShell (Core) 7+ they now consistently stringify by their full path (.FullName property).

    • Therefore, given that the start of all paths differs between the directory trees you're comparing, your code would report all paths as different.
mklement0