13

All, there is an application which generates export dumps. I need to write a script that will compare the previous day's dump against the latest and, if there are differences between them, do some basic manipulation of the moving-and-deleting sort. I have tried to find a suitable way of doing it, and the method I tried was:

$var_com=diff (get-content D:\local\prodexport2 -encoding Byte) (get-content D:\local\prodexport2 -encoding Byte)

I tried the Compare-Object cmdlet as well. I notice very high memory usage, and after a few minutes I get a System.OutOfMemoryException. Has one of you done something similar? Some thoughts, please. There was a thread which mentioned a hash comparison, which I have no idea how to go about. Thanks in advance, folks. Osp

user2967267
  • Do you need to know which bytes are different, or just that today's file is not the same as yesterday's? – mjolinor Nov 14 '13 at 23:54
  • Just need to know if they are different. As you have quoted, I need to know if the files are the same or not. – user2967267 Nov 15 '13 at 00:01
  • Have a look at the answers [here](http://stackoverflow.com/q/211008/1324345). It's marked C# but since it's .NET, it can be ported to PowerShell syntax. The easiest thing to do is compare file sizes first - if those are different, you already have your answer. – alroc Nov 15 '13 at 01:08
  • If you use the `-Raw` parameter of `Get-Content` without any `-Encoding`, the comparison goes much faster and is easier. – Serhii Kheilyk May 31 '23 at 05:26
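The comments above suggest two cheap checks before any full comparison: compare file sizes first, then fall back to a hash. A minimal sketch combining both (the function name is hypothetical; assumes PowerShell 4+ for `Get-FileHash`, and that both paths exist):

```powershell
# Hypothetical helper: returns $true when the two files have identical content.
function Test-FilesIdentical {
    param([string]$Path1, [string]$Path2)
    $f1 = Get-Item $Path1
    $f2 = Get-Item $Path2
    # Cheap check first: different sizes always mean different content.
    if ($f1.Length -ne $f2.Length) { return $false }
    # Sizes match, so compare hashes (Get-FileHash streams the file).
    (Get-FileHash $f1.FullName).Hash -eq (Get-FileHash $f2.FullName).Hash
}
```

Unlike a byte-array `Compare-Object`, this never materializes whole files as PowerShell objects, so memory use stays flat even for large dumps.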

5 Answers

25

With PowerShell 4 you can use native commandlets to do this:

function CompareFiles {
    param(
    [string]$Filepath1,
    [string]$Filepath2
    )
    if ((Get-FileHash $Filepath1).Hash -eq (Get-FileHash $Filepath2).Hash) {
        Write-Host 'Files Match' -ForegroundColor Green
    } else {
        Write-Host 'Files do not match' -ForegroundColor Red
    }
}

PS C:\> CompareFiles .\20131104.csv .\20131104-copy.csv

Files Match

PS C:\> CompareFiles .\20131104.csv .\20131107.csv

Files do not match

You could easily modify the above function to return a $true or $false value if you want to use this programmatically on a large scale


EDIT

After seeing this answer, I just wanted to supply a larger-scale version that simply returns true or false:

function CompareFiles 
{
    param
    (
        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 1st file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file1,

        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 2nd file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file2
    )

    ( Get-FileHash $file1 ).Hash -eq ( Get-FileHash $file2 ).Hash
}
ericnils
13

You could use fc.exe. It comes with Windows. Here's how you would use it:

fc.exe /b d:\local\prodexport2 d:\local\prodexport1 > $null
if (!$?) {
    "The files are different"
}
Keith Hill
    I might be inclined to not use the `if (!$?)` and replace it with `if ($LastExitCode -eq 0)`. Check out http://stackoverflow.com/q/10666101 and all the answers. – Code Maverick Nov 21 '14 at 20:59
  • This is extremely slow for different files, because it prints all the differences (to null). It seems fc does not support suppressing its output. One can use `fc /a /b`, which might output less, but it didn't make a big difference for me. – arberg Dec 23 '15 at 20:01
  • Just out of curiosity does it help to assign to $null e.g. `$null = fc.exe ...`? – Keith Hill Dec 27 '15 at 06:57
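Following up on the comment about `$LastExitCode`, the same check can be written against the exit code explicitly (a sketch, assuming Windows' `fc.exe`, which exits 0 when the files are identical and non-zero otherwise; the paths are the question's):

```powershell
# Sketch of the $LASTEXITCODE variant suggested in the comments.
# Redirecting to $null discards fc.exe's difference listing.
fc.exe /b d:\local\prodexport1 d:\local\prodexport2 > $null
if ($LASTEXITCODE -eq 0) {
    "The files are identical"
} else {
    "The files are different"
}
```

Note that the redirection still pays the cost of fc.exe generating the full difference listing, as the comments point out.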
8

Another method is to compare the MD5 hashes of the files:

$Filepath1 = 'c:\testfiles\testfile.txt'
$Filepath2 = 'c:\testfiles\testfile1.txt'

$hashes = foreach ($Filepath in $Filepath1, $Filepath2)
{
    $MD5 = [Security.Cryptography.HashAlgorithm]::Create("MD5")
    $stream = ([IO.StreamReader]"$Filepath").BaseStream
    -join ($MD5.ComputeHash($stream) |
        ForEach { "{0:x2}" -f $_ })
    $stream.Close()
}

if ($hashes[0] -eq $hashes[1])
  {'Files Match'}
mjolinor
  • Thanks for this. It took away the long time it used to take for the comparison. – user2967267 Nov 15 '13 at 03:19
  • I tried using this code with relative paths (so in Powershell `cd somewhere` and then `$FilePath1 = 'testfile.txt'`) but the StreamReader doesn't pick up Powershell's change of folder and thinks it is relative to my home folder instead. The fix is to use `$Filepath1 = Get-Item 'testfile.txt'` instead and then Powershell passes the correct absolute path to StreamReader. – Duncan Mar 19 '14 at 10:21
    Powershell's Get-FileHash function is (now) available, and does the same thing more simply. – NoBrassRing May 08 '19 at 15:58
8

A while back I wrote an article on a buffered comparison routine to compare two files with PowerShell:

function FilesAreEqual {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second, 
        [uint32] $bufferSize = 524288) 

    if ($first.Length -ne $second.Length) { return $false }

    if ($bufferSize -eq 0) { $bufferSize = 524288 }

    $fs1 = $first.OpenRead()
    $fs2 = $second.OpenRead()

    $one = New-Object byte[] $bufferSize
    $two = New-Object byte[] $bufferSize
    $equal = $true

    do {
        $bytesRead = $fs1.Read($one, 0, $bufferSize)
        $fs2.Read($two, 0, $bufferSize) | out-null

        if ( -Not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
            $equal = $false
        }

    } while ($equal -and $bytesRead -eq $bufferSize)

    $fs1.Close()
    $fs2.Close()

    return $equal
}

You can use it by:

FilesAreEqual c:\temp\test.html c:\temp\test.html

A hash (like MD5) needs to traverse the entire file to do its calculation. This script returns as soon as it sees a difference in the buffer. It compares the buffers using LINQ, which is faster than native PowerShell.

Kees C. Bakker
  • How would your routine compare with the **@ericnils** [answer](http://stackoverflow.com/a/24765192/682480) with respect to performance? When using it inside a function that could get called from a `foreach` that contains however many files of varying sizes, is yours more optimized than the 4.0 `Get-FileHash`? – Code Maverick Nov 21 '14 at 21:04
  • @CodeMaverick, it should be for exactly the reason he stated. it doesn't have to read both entire files unless they are the same. It's the ideal solution – Nacht Feb 19 '16 at 06:12
    I suggest setting `$BYTES_TO_READ` to some higher value than 8. On my system reading 8 Bytes per iteration was extremely slow. I don't know what the best value is, but increasing the buffer size to 32768 (32 KB) certainly made the file compare a lot snappier. – herzbube Aug 11 '17 at 13:04
  • I realized that changing `$BYTES_TO_READ` is not enough, because inside the loop the `BitConverter` calls only compare the first 8 Bytes (= one `Int64`) of the buffer. After some deliberation I settled for a second, inner loop that iterates over the byte arrays and individually compares every byte. This is reasonably fast, and it's especially much faster than the ultra-slow `compare-object` cmdlet. – herzbube Aug 22 '17 at 13:22
  • Unfortunately as herzbube notes, the current implementation gives completely wrong answers because only 8 bytes out of every 32768 are actually compared. – John Rees Jan 02 '19 at 10:12
  • Very interesting, is the version with the int64 problem solved? – matti157 Nov 20 '19 at 11:10
  • Added a buffer, as read by buffer is way faster. Updated the original blog article as well. – Kees C. Bakker Jun 08 '20 at 11:02
  • Apologies for necroing, but I think there’s a (theoretical, if not practical) bug in the use of ```Read``` - the docs say “An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.” (see https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream.read?view=net-7.0) so ```$fs1.Read(…)``` and ```$fs2.Read(…)``` *could* read different byte counts. It doesn’t seem to ever actually happen in practice, but it’s *possible*. An assert for, e.g. ```$bytesRead1 -eq $bytesRead2``` inside the loop would at least protect against this… – mclayton Aug 06 '23 at 17:19
    @KeesC.Bakker - Nine years later, I've used your code as the basis for a function that I tested against other methods of binary file comparison in PowerShell, and found your method to be the fastest: https://stackoverflow.com/questions/76895989/speed-of-binary-file-comparisons-in-powershell/ – NewSites Aug 14 '23 at 03:13
2
if ( (Get-FileHash c:\testfiles\testfile1.txt).Hash -eq (Get-FileHash c:\testfiles\testfile2.txt).Hash ) {
   Write-Output "Files match"
} else {
   Write-Output "Files do not match"
}
4b0
    Hi and welcome to stackoverflow, and thank you for answering. While this code might answer the question, can you consider adding some explanation for what the problem was you solved, and how you solved it? This will help future readers to understand your answer better and learn from it. – Plutian Jan 31 '20 at 09:03