30

I am trying to recurse through a directory and copy it from A to B. That can be done with the following:

Copy-Item C:\MyTest C:\MyTest2 –recurse

I want to be able though to only copy new files (ones that exist in src but not dest) and also only copy files that may have changed based off a CRC check and not a datetime stamp.

$file = "c:\scripts"
param
(
$file
)

$algo = [System.Security.Cryptography.HashAlgorithm]::Create("MD5")
$stream = New-Object System.IO.FileStream($file, [System.IO.FileMode]::Open)

$md5StringBuilder = New-Object System.Text.StringBuilder
$algo.ComputeHash($stream) | `
% { [void] $md5StringBuilder.Append($_.ToString("x2")) }
$md5StringBuilder.ToString()

$stream.Dispose() 

This code gives me a CRC check on a specific file...I am just not sure how to put the two scripts together to really give me what I need. I also don't know if the CRC check above is actually the correct way of doing this.

Does anyone have any insight?

  • 7
    My first question would be have you looked at just using Robocopy? You are really reinventing a very well designed wheel here. – EBGreen Mar 24 '09 at 15:12

4 Answers4

40

Both of those are solid answers for powershell, but it would probably be far more easy to just leverage Robocopy (MS Supplied robust copy application).

robocopy "C:\SourceDir\" "C:\DestDir\" /MIR

would accomplish the same thing.

Bobby B
  • 2,372
  • 3
  • 24
  • 31
  • 10
    robocopy doesn't compare content as far as I can tell. It relies on sizes and date stamps. – bart Oct 30 '13 at 20:42
  • 4
    I'm surprised that nobody has commented on the fact that this does *not* do what the OP asked for. This will copy changed files, but it will also delete files in the destination that are not in the source. That is quite a dangerous side-effect. /E should be enough. – Scott Gartner May 23 '19 at 23:42
  • I'm wrong, /E will skip existing files, but apparently won't look to see if they have changed. – Scott Gartner May 23 '19 at 23:48
  • 2
    robocopy will do mirroring which means that it will delete all file that doesn't exists in the source folder, this is different from what the original question was – Sufyan Jabr Aug 08 '19 at 06:34
10

Here is some of the guidelines how you can your script to be more maintainable.

Conver the original script as a filter.

filter HasChanged { 
    param($file)

    # if $file's MD5 has does not exist
    # then return $_
}

Then simply filter all files that are updated and copy them.

# Note that "Copy-Item" here does not preserve original directory structure
# Every updated file gets copied right under "C:\MyTest2"
ls C:\MyTest -Recurse | HasChanged | Copy-Item -Path {$_} C:\MyTest2

Or you can create another function that generates sub directory

ls C:\MyTest -Recurse | HasChanged | % { Copy-Item $_ GenerateSubDirectory(...) }
dance2die
  • 35,807
  • 39
  • 131
  • 194
6

I found a solution...but not sure it is the best from a performance perspective:

$Source = "c:\scripts"
$Destination = "c:\test"
###################################################
###################################################
Param($Source,$Destination)
function Get-FileMD5 {
    Param([string]$file)
    $mode = [System.IO.FileMode]("open")
    $access = [System.IO.FileAccess]("Read")
    $md5 = New-Object System.Security.Cryptography.MD5CryptoServiceProvider
    $fs = New-Object System.IO.FileStream($file,$mode,$access)
    $Hash = $md5.ComputeHash($fs)
    $fs.Close()
    [string]$Hash = $Hash
    Return $Hash
}
function Copy-LatestFile{
    Param($File1,$File2,[switch]$whatif)
    $File1Date = get-Item $File1 | foreach-Object{$_.LastWriteTimeUTC}
    $File2Date = get-Item $File2 | foreach-Object{$_.LastWriteTimeUTC}
    if($File1Date -gt $File2Date)
    {
        Write-Host "$File1 is Newer... Copying..."
        if($whatif){Copy-Item -path $File1 -dest $File2 -force -whatif}
        else{Copy-Item -path $File1 -dest $File2 -force}
    }
    else
    {
        #Don't want to copy this in my case..but good to know
        #Write-Host "$File2 is Newer... Copying..."
        #if($whatif){Copy-Item -path $File2 -dest $File1 -force -whatif}
        #else{Copy-Item -path $File2 -dest $File1 -force}
    }
    Write-Host
}

# Getting Files/Folders from Source and Destination
$SrcEntries = Get-ChildItem $Source -Recurse
$DesEntries = Get-ChildItem $Destination -Recurse

# Parsing the folders and Files from Collections
$Srcfolders = $SrcEntries | Where-Object{$_.PSIsContainer}
$SrcFiles = $SrcEntries | Where-Object{!$_.PSIsContainer}
$Desfolders = $DesEntries | Where-Object{$_.PSIsContainer}
$DesFiles = $DesEntries | Where-Object{!$_.PSIsContainer}

# Checking for Folders that are in Source, but not in Destination
foreach($folder in $Srcfolders)
{
    $SrcFolderPath = $source -replace "\\","\\" -replace "\:","\:"
    $DesFolder = $folder.Fullname -replace $SrcFolderPath,$Destination
    if(!(test-path $DesFolder))
    {
        Write-Host "Folder $DesFolder Missing. Creating it!"
        new-Item $DesFolder -type Directory | out-Null
    }
}

# Checking for Folders that are in Destinatino, but not in Source
foreach($folder in $Desfolders)
{
    $DesFilePath = $Destination -replace "\\","\\" -replace "\:","\:"
    $SrcFolder = $folder.Fullname -replace $DesFilePath,$Source
    if(!(test-path $SrcFolder))
    {
        Write-Host "Folder $SrcFolder Missing. Creating it!"
        new-Item $SrcFolder -type Directory | out-Null
    }
}

# Checking for Files that are in the Source, but not in Destination
foreach($entry in $SrcFiles)
{
    $SrcFullname = $entry.fullname
    $SrcName = $entry.Name
    $SrcFilePath = $Source -replace "\\","\\" -replace "\:","\:"
    $DesFile = $SrcFullname -replace $SrcFilePath,$Destination
    if(test-Path $Desfile)
    {
        $SrcMD5 = Get-FileMD5 $SrcFullname
        $DesMD5 = Get-FileMD5 $DesFile
        If(Compare-Object $srcMD5 $desMD5)
        {
            Write-Host "The Files MD5's are Different... Checking Write
            Dates"
            Write-Host $SrcMD5
            Write-Host $DesMD5
            Copy-LatestFile $SrcFullname $DesFile
        }
    }
    else
    {
        Write-Host "$Desfile Missing... Copying from $SrcFullname"
        copy-Item -path $SrcFullName -dest $DesFile -force
    }
}

# Checking for Files that are in the Destinatino, but not in Source
foreach($entry in $DesFiles)
{
    $DesFullname = $entry.fullname
    $DesName = $entry.Name
    $DesFilePath = $Destination -replace "\\","\\" -replace "\:","\:"
    $SrcFile = $DesFullname -replace $DesFilePath,$Source
    if(!(test-Path $SrcFile))
    {
        Write-Host "$SrcFile Missing... Copying from $DesFullname"
        copy-Item -path $DesFullname -dest $SrcFile -force
    }
}
Abdul Munim
  • 18,869
  • 8
  • 52
  • 61
1

It is a bit long in the tooth but it does the job admirably - could be extended to look for archive bit ---a--- attribute of the file. Anyway might be a reasonable starting point for someone.

Function GetFileSHA ($file) {
    return [Array](Get-FileHash $file -Algorithm SHA256);
}
$SourceDir = "C:\temp\1";
$TargetDir = "C:\temp\2";
$SourceFiles = Get-ChildItem -Recurse $SourceDir;
$TargetFiles = Get-ChildItem -Recurse $TargetDir;

#Source Table
$dt = New-Object System.Data.DataTable;
$dt.TableName = "SrcFiles";
$dtcol1 = New-Object system.Data.DataColumn fileId,([System.Int32]); $dt.columns.add($dtcol1);
$dtcol1.AllowDBNull = $false;
$dtcol1.AutoIncrement = $true;
$dtcol1.AutoIncrementSeed = 0;
$dtcol1.Unique = $true;
$dt.PrimaryKey = $dtcol1;
$dtcol2 = New-Object system.Data.DataColumn fileName,([string]); $dt.columns.add($dtcol2);
$dtcol3 = New-Object system.Data.DataColumn filePath,([string]); $dt.columns.add($dtcol3);
$dtcol3.Unique = $true;
$dtcol4 = New-Object system.Data.DataColumn fileHash,([string]); $dt.columns.add($dtcol4);
$dtcol5 = New-Object system.Data.DataColumn fileDate,([System.DateTime]); $dt.columns.add($dtcol5);

#Target Table
$dt2 = New-Object System.Data.DataTable;
$dt2.TableName = "TrgFiles";
$dt2col1 = New-Object system.Data.DataColumn fileId,([System.Int32]); $dt2.columns.add($dt2col1);
$dt2col1.AllowDBNull = $false;
$dt2col1.AutoIncrement = $true;
$dt2col1.AutoIncrementSeed = 0;
$dt2col1.Unique = $true;
$dt2.PrimaryKey = $dt2col1;
$dt2col2 = New-Object system.Data.DataColumn fileName,([string]); $dt2.columns.add($dt2col2);
$dt2col3 = New-Object system.Data.DataColumn filePath,([string]); $dt2.columns.add($dt2col3);
$dt2col3.Unique = $true;
$dt2col4 = New-Object system.Data.DataColumn fileHash,([string]); $dt2.columns.add($dt2col4);
$dt2col5 = New-Object system.Data.DataColumn fileDate,([System.DateTime]); $dt2.columns.add($dt2col5);

#Store file hashes and other attributes into DataTable for comparison
ForEach ($src_file in $SourceFiles){
    $this_file = GetFileSHA $src_file.FullName;
    $row = $dt.NewRow();
    $row.fileName=($src_file).PSChildName;
    $row.filePath=($src_file).FullName;
    $row.fileHash=($this_file).Hash;
    $row.fileDate=($src_file).LastWriteTimeUtc;
    $dt.Rows.Add($row); 
}
ForEach ($trg_file in $TargetFiles){
    $this_file = GetFileSHA $trg_file.FullName;
    $row = $dt2.NewRow();
    $row.fileName=($trg_file).PSChildName;
    $row.filePath=($trg_file).FullName;
    $row.fileHash=($this_file).Hash;
    $row.fileDate=($trg_file).LastWriteTimeUtc;
    $dt2.Rows.Add($row);    
}

#Compare and copy if newer/changed
ForEach ($file in $dt){
    $search_dt2 = ($dt2 | Select-Object "fileName", "filePath", "fileDate", "fileHash" | Where-Object {$_.fileName -eq $file.fileName})
    if ($file.fileHash -eq $search_dt2.fileHash){
        $result=1;
        Write-Host "File Hashes are a match - checking LastWrite status just to be safe...";
    } else {
        $result=2;
        Write-Host "File Hashes are not a match - checking LastWrite status to see if Target is newer than source...";
    }
    if ($result -eq 1 -and ($file.fileDate -eq $search_dt2.fileDate)){
        $result=1;
        Write-Host "LastWrite status is a match. File will be skipped from copy.";
    } elseif ($result -eq 1 -and $search_dt2.fileDate -gt $file.fileDate) {
        $result=1;
        Write-Host "LastWrite status of the target is newer. File will be skipped from copy.";
    } elseif ($result -ne 1 -and $search_dt2.fileDate -gt $file.fileDate) {
        $result=1;
        Write-Host "LastWrite status of the target is newer than the source. File will be skipped from copy.";
    } else {
        $result=3;
        Write-Host "File in target is older than the source and is scheduled for copy...";
    }
    if (Test-Path $search_dt2.filePath){
    } else {
        $result=4;
        Write-Host "File does not exist in the target folder - file is scheduled for copy...";
    }
    
    #DO the action based on above logic
    if($result -ne 1){
        Copy-Item -Path $file.filePath -Destination $search_dt2.filePath -Force -Verbose
        Write-Host "Code:[$result]";        
    }   else {
        Write-Host "Code:[$result]";
    }
}