2

Problem statement: I am trying to copy about 100 files (each more than 1 GB in size) from a source to a destination directory, and I am automating this with a PowerShell script. When the script runs, it copies the files sequentially. Is there any way to copy them in parallel to save some time? Copying all the files takes very long, and I cannot use any third-party software.

    $DATAFileDir="D:\TEST_FOLDER\DATAFILESFX\*"
    $LOGFileDir="D:\TEST_FOLDER\LOGFILESFX\*"
    $DestDataDir="D:\TEST_FOLDER\Data\"
    $DestLogDir="D:\TEST_FOLDER\Log\"

    #Copying the Primary file
    Copy-Item -Path $DATAFileDir -Destination $DestDataDir -Recurse -Force -Verbose
    #Copying the Audit File
    Copy-Item -Path $LOGFileDir -Destination $DestLogDir -Recurse -Force -Verbose

Any suggestions?

sɐunıɔןɐqɐp
  • 3,332
  • 15
  • 36
  • 40
Jyoti Prakash Mallick
  • 2,119
  • 3
  • 21
  • 38
  • 1
    Check out this question: https://stackoverflow.com/questions/185575/powershell-equivalent-of-bash-ampersand-for-forking-running-background-proce – Juan Sep 02 '18 at 14:42
  • Otherwise, a multithreaded Java program doing this with an executor won't take more than 20 lines – Juan Sep 02 '18 at 14:43
  • 2
    This isn't a powershell solution, but robocopy has an option to use multiple threads. https://technet.microsoft.com/en-us/library/dd542631.aspx – Jim Janney Sep 02 '18 at 15:52
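
To illustrate that comment, a minimal robocopy sketch for the folders from the question could look like this (the /MT value is only an example; tune it for your hardware):

    # copy each tree with up to 8 threads; /E includes subdirectories
    robocopy 'D:\TEST_FOLDER\DATAFILESFX' 'D:\TEST_FOLDER\Data' /E /MT:8
    robocopy 'D:\TEST_FOLDER\LOGFILESFX' 'D:\TEST_FOLDER\Log' /E /MT:8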

6 Answers

1

You can start an individual job for every file you want to copy.

$Source = Get-ChildItem -Path C:\SourceFolder -Recurse | Select -ExpandProperty FullName
$Destination = 'C:\DestinationFolder'
foreach ($Item in @($Source)){
    #starting job for every item in source list
    Start-Job -ScriptBlock {
        param($Item,$Destination) #passing parameters for copy-item 
            #doing copy-item
            Copy-Item -Path $Item -Destination $Destination -Recurse  -Force
    } -ArgumentList $Item,$Destination #passing parameters for copy-item 
}
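
Note that Start-Job only queues the copies and returns immediately. If the rest of your script depends on the files being in place, you can wait for the jobs to finish first, for example (assuming no unrelated background jobs are running in the session):

    # wait for all copy jobs to finish, collect their output, then clean up
    Get-Job | Wait-Job | Receive-Job
    Get-Job | Remove-Job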
Kirill Pashkov
  • 3,118
  • 1
  • 15
  • 20
1

You should be able to achieve this quite easily with a PowerShell workflow. The throttlelimit parameter controls how many files are copied in parallel; remove it to copy all files in parallel (probably not recommended for 100 files).

workflow copyfiles {

    param($files)

    foreach -parallel -throttlelimit 3 ($file in $files) {

        Copy-Item -Path $file -Destination 'C:\destination\' -Force -verbose
    }
}

$files = Get-ChildItem -Path C:\source -Recurse -File

copyfiles $files.FullName
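
A small variation (just a sketch, and note that workflows are a Windows PowerShell feature, not available in PowerShell 7) that also takes the destination as a parameter, so one workflow can cover both the data and the log directories from the question:

    workflow Copy-FilesParallel {

        param($files, $destination)

        foreach -parallel -throttlelimit 3 ($file in $files) {

            Copy-Item -Path $file -Destination $destination -Force -Verbose
        }
    }

    $dataFiles = (Get-ChildItem -Path 'D:\TEST_FOLDER\DATAFILESFX' -Recurse -File).FullName
    $logFiles = (Get-ChildItem -Path 'D:\TEST_FOLDER\LOGFILESFX' -Recurse -File).FullName

    Copy-FilesParallel -files $dataFiles -destination 'D:\TEST_FOLDER\Data\'
    Copy-FilesParallel -files $logFiles -destination 'D:\TEST_FOLDER\Log\'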
Nas
  • 1,243
  • 6
  • 7
1

You can use robocopy with the /move and /mt:n parameters to do this (note that /move deletes the files from the source after copying; drop it if you only want a copy):

function RoboMove ([string]$From, [string]$To, [int]$Threads = 8) {

    # /move deletes the files from the source after copying; /mt:n copies on n threads
    [void](robocopy $From $To /move /mt:$Threads /s /z /nfl /ndl /njh /njs /nc /ns /np)

    # /move leaves the emptied source directory behind, so remove it
    if (Test-Path $From) {
        Remove-Item $From
    }
}

Getting the most out of the parallelization requires knowing whether the volume is an SSD or an HDD. Safe values are 8 threads for an HDD and 128 for an SSD.

SSD detection can be automated with the following snippet, though it may report some non-fatal errors if you are using RAID or Storage Spaces.

function DetectVolumeType ([string]$Path) {

    $DriveLetter = $Path[0]
    $IsSSD = $False

    foreach ($Drive in Get-PhysicalDisk) {

        if ((($Drive | Get-Disk | Get-Partition).DriveLetter -Contains $DriveLetter) -and ($Drive.MediaType -eq 'SSD')) {

            $IsSSD = $True
            break
        }
    }
    return $IsSSD
}
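
Putting the two functions together, a usage sketch for the data directory from the question might look like this (remember that RoboMove, as written, moves the files instead of copying them):

    # pick the thread count based on whether the destination drive is an SSD
    $threads = if (DetectVolumeType 'D:\TEST_FOLDER\Data') { 128 } else { 8 }
    RoboMove 'D:\TEST_FOLDER\DATAFILESFX' 'D:\TEST_FOLDER\Data' $threads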

Documentation: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy

Umair Ahmed
  • 2,420
  • 1
  • 21
  • 40
0

This PowerShell script uses .NET Framework classes directly and should perform faster, even for a large number of files. Use the throttlelimit parameter to control how much parallelization you need.

param([String]$argSourceRootDir,[String]$argTargetRootDir)

workflow copyfiles {

    param($sourceRootDir, $targetRootDir)

    $sourcePaths = [System.IO.Directory]::GetFiles($sourceRootDir, "*.*", "AllDirectories")

    foreach -parallel -throttlelimit 8 ($sourcePath in $sourcePaths) {

        $targetPath = $sourcePath.Replace($sourceRootDir, $targetRootDir)
        $targetDir = $targetPath.Substring(0, $targetPath.Length - [System.IO.Path]::GetFileName($targetPath).Length - 1)
        if(-not (Test-Path $targetDir))
        {
            $x = [System.IO.Directory]::CreateDirectory($targetDir)
            $z = [Console]::WriteLine("new directory: $targetDir")
        }
        $z = [Console]::WriteLine("copy file: $sourcePath => $targetPath")
        $x = [System.IO.File]::Copy($sourcePath, $targetPath, "true")
    }
}

copyfiles $argSourceRootDir $argTargetRootDir

Just save this code as ParallelCopy.ps1 and run it like this:

.\ParallelCopy.ps1 "C:\Temp\SourceDir" "C:\Temp\TargetDir"
sɐunıɔןɐqɐp
  • 3,332
  • 15
  • 36
  • 40
0

Or you can use Start-ThreadJob. If you are on PS 5, you can get the ThreadJob module from the gallery: https://powershellgallery.com/packages/ThreadJob/2.0.0 Or use ForEach-Object -Parallel in PS 7: https://devblogs.microsoft.com/powershell/powershell-foreach-object-parallel-feature/
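
For example, a minimal sketch with either approach, using the paths from the question (adjust -ThrottleLimit for your disks):

    # Start-ThreadJob: thread-based jobs, much lighter than Start-Job
    $jobs = Get-ChildItem 'D:\TEST_FOLDER\DATAFILESFX' -File | ForEach-Object {
        Start-ThreadJob -ThrottleLimit 4 -ArgumentList $_.FullName -ScriptBlock {
            param($file)
            Copy-Item -Path $file -Destination 'D:\TEST_FOLDER\Data' -Force
        }
    }
    $jobs | Receive-Job -Wait -AutoRemoveJob

    # or, in PowerShell 7+, ForEach-Object -Parallel
    Get-ChildItem 'D:\TEST_FOLDER\DATAFILESFX' -File |
        ForEach-Object -Parallel {
            Copy-Item -Path $_.FullName -Destination 'D:\TEST_FOLDER\Data' -Force
        } -ThrottleLimit 4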

start-bitstransfer? https://learn.microsoft.com/en-us/powershell/module/bitstransfer/start-bitstransfer?view=win10-ps

start-bitstransfer z:\files\*.iso c:
js2010
  • 23,033
  • 6
  • 64
  • 66
0

If all 100 files are being loaded into a single Redshift table, then Redshift can load multiple files in parallel with a single COPY command. Check out the Redshift documentation: https://docs.aws.amazon.com/redshift/latest/dg/t_splitting-data-files.html

khan
  • 1
  • Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline. – somebadhat May 02 '20 at 19:46