To add to Santiago Squarzon's helpful answer:
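Below is the helper function Measure-Parallel, which allows you to compare the speed of the following approaches to parallelism:
- Start-Job (child-process-based jobs)
- Start-ThreadJob (thread-based jobs; requires Install-Module ThreadJob in Windows PowerShell)
- Start-Process (direct execution of an external program)
- ForEach-Object -Parallel (thread-based; requires PowerShell v7+)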
Note:
- Given that the tests below wrap a single call to an external executable (such as 7z.exe in your case), the Start-Process approach will perform best, because it doesn't have the overhead of job management. However, this approach has fundamental limitations: it is only an option when each task is a single call to an external program, and its output can only be captured by redirecting it to files.
- Due to its complexity, the runspace-pool-based approach from Santiago's answer wasn't included; if Start-ThreadJob or ForEach-Object -Parallel is available to you, you won't need to resort to this approach.
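For orientation, the four approaches can be sketched in their simplest form as follows. This is only an illustration, not the benchmark itself: the sample inputs and the variable names ($items, $exe, $flag) are assumptions, and the platform-native shell merely echoes each item, much like the function further below does.

```powershell
# Assumed sample input and a platform-native shell that simply echoes its argument.
$items = 'a', 'b', 'c'
$isWin = $env:OS -eq 'Windows_NT'
$exe   = if ($isWin) { 'cmd.exe' } else { 'sh' }
$flag  = if ($isWin) { '/c' } else { '-c' }

# 1. Start-Job: each task runs in a hidden child PowerShell process.
$viaJob = $items |
  ForEach-Object {
    Start-Job { param($exe, $flag, $item) & $exe $flag "echo $item" } -ArgumentList $exe, $flag, $_
  } |
  Receive-Job -Wait -AutoRemoveJob

# 2. Start-ThreadJob: each task runs in a thread of the current process (only if the cmdlet is available).
$viaThreadJob = if (Get-Command -ErrorAction Ignore Start-ThreadJob) {
  $items |
    ForEach-Object {
      Start-ThreadJob { param($exe, $flag, $item) & $exe $flag "echo $item" } -ArgumentList $exe, $flag, $_
    } |
    Receive-Job -Wait -AutoRemoveJob
}

# 3. ForEach-Object -Parallel: thread-based, PowerShell v7+ only.
$viaParallel = if ($PSVersionTable.PSVersion.Major -ge 7) {
  $items | ForEach-Object -Parallel {
    $exe = $using:exe; $flag = $using:flag
    & $exe $flag "echo $_"
  }
}

# 4. Start-Process: launches the external program directly;
#    its output goes to the console and is not collected in a variable.
$items |
  ForEach-Object { Start-Process -NoNewWindow -PassThru $exe -ArgumentList "$flag `"echo $_`"" } |
  Wait-Process
```

The first three variants collect the echoed strings in variables, whereas the Start-Process variant only prints them, which illustrates the capture limitation of that approach.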
Sample Measure-Parallel call, which contrasts the runtime performance of the approaches:
# Run 20 jobs / processes in parallel, 5 at a time, comparing
# all approaches.
# Note: Omit the -Approach argument to enter interactive mode.
Measure-Parallel -Approach All -BatchSize 5 -JobCount 20
Sample output from a macOS machine running PowerShell 7.2.6 (timings vary based on many factors, but the ratios should provide a sense of relative performance):
# ... output from the jobs
JobCount : 20
BatchSize : 5
BatchCount : 4
Start-Job (secs.) : 2.20
Start-ThreadJob (secs.) : 1.17
Start-Process (secs.) : 0.84
ForEach-Object -Parallel (secs.) : 0.94
Conclusions:
- ForEach-Object -Parallel adds the least thread/job-management overhead, followed by Start-ThreadJob.
- Start-Job is noticeably slower, because it needs an extra child process for the hidden PowerShell instance that runs each task. It seems that on Windows the performance discrepancy is much more pronounced.
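To make the ratios concrete, the sample timings above can be expressed relative to the fastest approach. The snippet below is a small sketch that hard-codes the numbers from the sample output shown above; the variable names are assumptions, and your own timings will differ.

```powershell
# Timings (in seconds) copied from the sample output above.
$timings = [ordered] @{
  'Start-Job'                = 2.20
  'Start-ThreadJob'          = 1.17
  'Start-Process'            = 0.84
  'ForEach-Object -Parallel' = 0.94
}

# The fastest approach serves as the baseline (ratio 1).
$fastest = ($timings.Values | Measure-Object -Minimum).Minimum

# Express each timing as a multiple of the baseline.
$ratios = $timings.GetEnumerator() | ForEach-Object {
  [pscustomobject] @{
    Approach = $_.Key
    Seconds  = $_.Value
    Ratio    = [math]::Round($_.Value / $fastest, 2)
  }
}

$ratios | Format-Table -AutoSize
```

In this particular sample, Start-Job takes roughly 2.6 times as long as Start-Process, while the two thread-based approaches stay within a factor of about 1.4.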
Measure-Parallel source code:
Important:
- The function hard-codes sample input objects as well as the external program to invoke; you'll have to edit it yourself as needed. The hard-coded external program is the platform-native shell in this case (cmd.exe on Windows, /bin/sh on Unix-like platforms), which is passed a command to simply echo each input object.
  - It wouldn't be too hard to modify the function to accept a script block as an argument, and to receive input objects for the jobs via the pipeline (though that would preclude the Start-Process approach, except if you explicitly call the block via the PowerShell CLI - but in that case Start-Job could just be used).
- The output of the jobs / processes goes directly to the display and cannot be captured.
- The batch size, which defaults to 5, can be modified with -BatchSize; for the thread-based approaches, the batch size is also used as the -ThrottleLimit argument, i.e. the limit on how many threads are allowed to run at the same time. By default, a single batch is run, but you may request multiple batches indirectly by passing the total number of parallel runs to the -JobCount parameter.
- You can select approaches via the array-valued -Approach parameter, which supports Job, ThreadJob, Process, ForEachParallel, and All, which combines all of the preceding.
  - If -Approach isn't specified, interactive mode is entered, where you're (repeatedly) prompted for the desired approach.
- Except in interactive mode, a custom object with comparative timings is output.
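The relationship between -JobCount, -BatchSize and the resulting number of batches can be illustrated in isolation. The following is a minimal sketch of such a partitioning, not the function's exact code; the values 12 and 5 and the variable names are assumptions chosen so that the last batch is only partially filled.

```powershell
# Assumed values: 12 inputs, processed 5 at a time.
$JobCount  = 12
$BatchSize = 5

# Simulated input: 'f0.zip', 'f1.zip', ...
$inputs = 0..($JobCount - 1) | ForEach-Object { "f$_.zip" }

# The number of batches is the input count divided by the batch size, rounded up.
$batchCount = [int] [math]::Ceiling($inputs.Count / $BatchSize)

# Build an array of arrays; the unary comma keeps each batch together as a single element.
$batches = @(
  for ($i = 0; $i -lt $batchCount; $i++) {
    $start = $i * $BatchSize
    $end   = [math]::Min($start + $BatchSize, $inputs.Count) - 1
    , $inputs[$start..$end]
  }
)

# Show the size and content of each batch.
$batches | ForEach-Object { "$($_.Count) item(s): $($_ -join ', ')" }
```

With these values, three batches result: two full batches of 5 and a final batch of 2. In the sample call shown earlier, 20 jobs with a batch size of 5 yield the 4 batches reported in the output.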
function Measure-Parallel {

  [CmdletBinding()]
  param(
    [ValidateRange(2, 2147483647)] [int] $BatchSize = 5,
    [ValidateSet('Job', 'ThreadJob', 'Process', 'ForEachParallel', 'All')] [string[]] $Approach,
    [ValidateRange(2, 2147483647)] [int] $JobCount = $BatchSize # pass a higher count to run multiple batches
  )

  $noForEachParallel = $PSVersionTable.PSVersion.Major -lt 7
  $noStartThreadJob = -not (Get-Command -ErrorAction Ignore Start-ThreadJob)

  $interactive = -not $Approach
  if (-not $interactive) {
    # Translate the approach arguments into their corresponding hashtable keys (see below).
    # -contains also detects 'All' when it is passed alongside other values.
    if ($Approach -contains 'All') { $Approach = 'Job', 'ThreadJob', 'Process', 'ForEachParallel' }
    $approaches = $Approach.ForEach({
      if ($_ -eq 'ForEachParallel') { 'ForEach-Object -Parallel' }
      else { $_ -replace '^', 'Start-' }
    })
  }

  if ($noStartThreadJob) {
    if ($interactive -or $approaches -contains 'Start-ThreadJob') {
      Write-Warning "Start-ThreadJob is not installed, omitting its test; install it with ``Install-Module ThreadJob``"
      # Only a non-interactive call has a list of approaches to filter.
      if (-not $interactive) { $approaches = $approaches.Where({ $_ -ne 'Start-ThreadJob' }) }
    }
  }
  if ($noForEachParallel) {
    if ($interactive -or $approaches -contains 'ForEach-Object -Parallel') {
      Write-Warning "ForEach-Object -Parallel is not available in this PowerShell version (requires v7+), omitting its test."
      if (-not $interactive) { $approaches = $approaches.Where({ $_ -ne 'ForEach-Object -Parallel' }) }
    }
  }

  # Simulated input: Create 'f0.zip', 'f1.zip', ... file names.
  $zipFiles = 0..($JobCount - 1) -replace '^', 'f' -replace '$', '.zip'

  # Sample executable to run - here, the platform-native shell is called to simply
  # echo the argument given.
  $exe = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
  # The list of its arguments *as a single string* - use '{0}' as the placeholder for where the input object should go.
  $exeArgList = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }

  # An ordered hashtable with script blocks that implement the approaches to parallelism.
  $approachImpl = [ordered] @{}

  $approachImpl['Start-Job'] = { # child-process-based job
    param([array] $batch)
    $batch |
      ForEach-Object {
        Start-Job { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
      } |
      Receive-Job -Wait -AutoRemoveJob # wait for all jobs, relay their output, then remove them.
  }

  if (-not $noStartThreadJob) {
    # If Start-ThreadJob is available, add an approach for it.
    $approachImpl['Start-ThreadJob'] = { # thread-based job - requires Install-Module ThreadJob in WinPS
      param([array] $batch)
      $batch |
        ForEach-Object {
          Start-ThreadJob -ThrottleLimit $BatchSize { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
        } |
        Receive-Job -Wait -AutoRemoveJob
    }
  }

  if (-not $noForEachParallel) {
    # If ForEach-Object -Parallel is supported (v7+), add an approach for it.
    $approachImpl['ForEach-Object -Parallel'] = {
      param([array] $batch)
      $batch | ForEach-Object -ThrottleLimit $BatchSize -Parallel {
        Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $_))
      }
    }
  }

  $approachImpl['Start-Process'] = { # direct execution of an external program
    param([array] $batch)
    $batch |
      ForEach-Object {
        Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($exeArgList -f $_)
      } |
      Wait-Process # wait for all processes to terminate.
  }

  # Partition the array of all input file names into subarrays (batches)
  $batches = @(
    0..([math]::Ceiling($zipFiles.Count / $batchSize) - 1) | ForEach-Object {
      , $zipFiles[($_ * $batchSize)..($_ * $batchSize + $batchSize - 1)]
    }
  )

  # In interactive use, print verbose messages by default
  if ($interactive) { $VerbosePreference = 'Continue' }

  :menu while ($true) {
    if ($interactive) {
      # Prompt for the approach to use.
      $choices = $approachImpl.Keys.ForEach({
        if ($_ -eq 'ForEach-Object -Parallel') { '&' + $_ }
        else { $_ -replace '-', '-&' }
      }) + '&Quit'
      $choice = $host.ui.PromptForChoice("Approach", "Select parallelism approach:", $choices, 0)
      if ($choice -eq $approachImpl.Count) { break }
      $approachKey = @($approachImpl.Keys)[$choice]
    }
    else {
      # Use the given approach(es)
      $approachKey = $approaches
    }
    $tsTotals = foreach ($appr in $approachKey) {
      $i = 0; $tsTotal = [timespan] 0
      $batches | ForEach-Object {
        $ts = Measure-Command { & $approachImpl[$appr] $_ | Out-Host }
        Write-Verbose "$batchSize-element '$appr' batch finished in $($ts.TotalSeconds.ToString('N2')) secs."
        $tsTotal += $ts
        if (++$i -eq $batches.Count) {
          # last batch processed.
          if ($batches.Count -gt 1) {
            Write-Verbose "'$appr' processing of $JobCount items overall finished in $($tsTotal.TotalSeconds.ToString('N2')) secs."
          }
          $tsTotal # output the overall timing for this approach
        }
        elseif ($interactive) {
          $choice = $host.ui.PromptForChoice("Continue?", "Select action", ('&Next batch', '&Return to Menu', '&Quit'), 0)
          if ($choice -eq 1) { continue menu }
          if ($choice -eq 2) { break menu }
        }
      }
    }
    if (-not $interactive) {
      # Output a result object with the overall timings.
      $oht = [ordered] @{}; $i = 0
      $oht['JobCount'] = $JobCount
      $oht['BatchSize'] = $BatchSize
      $oht['BatchCount'] = $batches.Count
      foreach ($appr in $approachKey) {
        $oht[($appr + ' (secs.)')] = $tsTotals[$i++].TotalSeconds.ToString('N2')
      }
      [pscustomobject] $oht
      break # break out of the infinite :menu loop
    }
  }

}
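As mentioned in the notes before the function, a variant that accepts a script block and pipeline input is conceivable. The following is one possible sketch of that idea, not a drop-in replacement: the function name Invoke-BatchedJob and its parameters are hypothetical, it only covers the job-based approaches (Start-ThreadJob if available, otherwise Start-Job), and, unlike Measure-Parallel, it returns the jobs' output instead of printing it.

```powershell
# Hypothetical helper (not part of Measure-Parallel): runs a script block once per
# pipeline input object, in batches, and returns the collected output.
function Invoke-BatchedJob {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
    [Parameter(Mandatory, ValueFromPipeline)] [object] $InputObject,
    [ValidateRange(1, 2147483647)] [int] $BatchSize = 5
  )
  begin {
    # Prefer thread-based jobs; fall back to child-process-based jobs.
    $useThreads = [bool] (Get-Command -ErrorAction Ignore Start-ThreadJob)
    $pending = [System.Collections.Generic.List[object]]::new()

    # Starts one job per item of the batch, waits for all of them, and relays their output.
    $runBatch = {
      param($batch)
      $jobs = foreach ($item in $batch) {
        if ($useThreads) {
          Start-ThreadJob -ThrottleLimit $BatchSize -ScriptBlock $ScriptBlock -ArgumentList $item
        }
        else {
          Start-Job -ScriptBlock $ScriptBlock -ArgumentList $item
        }
      }
      $jobs | Receive-Job -Wait -AutoRemoveJob
    }
  }
  process {
    # Collect input objects until a full batch is available, then run it.
    $pending.Add($InputObject)
    if ($pending.Count -ge $BatchSize) {
      $batch = $pending.ToArray(); $pending.Clear()
      & $runBatch $batch
    }
  }
  end {
    # Run the final, possibly partial batch.
    if ($pending.Count -gt 0) {
      $batch = $pending.ToArray(); $pending.Clear()
      & $runBatch $batch
    }
  }
}

# Usage: each job receives its input object as $args[0].
$results = 1..7 | Invoke-BatchedJob -BatchSize 3 -ScriptBlock { $args[0] * 2 }
```

The order of results within a batch is not guaranteed. When the Start-Job fallback is used, the script block runs in a separate process, so results come back as deserialized objects.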