This may look like a duplicate, but it is a different question.

I have written a PowerShell script to quickly delete folders that contain large numbers of files and subfolders. Below is how I tried to implement parallel processing, along with some test examples:

The function:

function Parallel-Delete {
    param(
    [Parameter(Valuefrompipeline=$true, Mandatory=$true, Position=0)] [array]$filelist,
    [Parameter(Valuefrompipeline=$true, Mandatory=$true, Position=1)] [int]$number
    )
    0..($filelist.count-1) | Where-Object {$_ % 16 -eq $number} | foreach {Remove-Item -Path $filelist[$_]}
}

Making a test folder and list contents:

$test=[string]"C:\test"+$(get-random)
md $test | out-null
0..10000 | % {ni "${test}\${_}.txt"}|out-null
[array]$filelist=(Get-Childitem -Path $test -File -Force).Fullname

Test1:

0..15 | foreach-object {Invoke-Command -ScriptBlock { Parallel-Delete $filelist $_}}
rd $test

I have confirmed that the parallel version works, but it uses the same amount of resources as the single-threaded version:

Test2 (remake the test folder before running each new test):

(Get-Childitem -Path $test -File -Force).Fullname | Foreach {Remove-Item -Path $_}
rd $test

The 16 parallel processes are not 16 times as fast as the single-threaded process, as I had expected. Here are the results:

Test1:

PS C:\Windows\System32>Measure-Command {0..15 | foreach-object {Invoke-Command -ScriptBlock { Parallel-Delete $filelist $_}}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 36
Milliseconds      : 279
Ticks             : 362798015
TotalDays         : 0.000419905109953704
TotalHours        : 0.0100777226388889
TotalMinutes      : 0.604663358333333
TotalSeconds      : 36.2798015
TotalMilliseconds : 36279.8015

Test2:

PS C:\Windows\System32>Measure-Command {(Get-Childitem -Path $test -File -Force).Fullname | Foreach {Remove-Item -Path $_}}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 25
Milliseconds      : 980
Ticks             : 259802514
TotalDays         : 0.000300697354166667
TotalHours        : 0.0072167365
TotalMinutes      : 0.43300419
TotalSeconds      : 25.9802514
TotalMilliseconds : 25980.2514

And I have tried this using start-job:

0..15 | foreach-object {Start-Job -ScriptBlock { Parallel-Delete $filelist $_}}

This didn't do what I expected: it started 16 jobs that did nothing at all, so I stopped them. I also noticed that the variable $filelist inside the script block isn't highlighted green, so I don't know whether the function isn't recognized or the variables aren't passed.

And I have tried a method I found here: Powershell Start-Process to start Powershell session and pass local variables

With this:

$ScriptBlock = {
    function Parallel-Delete {...}
}
# remake test folder...
$PowerShell=(Get-Process -Id $pid).path
0..15|%{Start-Process -FilePath $PowerShell -ArgumentList "-Command & {$ScriptBlock Parallel-Delete('$filelist $_')}"}

This started 16 black PowerShell console windows, and all of them show this:

cmdlet Parallel-Delete at command pipeline position 1
Supply values for the following parameters:
number:

This means the number isn't passed, although it may also mean that $filelist was passed successfully. I have confirmed that Start-Process with a script block works well with one variable, but it fails to pass multiple variables.

I also know about Invoke-Expression, though I haven't tried it yet. For now I think the Start-Process method is closest to what I want, if I can get it working.

So my question is: how can I run a custom function with multiple parameters in n parallel processes, pass multiple variables to those processes, and have them run concurrently, separately and independently of each other, so that the parallel version is n times as fast as a single-threaded process that produces the same outcome?

Will anyone help me please? Any help would be appreciated. I say thanks in advance.

P.S. I use PowerShell 7.1 x64 on Windows 10 20H2.

Update: I have tried foreach -parallel with this:

function Parallel-Delete {
        param(
        [Parameter(Valuefrompipeline=$true, Mandatory=$true, Position=0)] [array]$filelist,
        [Parameter(Valuefrompipeline=$true, Mandatory=$true, Position=1)] [int]$number
        )
        0..($filelist.count-1) | Where-Object {$_ % 16 -eq $number} | foreach {Remove-Item -Path $filelist[$_]}
}
[array]$filelist=(Get-Childitem -Path "C:\test\0" -File -Force).Fullname
0..15|foreach-object -Parallel {
Parallel-Delete $filelist $_
} -ThrottleLimit 16

And it gives me this error message 16 times:

Parallel-Delete:
Line |
   2 |  Parallel-Delete $filelist $_
     |  ~~~~~~~~~~~~~~~
     | The term 'Parallel-Delete' is not recognized as a name of a cmdlet, function, script file, or executable program.
Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

So the function is not recognized, and the variables aren't highlighted green...

And now I have tried:

0..15|foreach-object -Parallel {
-begin {Parallel-Delete {$filelist $_}}
} -ThrottleLimit 16

And it just gives me this error:

ParserError:
Line |
   2 |  -begin {Parallel-Delete {$filelist $_}}
     |                                     ~~
     | Unexpected token '$_' in expression or statement.
     

Please help...

  • TL;DR: I scanned for [`ForEach-Object -Parallel`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object). Did you try that? – iRon Dec 16 '20 at 09:09
  • @iRon, I just tried it and it didn't work; it gives me error messages. See my update for more information. –  Dec 16 '20 at 09:28
  • Is there any need to use PowerShell for multithreading instead of using a tool which does it internally? For example using robocopy as mentioned [here](https://superuser.com/a/1037179/1013929). – Dabombber Dec 16 '20 at 09:40
  • That error means that you have to include the function itself in the scope of the `foreach` loop, or in the `-begin` block: `-begin { function Parallel-Delete { ... } }` – iRon Dec 16 '20 at 10:48
  • BTW, how do you troubleshoot? E.g. how do you define "*instead they use the same amount of time...*"? Try to set up a [mcve] (e.g. with `Measure-Command`) that *correctly* proves your statements. – iRon Dec 16 '20 at 10:54

1 Answer


When troubleshooting, it is always good to exclude anything that might interfere with your expectations. To get closer to an answer, it is often good practice to cut the issue in half. In your specific case, ask yourself what you want to prove:

  1. PowerShell commands that run in parallel should finish faster

  2. My file system does not appear to handle asynchronous commands as expected

With all respect, the responders in this community (including myself) are more interested in the first topic than in the second. In fact, if it concerns topic 2, the file system, a lot of other factors are involved (type of file system, hardware, disk caching, etc.) and the question belongs more on the Super User community.
In other words, let's separate topic 1 from the file system by just processing Start-Sleep 1:

function Parallel-Delete { Start-Sleep 1 }

(Measure-Command { 0..15 | foreach-object { Parallel-Delete } }).TotalMilliseconds
16012.0359

(Measure-Command { 0..15 | foreach-object { Start-Job { Parallel-Delete } } }).TotalMilliseconds
4865.6182

(Measure-Command { 0..15 | foreach-object -Parallel { function Parallel-Delete { Start-Sleep 1 }; Parallel-Delete } }).TotalMilliseconds
4070.8242

This is about what one might expect from the performance difference of parallel processing on a 4-core system.
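
To get the real function and several variables into the parallel script block, something along the following lines may work. It is only a minimal sketch, assuming PowerShell 7 (as mentioned in the question). The scratch folder in the temp directory, the `$funcDef` and `$remaining` variables and the `-Workers` parameter are illustrative names that do not come from the question. Each parallel runspace starts without the caller's functions and variables, so the function body is captured as a string and re-created inside the block, and outer variables are read with the `$using:` scope modifier:

```powershell
# Sketch only. Hypothetical scratch folder; the question used C:\test<random> instead.
$test = Join-Path ([System.IO.Path]::GetTempPath()) "paralleltest$(Get-Random)"
New-Item -ItemType Directory -Path $test | Out-Null
0..99 | ForEach-Object { New-Item -ItemType File -Path (Join-Path $test "$_.txt") } | Out-Null

function Parallel-Delete {
    param(
        [Parameter(Mandatory = $true, Position = 0)] [string[]]$FileList,
        [Parameter(Mandatory = $true, Position = 1)] [int]$Number,
        [int]$Workers = 16   # illustrative parameter, replaces the hard-coded 16
    )
    # Worker $Number handles every $Workers-th file, starting at index $Number.
    for ($i = $Number; $i -lt $FileList.Count; $i += $Workers) {
        Remove-Item -LiteralPath $FileList[$i]
    }
}

# Capture the function body as a string so it can be handed to each runspace.
$funcDef = ${function:Parallel-Delete}.ToString()
[string[]]$filelist = (Get-ChildItem -Path $test -File -Force).FullName

0..15 | ForEach-Object -Parallel {
    # Re-create the function in this runspace, then read outer variables via $using:.
    ${function:Parallel-Delete} = $using:funcDef
    Parallel-Delete -FileList $using:filelist -Number $_
} -ThrottleLimit 16

# Count what is left in the scratch folder after all workers have finished.
$remaining = @(Get-ChildItem -Path $test -File -Force).Count
```

The same idea should apply to `Start-Job`: a job runs in a separate process, so the function has to be defined inside the job's script block and values have to be passed with `-ArgumentList` or `$using:`. A `Measure-Command` around `Start-Job` also needs a `Wait-Job` to include the time the jobs actually run. Whether any of this makes deleting files faster still depends on the file system, as noted above.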

iRon