
PowerShell 7 introduced a much-needed feature for processing pipeline input in parallel: ForEach-Object -Parallel.

The documentation for PowerShell 7 does not provide any detail on how this is implemented.

Having used the PoshRSJob and Invoke-Parallel modules before, I know that runspaces were traditionally considered a far more efficient approach to parallel operations in PowerShell than PowerShell jobs. I've read mixed reports suggesting that this feature now uses threads rather than runspaces, but I can't find anything more specific.

I'd really appreciate some technical insight into:

  1. What is the lifecycle of an execution from a .NET perspective?
  2. Is the new functionality based on runspaces or threads? (Or is a runspace just a .NET thread in System.Management.Automation?)
  3. Does this add any complexity to traditional debugging now that we're moving into parallel operations? Historically I've had a rough time debugging runspaces, and I'm not sure what options might have improved.
sheldonhull
  • according to the articles i have seen, it uses runspaces. you need to send $vars into it [usually with `$Using:`]. it loads all the needed modules and functions and whatnot into each runspace, so it takes time to set up & tear down. i have not seen anything about debugging so far. – Lee_Dailey Mar 24 '20 at 19:22
  • Separate runspaces, managed via a new internal API (`PSTaskPool`). [The RFC](https://github.com/PowerShell/PowerShell-RFC/blob/master/4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md) goes into _some_ detail about implementation and constraints. [The source code](https://github.com/PowerShell/PowerShell/blob/b7cb335f03fe2992d0cbd61699de9d9aafa1d7c1/src/System.Management.Automation/engine/InternalCommands.cs#L384) also contains a number of helpful comments. – Mathias R. Jessen Mar 24 '20 at 19:37
  • Just in case (who knows), there is also [SplitPipeline](https://github.com/nightroman/SplitPipeline) with some unique features (IMHO, indeed), e.g. it works well with very large or infinite input. – Roman Kuzmin Mar 24 '20 at 20:02
  • There's also start-threadjob, which I believe was written by the same guy. It doesn't serialize objects the way start-job does. – js2010 Mar 24 '20 at 20:14
  • The RFC was pointed out to me too, with some great info. I'll review it and post an answer here later if it answers some of this. [RFC0044-ForEach-Parallel-Cmdlet](https://github.com/PowerShell/PowerShell-RFC/blob/5ee8f185c7c8549a7f458485b071992a91e9f550/4-Experimental-Accepted/RFC0044-ForEach-Parallel-Cmdlet.md) – sheldonhull Mar 24 '20 at 22:45
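To see the runspace/thread relationship for yourself, here is a small sketch (assuming PowerShell 7+) in which each parallel iteration reports the runspace it runs in and the .NET managed thread executing it. `$prefix` is only an illustrative variable, passed in with `$using:` as the first comment describes. Every iteration should report a runspace ID different from the caller's; how many distinct IDs you see may vary by version, since PowerShell 7.1 and later reuse runspaces by default (the `-UseNewRunspace` switch opts out).

```powershell
# Requires PowerShell 7+ (ForEach-Object -Parallel).
$prefix = 'item'   # illustrative variable, copied into each task via $using:

$results = 1..5 | ForEach-Object -ThrottleLimit 5 -Parallel {
    $p = $using:prefix
    [pscustomobject]@{
        Label      = "$p-$_"
        # The runspace this iteration's script block is running in.
        RunspaceId = [runspace]::DefaultRunspace.Id
        # The .NET managed thread currently executing that runspace.
        ThreadId   = [System.Threading.Thread]::CurrentThread.ManagedThreadId
    }
}

$results | Sort-Object Label | Format-Table
"Caller runspace: $([runspace]::DefaultRunspace.Id)"
```

The exact runspace and thread numbers will differ from run to run; the point is that each script block runs in a runspace other than the caller's, on some .NET thread.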

2 Answers


Debugging foreach-object -parallel:

You need a second pwsh process to do this. In the first one, run:

foreach-object -parallel { Wait-Debugger;1;2;3 }

Then, in the second window, find the process ID of the other pwsh and attach to it with Enter-PSHostProcess. List the runspaces with Get-Runspace and debug the one whose Availability is "InBreakpoint". At the debugger prompt, "v" means "step over".

get-process pwsh

 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
     64    44.32      82.23       1.70    3912  12 pwsh
     63    40.66      78.03       1.36    6472  12 pwsh

$pid
6472

Enter-PSHostProcess 3912

get-runspace

 Id Name            ComputerName    Type          State         Availability
 -- ----            ------------    ----          -----         ------------
  1 Runspace1       localhost       Local         Opened        Busy
  2 PSTask:1        localhost       Local         Opened        InBreakpoint
  3 RemoteHost      localhost       Local         Opened        Busy

debug-runspace 2
v
v
v

If you run foreach-object -parallel -asjob, you can use get-runspace and debug-runspace in the same window, but you can't see the output while stepping.

foreach-object -parallel { Wait-Debugger;1;2;3 } -asjob
get-runspace

 Id Name            ComputerName    Type          State         Availability
 -- ----            ------------    ----          -----         ------------
  1 Runspace1       localhost       Local         Opened        Available
  2 PSTask:1        localhost       Local         Opened        InBreakpoint

debug-runspace 2
v
v
v
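The same -AsJob behavior can be seen without a breakpoint. This is a minimal sketch, assuming PowerShell 7+: the call returns a job immediately, so the console stays free for Get-Runspace while the tasks run, and the results are collected afterwards. The 500 ms sleep exists only to keep the tasks alive long enough to observe; the PSTask naming follows the Get-Runspace output shown above.

```powershell
# Requires PowerShell 7+. -AsJob returns a job object right away.
$job = 1..3 | ForEach-Object -AsJob -Parallel {
    Start-Sleep -Milliseconds 500   # keep each task busy briefly
    $_ * 10
}

# While the job runs, its tasks should appear as PSTask:N runspaces
# (they may take a moment to be created).
Get-Runspace | Where-Object Name -like 'PSTask*' | Select-Object Id, Name, Availability

# Wait for completion, gather the output (order is not guaranteed), then clean up.
$out = $job | Wait-Job | Receive-Job
$job | Remove-Job
$out | Sort-Object
```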

Here's a new debugging video covering some advanced setups with VS Code: https://www.reddit.com/r/PowerShell/comments/gn0270/advanced_powershell_debugging_techniques/

js2010

I found this fantastic blog post, "PowerShell ForEach-Object Parallel Feature", by Paul Higinbotham.

These are the key highlights I took away from it:

Script blocks run in a context called a PowerShell runspace. The runspace context contains all of the defined variables, functions and loaded modules.

As previously mentioned, the new ForEach-Object -Parallel feature uses existing PowerShell functionality to run script blocks concurrently.... PowerShell itself imposes conditions on how scripts run concurrently, based on its design and history. Scripts have to run in runspace contexts and only one script thread can run at a time within a runspace. So in order to run multiple scripts simultaneously multiple runspaces must be created.
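That constraint (one script thread per runspace, so concurrency requires several runspaces) is what pre-7 runspace-based modules such as PoshRSJob and Invoke-Parallel broadly work around by managing runspaces themselves. Below is a minimal sketch of that classic runspace-pool pattern using `[runspacefactory]::CreateRunspacePool` and `[powershell]::Create()`. It is illustrative only and is not how ForEach-Object -Parallel is implemented internally (the comments on the question point to an internal `PSTaskPool` for that); the pool size of 3 and the squaring script are arbitrary choices.

```powershell
# Works in Windows PowerShell 5.1 and PowerShell 7+.
# A pool that lets at most 3 runspaces execute scripts at the same time.
$pool = [runspacefactory]::CreateRunspacePool(1, 3)
$pool.Open()

# Queue one PowerShell instance per input; each runs in whichever pooled runspace is free.
$jobs = foreach ($n in 1..6) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript('param($x) $x * $x').AddArgument($n)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that script finishes and returns its output.
$squares = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()
$squares -join ', '   # 1, 4, 9, 16, 25, 36
```

Results come back in input order here only because EndInvoke is called in the same order the scripts were queued; the scripts themselves may finish in any order.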

So the blog post confirms that runspaces are the main driver here, and it provides further information on thread-safe operations and more. Any prior answers or details about runspaces remain relevant, since this is a mature implementation of runspaces for parallel operations built into PowerShell itself. The community has produced other runspace-oriented implementations, but this one ships with PowerShell and has no external module dependencies.
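On the thread-safety point: with -Parallel, a `$using:` variable that holds a .NET reference type hands every task a reference to the same object, so concurrent writes need a thread-safe type. A minimal sketch, assuming PowerShell 7+, using `ConcurrentDictionary` (the `$squares` name and the squaring are just for illustration):

```powershell
# Requires PowerShell 7+.
# A thread-safe collection created in the caller's runspace.
$squares = [System.Collections.Concurrent.ConcurrentDictionary[int, int]]::new()

1..10 | ForEach-Object -Parallel {
    # Every task gets a reference to the same dictionary, so writes must go
    # through a thread-safe API such as TryAdd rather than a plain hashtable.
    $dict = $using:squares
    [void]$dict.TryAdd($_, $_ * $_)
}

$squares.Count   # 10
$squares[7]      # 49
```

A plain `@{}` hashtable in the same role could be corrupted or lose entries under concurrent writes, which is why the concurrent collection is used here.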

Thanks Paul for such a good contribution to the community!

sheldonhull
  • I don't think you can. Runspaces are threaded, so I'm not sure you can break into separate runspaces with the debugger. I've never had success trying that, so let me know if you find out otherwise. Sounds like it might be a good separate question. – sheldonhull May 14 '20 at 19:32
  • There is a debug-runspace command, but you might have to break in from another process unless the loop is somehow running in the background. – js2010 May 14 '20 at 20:04