
I spotted an interesting statement in "PowerShell Notes for professionals" whitepaper - "In a pipeline series each function runs parallel to the others, like parallel threads":


Is that correct? If "yes", is there technical documentation that supports this statement?

iPath ツ
  • Since the "whitepaper" is [actually a rip-off](http://books.goalkicker.com/PowerShellBook/) from [sunsetted](https://meta.stackoverflow.com/questions/354217/sunsetting-documentation) SO's Documentation, I'd take its contents with a pinch of salt. – vonPryz Jan 30 '18 at 13:35
  • On a pragmatic level... I believe the buffer size for a pipe is 8k bytes. Given that you can pass multiple gigabytes through a pipe, it follows that the second process in the pipeline must be running at the same time as the first process to take its output from the pipe else your pipe would get blocked. – Mark Setchell Jan 30 '18 at 15:05
  • @MarkSetchell where's that 8k buffer size coming from? – iPath ツ Jan 30 '18 at 15:08
  • Does it really matter if it's 4k, 8k, 64k or 1MB? The principle is the same - the entire multiple gigabytes aren't buffered in memory so the second process must be reading as the first writes. – Mark Setchell Jan 30 '18 at 16:00

1 Answer


It's kinda true, but not really at all.

What do I mean by that? First, let's get your documentation question out of the way. The following is from §3.13 of the PowerShell 3.0 Language Specification:

If a command writes a single object, its successor receives that object and then terminates after writing its own object(s) to its successor. If, however, a command writes multiple objects, they are delivered one at a time to the successor command, which executes once per object. This behavior is called streaming. In stream processing, objects are written along the pipeline as soon as they become available, not when the entire collection has been produced.

When processing a collection, a command can be written such that it can do special processing before the initial element and after the final element.
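Streaming is easy to observe directly. Here's a minimal sketch of my own (the `Get-Number` function name is mine, not from the spec), where a producer announces each object as it emits it and a consumer announces each object as it arrives:

```powershell
function Get-Number {
  1..3 | ForEach-Object {
    Write-Host "emitting $_"
    $_  # written to the pipeline immediately, not collected first
  }
}

Get-Number | ForEach-Object { Write-Host "received $_" }
```

The "emitting" and "received" lines interleave (emitting 1, received 1, emitting 2, ...), showing that each object travels down the pipeline as soon as it is produced rather than after the whole collection is done.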

Now, let's have a brief look at what a cmdlet consists of.


Cmdlets and their building blocks

It may be tempting to think of a cmdlet as just another function: a sequential set of statements executed synchronously whenever invoked. This is not the case, however.

A cmdlet, in PowerShell, is an object that can implement up to three processing methods: BeginProcessing(), ProcessRecord() and EndProcessing().

Once a pipeline starts executing, BeginProcessing() is called on every single cmdlet in the pipeline. In this sense, all cmdlets in the pipeline are running "in parallel" - but this design basically allows us to execute the pipeline with a single thread - so actual parallel processing involving multiple threads is not necessary to execute the pipeline as designed.

It's probably more accurate to point out that cmdlets execute concurrently in a pipeline.
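A quick way to convince yourself of the single-threaded nature (this snippet is my own, not part of the original demonstration) is to have every stage report the managed thread ID it runs on:

```powershell
function Show-StageThread {
  process {
    # Report which thread this stage's process block is executing on
    Write-Host "$($MyInvocation.InvocationName) on thread $([Threading.Thread]::CurrentThread.ManagedThreadId)"
    $_
  }
}

Set-Alias stageA Show-StageThread
Set-Alias stageB Show-StageThread

1..2 | stageA | stageB | Out-Null
```

Every stage reports the same thread ID: the cmdlets interleave their process blocks on one thread rather than running on parallel threads.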


Let's try it out!

Since the three methods above map directly onto the begin, process and end blocks that we can define in an advanced function, it's easy to see the effect of this execution flow.

Let's try and feed 5 objects to a pipeline consisting of three cmdlets reporting their state with Write-Host and see what happens (see code below):

PS C:\> 1..5 |first |second |third |Out-Null

[screenshot: pipeline processing output]

Be aware that PowerShell supports external output buffering control via the -OutBuffer common parameter, and this will influence the execution flow as well:

[screenshot: pipeline buffering output]
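As a sketch of what -OutBuffer changes (per about_CommonParameters, the upstream cmdlet holds its output until OutBuffer + 1 objects have accumulated before passing them along; the snippet itself is mine):

```powershell
# With -OutBuffer 2, the first stage accumulates 3 objects (OutBuffer + 1)
# before the second stage starts receiving them, so the "emit" and "got"
# lines no longer interleave one-for-one.
1..5 | ForEach-Object -OutBuffer 2 { Write-Host "emit $_"; $_ } |
       ForEach-Object { Write-Host "got  $_" }
```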

Hope this made some sense!


Here's the code I wrote for the demonstration above.

The Write-Host output from the function below changes colour based on which alias we invoke it through, making the stages a little easier to distinguish in the shell.

function Test-Pipeline {
  param(
    [Parameter(ValueFromPipeline)]
    [psobject[]]$InputObject
  )

  begin {
    $WHSplat = @{
      ForegroundColor = switch($MyInvocation.InvocationName){
        'first' {
          'Green'
        }
        'second' {
          'Yellow'
        }
        'third' {
          'Red'
        }
      }
    }
    Write-Host "Begin $($MyInvocation.InvocationName)" @WHSplat
    $ObjectCount = 0
  }

  process {
    foreach($Object in $InputObject) {
      $ObjectCount += 1
      Write-Host "Processing object #$($ObjectCount) in $($MyInvocation.InvocationName)" @WHSplat
      Write-Output $Object
    }
  }

  end {
    Write-Host "End $($MyInvocation.InvocationName)" @WHSplat
  }

}

Set-Alias -Name first  -Value Test-Pipeline
Set-Alias -Name second -Value Test-Pipeline
Set-Alias -Name third  -Value Test-Pipeline
Mathias R. Jessen
  • So the bottom line is that one thread manages the items on one particular pipeline, right? i.e. there is no multithreading involved in PS pipelining in general? – iPath ツ Jan 30 '18 at 17:37
  • @iPathツ Correctly understood, yes :-) – Mathias R. Jessen Jan 30 '18 at 17:39
  • In short, PowerShell cmdlets run *concurrently* (but not *necessarily* in parallel). Of course, that was true of *everything* on computers up until the early 21st century, but only software engineers bother making the distinction ;-) https://softwareengineering.stackexchange.com/a/190725/29348 – Jaykul Jan 31 '18 at 20:11