
I was reading about how the pipeline works in PowerShell in about_Pipelines, and learned that the pipeline delivers one object at a time.

So, this

Get-Service | Format-Table -Property Name, DependentServices

Is different from this

Format-Table -InputObject (Get-Service) -Property Name, DependentServices

So, going by that explanation, in the first case Format-Table works on one object at a time, and in the second case Format-Table works on an array of objects. Please correct me if I am wrong.

If that is the case, how do Sort-Object and other cmdlets that need to work on a whole collection of data work with the pipe character?

When I run:

Get-Service | Sort-Object

How is Sort-Object able to sort if it only gets to work with one object at a time? Assume there are 100 service objects to be passed to Sort-Object. Will Sort-Object be called 100 times (once per object)? And how does that yield the sorted results that I see on the screen?

Santiago Squarzon
pragun

2 Answers


Sort-Object (and other cmdlets that need to evaluate all input objects before outputting anything) work by collecting the input objects one by one, and then not doing any actual work until the upstream cmdlet (Get-Service in this case) is done sending input.

How does this work? Well, let's try and recreate Sort-Object with a PowerShell function.

To do so, we first need to understand that a cmdlet consists of 3 separate routines:

  • Begin - the Begin routines of each cmdlet in a pipeline are invoked once before anything else occurs
  • Process - this routine is invoked on each cmdlet every time input is received from an upstream command
  • End - this is invoked once the upstream command has called End and there are no more input items for Process to process

(These are the block label names used in PowerShell function definitions. In a binary cmdlet you'd instead override the BeginProcessing, ProcessRecord, and EndProcessing methods of your cmdlet.)

So, to "collect" every input item, we need to add some logic to the Process block of our command, and then we can put the code that operates on all the items in the End block:

function Sort-ObjectCustom
{
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [object[]]$InputObject
  )

  begin {
    # Let's use the `begin` block to create a list that'll hold all the input items
    $list = [System.Collections.Generic.List[object]]::new()

    Write-Verbose "Begin was called"
  }

  process {
    # Here we simply collect all input to our list
    $list.AddRange($InputObject)

    Write-Verbose "Process was called [InputObject: $InputObject]"
  }

  end {
    # The `end` block is only ever called _after_ we've collected all input
    # Now we can safely sort it
    $list.Sort()

    Write-Verbose "End was called"

    # and output the results
    return $list
  }
}

If we invoke our new command with -Verbose, we will see how the input is collected one by one:

PS ~> 10..1 |Sort-ObjectCustom -Verbose
VERBOSE: Begin was called
VERBOSE: Process was called [InputObject: 10]
VERBOSE: Process was called [InputObject: 9]
VERBOSE: Process was called [InputObject: 8]
VERBOSE: Process was called [InputObject: 7]
VERBOSE: Process was called [InputObject: 6]
VERBOSE: Process was called [InputObject: 5]
VERBOSE: Process was called [InputObject: 4]
VERBOSE: Process was called [InputObject: 3]
VERBOSE: Process was called [InputObject: 2]
VERBOSE: Process was called [InputObject: 1]
VERBOSE: End was called
1
2
3
4
5
6
7
8
9
10

For more information on how to implement pipeline input processing routines for binary cmdlets, see "How to Override Input Processing".

For more information on how to take advantage of the same pipeline semantics in functions, see about_Functions_Advanced_Methods and related help topics.
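
As a small sketch of the difference the question asks about (piping objects versus passing a whole collection to -InputObject), the function below only counts how often its process block runs. Measure-ProcessCall is a hypothetical name made up for this illustration, not a built-in cmdlet:

```powershell
# Hypothetical helper (not a built-in cmdlet): counts process-block invocations.
function Measure-ProcessCall {
    param(
        [Parameter(ValueFromPipeline)]
        [object[]]$InputObject
    )

    begin   { $calls = 0 }
    process { $calls++ }   # runs once per pipeline item, or once for a direct argument
    end     { $calls }     # emit the count once all input has arrived
}

$piped  = 1..5 | Measure-ProcessCall                # process runs 5 times
$direct = Measure-ProcessCall -InputObject (1..5)   # process runs once, with the whole array

"piped: $piped, direct: $direct"   # piped: 5, direct: 1
```

The command itself is only set up once per pipeline; it is the process block that runs repeatedly when input arrives through the pipe.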

Mathias R. Jessen
  • Thank you. Nicely explained. That clears up many doubts. If I may ask, can you also explain how whatever is written on the left side of the pipe character is handled? When Microsoft says that the pipe delivers objects one by one, does it mean that Get-Service is invoked many times while maintaining some state, emitting objects one by one for Sort-Object's process block to capture? Or does it mean that Get-Service is invoked once, the collection is kept in a buffer somewhere, and the pipe makes sure to deliver the objects one at a time? – pragun Jul 04 '21 at 12:23
  • Well, `Get-Service` will _also_ do most of its work in the `End` block, and once the `End` block returns, the pipeline processor (the "PowerShell engine" responsible for execution of the pipeline) will know that `Get-Service` is "done". Does that make sense? – Mathias R. Jessen Jul 04 '21 at 12:32
  • @pragun It's important to understand that any of the 3 routines can output 1 object, multiple objects, or none at all - completely up to the author of the command. There's no hard-and-fast rule that all output has to be sent in either `process` or `end` for example – Mathias R. Jessen Jul 04 '21 at 12:42
  • I am sorry for the late reply. Yes, that helps. Thank you! – pragun Jul 11 '21 at 05:02
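
To illustrate the point made in the comments above, that any of the three blocks may emit output, here is a minimal sketch. Show-BlockOutput is a hypothetical function name used only for this example:

```powershell
# Hypothetical function: every block writes something to the pipeline.
function Show-BlockOutput {
    param(
        [Parameter(ValueFromPipeline)]
        $InputObject
    )

    begin   { 'begin' }                   # emitted once, before any input is seen
    process { "process $InputObject" }    # emitted once per input object
    end     { 'end' }                     # emitted once, after the last input object
}

$out = 1..2 | Show-BlockOutput
$out -join ', '   # begin, process 1, process 2, end
```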

To complement the answer from Mathias: you can visualize the processing order of an existing cmdlet with Write-Host, which writes its output immediately to the display (rather than to the pipeline):

$Data = ConvertFrom-Csv @'
Id, Name
 4, Four
 2, Two
 3, Three
 1, One
'@

Select-Object example

$Data |
    Foreach-Object { Write-Host 'in:' ($_ |ConvertTo-Json -Compress); $_ } |
    Select-Object * |
    Foreach-Object { Write-Host 'out:' ($_ |ConvertTo-Json -Compress); $_ }

Shows:

in: {"Id":"4","Name":"Four"}
out: {"Id":"4","Name":"Four"}

in: {"Id":"2","Name":"Two"}
out: {"Id":"2","Name":"Two"}
in: {"Id":"3","Name":"Three"}
out: {"Id":"3","Name":"Three"}
in: {"Id":"1","Name":"One"}
out: {"Id":"1","Name":"One"}
Id Name
-- ----
4  Four
2  Two
3  Three
1  One

Sort-Object example

$Data |
    Foreach-Object { Write-Host 'in:' ($_ |ConvertTo-Json -Compress); $_ } |
    Sort-Object * |
    Foreach-Object { Write-Host 'out:' ($_ |ConvertTo-Json -Compress); $_ }

Shows:

in: {"Id":"4","Name":"Four"}
in: {"Id":"2","Name":"Two"}
in: {"Id":"3","Name":"Three"}
in: {"Id":"1","Name":"One"}
out: {"Id":"1","Name":"One"}

out: {"Id":"2","Name":"Two"}
out: {"Id":"3","Name":"Three"}
out: {"Id":"4","Name":"Four"}
Id Name
-- ----
1  One
2  Two
3  Three
4  Four

In general, PowerShell cmdlets Write Single Records to the Pipeline where possible (one advantage of this encouraged guideline is that it reduces memory consumption). As your question implies, Sort-Object can't do this, because the last input record might have to come out before the first one. There are also exceptions where it would be technically possible to write single records as the guideline encourages, but the cmdlet does not do so. See e.g.: #11221 Select-Object -Unique is unnecessary slow and exhaustive
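
A rough way to see the practical effect of streaming versus collecting, assuming PowerShell 5 or later (for the `::new()` syntax; `Select-Object -First` has stopped upstream commands since version 3): record how many objects the upstream command actually had to produce. The variable names here are made up for the sketch:

```powershell
# Streaming: Select-Object -First stops the pipeline once it has enough objects.
$seenStreaming = [System.Collections.Generic.List[int]]::new()
$first = 1..10 |
    ForEach-Object { $seenStreaming.Add($_); $_ } |
    Select-Object -First 3

# Collecting: Sort-Object must receive all input before it can emit anything.
$seenSorted = [System.Collections.Generic.List[int]]::new()
$firstSorted = 1..10 |
    ForEach-Object { $seenSorted.Add($_); $_ } |
    Sort-Object |
    Select-Object -First 3

"streaming upstream produced: $($seenStreaming.Count)"   # 3
"sorted upstream produced:    $($seenSorted.Count)"      # 10
```

In the first pipeline the upstream script block only runs three times; in the second, Sort-Object forces all ten objects to be produced and held before anything comes out.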

mklement0
iRon
  • Awesome. So it's the implementation of each cmdlet (in its Begin/Process/End blocks) that determines the order of execution. Thank you!! I have been looking at some of the things PowerShell is able to do and I am just blown away by the merits of this .NET object-based shell: strong OOP with the flexibility of a shell scripting language. I doubt that there is any alternative to this. Bash feels too primitive and naive next to it. And yet people hail "bash" as some angel and undermine anything Microsoft does just for the heck of it. – pragun Jul 11 '21 at 05:14
  • The `Select-Object` example is confusing, because the only reason the tabular output doesn't appear until later is unrelated to actual output sequencing: it is solely due to the infamous 300-msec. delay - see https://stackoverflow.com/a/43691123/45375 – mklement0 Aug 18 '21 at 22:43
  • Is there any way to tell which cmdlets collect input before processing? I'm trying to determine where a pipeline is slowing down and whether I can rewrite it to remain in the pipeline or just need to overhaul everything. Off the top of my head, I can think of `Sort-`, `Group-`, but is there an exhaustive list of which cmdlets do or don't collect input upfront and under what scenarios? (e.g. `Import-Csv` you can pipe out if you want so it doesn't by necessity) – immobile2 Mar 03 '22 at 16:46
  • @immobile2, afaik, there is no list of which cmdlets choke the pipeline. In general, all standard PowerShell cmdlets are [implemented for the Middle of a Pipeline](https://learn.microsoft.com/powershell/scripting/developer/cmdlet/strongly-encouraged-development-guidelines#implement-for-the-middle-of-a-pipeline) unless that is not possible by their (parameter) definition (e.g. `Sort-Object`, where the last object might become first). Also note that using parentheses or assigning a pipeline to a variable will choke the pipeline, see [this answer](https://stackoverflow.com/a/57232847/1701026). – iRon Mar 06 '22 at 07:48
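
The remark about parentheses in the last comment can be checked with a short sketch that logs the order of events into a list (the variable names are made up for this example; `::new()` assumes PowerShell 5 or later):

```powershell
# Plain pipeline: each object flows through both stages before the next one starts.
$streamed = [System.Collections.Generic.List[string]]::new()
1..3 |
    ForEach-Object { $streamed.Add("in $_"); $_ } |
    ForEach-Object { $streamed.Add("out $_") }

# Parenthesized upstream: the inner pipeline runs to completion first,
# and only then are its collected results sent on.
$collected = [System.Collections.Generic.List[string]]::new()
(1..3 | ForEach-Object { $collected.Add("in $_"); $_ }) |
    ForEach-Object { $collected.Add("out $_") }

$streamed  -join ', '   # in 1, out 1, in 2, out 2, in 3, out 3
$collected -join ', '   # in 1, in 2, in 3, out 1, out 2, out 3
```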