
I've been writing advanced functions for many years now and have even written quite a few modules at this point. But there's one question for which I have never really been able to find an answer.

Let's look at a cmdlet that Microsoft provides in the MSMQ module, as an example, and "re-implement" it as an advanced PowerShell function: Send-MsmqQueue. This function will be a bit different from the one provided by the MSMQ module: not only will it accept multiple MSMQ queues for the $InputObject parameter, but also multiple MSMQ queue names for the $Name parameter, where these two parameters belong to different parameter sets. (The cmdlet version of this function normally only accepts a single string value for the $Name parameter.) I won't be showing a complete re-implementation, just enough to illustrate what I, at times, find myself doing when this situation arises.

(NOTE: one other slight difference is that I will be using the classes from the System.Messaging namespace instead of the PowerShell-provided ones in the Microsoft.Msmq.PowerShell.Commands namespace. So assume that, implicitly, Add-Type -AssemblyName System.Messaging has been executed somewhere.)

function Send-MsmqQueue {
    [CmdletBinding(DefaultParameterSetName = 'Name')]
    [OutputType([Messaging.Message])]
    Param (
        [Parameter(
            Mandatory,
            ValueFromPipeline,
            ParameterSetName = 'InputObject')
        ]
        [Messaging.MessageQueue[]] $InputObject,

        [Parameter(
            Mandatory,
            ValueFromPipeline,
            ParameterSetName = 'Name')
        ]
        [string[]] $Name,

        # Below is the original parameter name, not mine ;)
        [Messaging.Message] $MessageObject

        # All other normal Send-MsmqQueue parameters elided as they are not
        # needed to illustrate the premise of my question.
    )

    Process {
        # When I have parameters defined as above, the first thing I do in my
        # Process block is "homogenize" the data so I don't have to implement
        # two foreach loops or do the branching on each foreach loop iteration
        # which can obscure the main logic that is being executed, i.e., I get
        # this done all "up-front".
        #
        # One aspect of my question is, from purely a PowerShell perspective,
        # is this hurting performance in any meaningful way? (I know that when
        # it comes to specific implementation details, there are INFINITE ways
        # to write non-performant code, so I'm asking purely about the
        # language's design/inner workings.)
        #
        # NOTE: I don't normally need the wrapping "force this thing to be an
        # array" construct (,<array_items>), BUT, in this case, the C#
        # System.Messaging.MessageQueue class implements IEnumerable,
        # which PowerShell (unhelpfully) iterates over automatically, and results
        # in the messages in the queues being iterated over instead of the queues
        # themselves, so this is an implementation detail specific to this
        # particular function.
        $Queues = (,@(
            if ($PSCmdlet.ParameterSetName -ieq 'Name') {
                # Handle when the parameter is NOT passed by the pipeline...
                foreach ($n in $Name) { [Messaging.MessageQueue]::new($n) }
            } else {
                $InputObject
            }
        ))

        # I like using 'foreach (...) { ... }' instead of ForEach-Object because
        # oftentimes, I will need to break or continue based on implementation
        # details, and using ForEach-Object in combination with break/continue
        # causes the pipeline to prematurely exit.
        foreach ($q in $Queues) {
            $q.Send($MessageObject)
            # Normally, I wouldn't return this, especially since it wasn't
            # modified, but this is a re-implementation of MSFT's Send-MsmqQueue,
            # and it returns the sent message.
            $MessageObject
        }
    }
}
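To make the break/continue point concrete without needing MSMQ at all, here is a minimal, generic sketch (the outer foreach exists only to give continue a loop to target; without one, a continue escaping ForEach-Object would tear down the entire script):

```powershell
# With the foreach *statement*, continue skips only the current iteration:
$viaKeyword = foreach ($i in 1..5) {
    if ($i -eq 3) { continue }
    $i
}
# $viaKeyword is 1, 2, 4, 5

# With ForEach-Object, continue is NOT scoped to the cmdlet's script block:
# it terminates the entire pipeline mid-stream.
$collected = [System.Collections.Generic.List[int]]::new()
foreach ($attempt in 1) {
    1..5 | ForEach-Object {
        if ($_ -eq 3) { continue }
        $collected.Add($_)
    }
}
# $collected holds only 1 and 2; items 4 and 5 were never processed
```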

As I stated in the introduction to this question, I have written many functions which take varying collection-based parameters belonging to different parameter sets which can be piped into the function, and this is the pattern that I use. I'm hoping someone can either confirm that this is OK from a PowerShell language/style perspective and/or help me understand why I should not do this and what I ought to consider instead.

Thank you!

fourpastmidnight

2 Answers


A fundamental performance decision is whether you want to optimize for argument-passing vs. pipeline input:

  • Declaring your parameters as arrays (e.g. [string[]] $Name) allows efficient passing of multiple input objects by argument (parameter value).

  • However, doing so hurts pipeline performance, because a single-element array is then created for every pipeline input object, as the following example demonstrates: it outputs String[] for each of the scalar string elements of the array passed via the pipeline:

    'one', 'two' | 
      & {
        param(
          [Parameter(Mandatory, ValueFromPipeline)]
          [string[]] $Name
        )
        process {
          $Name.GetType().Name # -> 'String[]' *for each* input string
        }
      }
    
    • Note: For brevity, the example above, as well as all others in this answer, uses a script block in lieu of a function definition. That is, a function declaration (function foo { ... }) followed by its invocation (... | foo) is shortened to the functionally equivalent ... | & { ... }.

See GitHub issue #4242 for a related discussion.
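For contrast, with a scalar parameter each pipeline input binds as-is, with no per-item array wrapping (same script-block idiom as above):

```powershell
$types = 'one', 'two' |
  & {
    param(
      [Parameter(Mandatory, ValueFromPipeline)]
      [string] $Name   # scalar, unlike the [string[]] example above
    )
    process {
      $Name.GetType().Name # -> 'String' *for each* input string
    }
  }
# $types is 'String', 'String'
```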


With array parameters, you indeed need to ensure element-by-element processing yourself, notably inside the process block if they're also pipeline-binding.
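A minimal sketch of that element-by-element processing (Test-Names is a hypothetical stand-in function): a single foreach inside the process block handles both invocation styles:

```powershell
function Test-Names {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]] $Name
    )
    process {
        # By argument: process runs once, $Name is the full array.
        # By pipeline: process runs per item, $Name is a 1-element array.
        foreach ($n in $Name) { "got: $n" }
    }
}

$byArg  = Test-Names -Name 'a', 'b'   # one process call, two iterations
$byPipe = 'a', 'b' | Test-Names       # two process calls, one iteration each
# Both produce: got: a, got: b
```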

As for "homogenizing" parameter values of different types so that only one processing loop is required, two fundamental optimizations are possible:

  • Declare only a single parameter and rely either on PowerShell to automatically convert values of other types to that parameter's type, or implement an automatically applied custom conversion, which obviates the need for "homogenizing" altogether:

    • The conversion is automatic if the parameter type has a public, single-parameter constructor that accepts an instance of the other type as its (only) argument or - in case the other type is [string] - if the type has a static ::Parse() method with a single [string] parameter; e.g.:

      # Sample class with a single-parameter
      # public constructor that accepts [int] values.
      class Foo {
        [int] $n
        Foo([int] $val) {
          $this.n = $val
        }
      }
      
      # [int] values (whether provided via the pipeline or as an argument)
      # auto-convert to [Foo] instances
      42, 43 | & {
        [CmdletBinding()]
        param(
          [Parameter(ValueFromPipeline)]
          [Foo[]] $Foo
        )
        process {
          $Foo # Diagnostic output.
        }
      }
      
      • In your case, [Messaging.MessageQueue] does have a public single-parameter constructor that accepts a string (as evidenced by your [Messaging.MessageQueue]::new($n) call), so you could simply omit the $Name parameter declaration, and rely on the automatic conversion of [string] inputs.

      • A general caveat:

        • This automatic conversion - which also happens with casts (e.g., [Foo[]] (0x2a, 43), see below) and the (rarely used) type-conversion form of the intrinsic .ForEach() (e.g., (0x2a, 43).ForEach([Foo])) - is stricter than calling a single-parameter constructor with respect to matching the constructor's parameter type.
        • I'm unclear on the exact rules, but using a [double] value, for instance, succeeds with [Foo]::new(42.1) (that is, conversion to [int] is automatically performed), but fails with both [Foo] 42.1 and (42.1).ForEach([Foo]) (the latter currently produces an obscure error message).
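This asymmetry can be verified directly, reusing the [Foo] class from above (the cast failure reflects the behavior described here, as of this writing):

```powershell
# Same sample class as in the answer's examples.
class Foo {
  [int] $n
  Foo([int] $val) { $this.n = $val }
}

# Constructor call: the [double] argument is converted to [int] first.
$viaCtor = [Foo]::new(42.1)   # succeeds; .n is 42

# Cast path: stricter matching; wrap it so a failure is observable, not fatal.
$viaCast = $(try { ([Foo] 42.1).n } catch { 'cast from [double] failed' })
```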
    • If the conversion isn't automatic, implement a custom conversion that PowerShell then applies automatically, by way of decorating your parameter with a custom attribute that derives from the abstract ArgumentTransformationAttribute class; e.g.:

      using namespace System.Management.Automation
      
      # Sample class with a single-parameter
      # public constructor that accepts [int] values.
      class Foo {
        [int] $n
        Foo([int] $val) {
          $this.n = $val
        }
      }
      
      # A sample argument-conversion (transformation) attribute class that
      # converts strings that can be interpreted as [int] to [Foo] instances.
      class CustomTransformationAttribute : ArgumentTransformationAttribute  {
        [object] Transform([EngineIntrinsics] $engineIntrinsics, [object] $inputData) {            
          # Note: If the inputs were passed as an *array argument*, $inputData is an array.
          return $(foreach ($o in $inputData) {
            if ($null -ne ($int = $o -as [int])) { [Foo]::new($int) }
            else                                 { $o }
          })
        }
      }
      
      # [string] values (whether provided via the pipeline or as an argument)
      # that can be interpreted as [int] now auto-convert to [Foo] instances,
      #  thanks to the custom [ArgumentTransformationAttribute]-derived attribute.
      '0x2a', '43' | & {
        [CmdletBinding()]
        param(
          [Parameter(ValueFromPipeline)]
          [CustomTransformation()] # This implements the custom transformation.
          [Foo[]] $Foo
        )
        process {
          $Foo # Diagnostic output.
        }
      }
      
  • If you do want separate parameters, optimize the conversion process:

    • The auto type-conversion rules described above also apply to explicit casts (including support for arrays of values), so you can simplify your code as follows:

      if ($PSCmdlet.ParameterSetName -eq 'Name') {
        # Simply use an array cast.
        $Queues = [Messaging.MessageQueue[]] $Name
      } else {
        $Queues = $InputObject
      }
      
    • In cases where element-by-element construction to effect conversion is required:

      if ($PSCmdlet.ParameterSetName -eq 'Name') {
        # Note the ","
        $Queues = foreach ($n in $Name) { , [Messaging.MessageQueue]::new($n) }
      } else {
        $Queues = $InputObject
      }
      
      • Note the use of unary , - the array constructor ("comma") operator - as in your attempt, albeit:

        • inside the foreach loop, and

        • without @(...) enclosure of the object to wrap in a single-element array, as @(...) itself would trigger enumeration.

        • While Write-Output -NoEnumerate ([Messaging.MessageQueue]::new($n)), as shown in Mathias' answer, works too, it is slower. It comes down to a tradeoff between performance/concision and readability/signaling the intent explicitly.

      • The need to wrap each [System.Messaging.MessageQueue] instance in an auxiliary single-element array with unary ,, or to use Write-Output -NoEnumerate, stems from the fact that this type implements the System.Collections.IEnumerable interface, which means that PowerShell automatically enumerates instances of the type by default.[1] Applying either technique ensures that the [System.Messaging.MessageQueue] instance is output as a whole to the pipeline (for details, see this answer).

        • Note that this is not necessary in the first snippet, because $Queues = [Messaging.MessageQueue[]] $Name is an expression, to which automatic enumeration does not apply.

        • The above also implies that you need the same technique if you want to pass a single [System.Messaging.MessageQueue] instance or a single-element array containing such an instance via the pipeline; e.g.:

           # !! Without `,` this command would *break*, because
           # !! PowerShell would try to enumerate the elements of the queue
           # !! which fails with an empty one.
           , [System.Messaging.MessageQueue]::new('foo') | Get-Member
          
      • By not using the if statement as a single assignment expression ($Queues = if ...) and instead assigning to $Queues in the branches of the if statement, you additionally avoid subjecting $InputObject to unnecessary enumeration.
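The wrapping-vs-enumeration behaviors above can be demonstrated with any enumerable type; a generic List[int] below stands in for [System.Messaging.MessageQueue]:

```powershell
# A List[int] stands in for any IEnumerable type that PowerShell
# would otherwise enumerate when it is sent to the pipeline.
$list = [System.Collections.Generic.List[int]] (1, 2, 3)

# Unary "," wraps the list itself in a single-element array:
$wrapped = , $list
# $wrapped.Count is 1; $wrapped[0] is the original list.

# "@(...)" first enumerates the list, so the wrapper holds its *elements*:
$wrappedCopy = , @($list)
# $wrappedCopy[0] is a 3-element [object[]], not the list.

# Piping the wrapper outputs the list *as a whole*:
$fromPipeline = , $list | Select-Object -First 1
# $fromPipeline is the original list, not its first element.
```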


[1] There are some exceptions, notably strings and dictionaries. See the bottom section of this answer for details.
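Those exceptions are easy to verify: strings and dictionaries pass through the pipeline as single objects, while ordinary arrays are enumerated:

```powershell
# Strings implement IEnumerable (of [char]) yet are NOT enumerated:
$stringCount = ('abc' | Measure-Object).Count            # 1

# Dictionaries are likewise sent as a whole:
$hashCount = (@{ a = 1; b = 2 } | Measure-Object).Count  # 1

# Ordinary arrays ARE enumerated, one pipeline object per element:
$arrayCount = ((1, 2, 3) | Measure-Object).Count         # 3
```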

mklement0
  • Thank you for this rather complete answer. I rather hate adding C# code to my PowerShell code because it "pollutes" the session (i.e., the types cannot be "removed" when removing a module that contains this code). This wouldn't be a problem if I was writing binary modules, but I tend to write script modules at this time. – fourpastmidnight Jun 29 '23 at 19:15
  • I didn't realize that `@(...)` causes enumeration! I learned something else about PowerShell today that I have not come across in the 4 years I've been writing PowerShell! So thank you for pointing that out. – fourpastmidnight Jun 29 '23 at 19:15
  • The reason I tend to use `$Queues = if () { ... } else { ... }` is because of StrictMode, which I tend to use. In the case where, perhaps, none of the branches were to be visited (in this case, that _wouldn't_ happen) and `$Queues` never gets set, StrictMode throws a terminating error. – fourpastmidnight Jun 29 '23 at 19:18
  • My pleasure, @fourpastmidnight; note that there is _no_ C# code in my answer, only a _PowerShell_ [`class`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Classes) definition (requires v5+) - and, at least with the sample code in your question, it isn't even needed. – mklement0 Jun 29 '23 at 19:18
  • Ah, yes, you are correct. Unfortunately, I'm currently stuck with having to maintain PSv4 compatibility, so I don't get to use all that new goodness in PSv5+ :( But, that is slowly changing as we upgrade all of our remaining application servers off of Windows Server 2012R2, some of which haven't had WMF 5.0 installed. – fourpastmidnight Jun 29 '23 at 19:21
  • @fourpastmidnight, in general I do appreciate the conceptual clarity of the idiom `$var = if () { ... } else { ... }`, but it does introduce pipeline output semantics, which can cause problems. The assignment-inside-the-branches approach is equally safe/unsafe with respect to strict mode: as long as the _visited_ branches don't have undeclared variables, there's no problem. (But I do understand that the former idiom more elegantly and robustly ensures the existence of a variable in later code. Alternative: _unconditionally_ initialize the variable, followed by a single-branch `if`) – mklement0 Jun 29 '23 at 19:25
  • Thank you also for the added detail about what you're optimizing for. Yes, I thought that was happening "under the covers" (single scalar items are "upconverted" to an array), which can have significant performance overhead in some contexts. I do tend to try to make it efficient to pass arguments to my functions, regardless of whether it's by argument or by the pipeline. But understanding what's happening under the covers in certain scenarios, one may prefer the performance aspect more than the convenience aspect. – fourpastmidnight Jun 29 '23 at 19:28
  • @fourpastmidnight, understood re PSv4 compatibility - don't know if you saw the update to my previous comment: if the other type you want to accept is `[string]` and the target type has an automatic (PowerShell) from-string conversion (which is true of your sample code), you won't need a `class`. – mklement0 Jun 29 '23 at 19:34
  • I did see that. And I'm trying it out now, but it looks like I missed something, as automatic conversion did not occur. :/ Probably something silly on my part :) – fourpastmidnight Jun 29 '23 at 19:37
  • @fourpastmidnight, the following minimal example works for me in v5.1 (don't have access to v4): `Add-Type -AssemblyName System.Messaging; function foo { [CmdletBinding()] param( [Parameter(Mandatory, ValueFromPipeline)] [Messaging.MessageQueue[]] $InputObject ) process { $InputObject.QueueName } } 'queue1', 'queue2' | foo` – mklement0 Jun 29 '23 at 19:46
  • @fourpastmidnight, as for the enumeration behavior of `@(...)` - more details in [this answer](https://stackoverflow.com/a/72718166/45375). – mklement0 Jun 29 '23 at 19:47
  • I got the conversion to work. Like I said, something silly on my part. Thanks!! – fourpastmidnight Jun 29 '23 at 19:47
  • Thank you so much for the help and completeness of your answer (and comments!). I love to learn the inner details of things, and PowerShell is _awfully_ complex (in a good way, more often than not), and as much as I think I know, there's still much to learn. I started with PS as a C# dev and tried to treat PS as C#. BIG MISTAKE. So, understanding this language at a deep level is _extremely_ helpful. Thank you again! – fourpastmidnight Jun 29 '23 at 20:01
  • OK, furthermore, using elements from this answer resolved the issue where `$stuff = Write-Output -NoEnumerate; $stuff` did not output what I thought it should, whereas doing `$stuff = , ; $stuff` resulted in printing the properties of the base object, i.e., in this case, the properties of a `MessageQueue`, instead of nothing (or worse, iterating over potentially _thousands_ of queued messages!) – fourpastmidnight Jun 29 '23 at 20:28
  • @fourpastmidnight, yes, in an _assignment_, `Write-Output -NoEnumerate` captures the enumerable _as is_, whereas `, ` _wraps it in a single-element array_ (that _isn't_ enumerated in an _assignment_ whose value is an _expression_). Outputting the former enumerates the collection/enumerable and therefore prints representations of its _elements_, whereas outputting the latter enumerates only the _array wrapper_, and therefore prints a representation of the enumerable _itself_. – mklement0 Jun 29 '23 at 20:42

This pattern ("homogenizing" the input entities based on chosen parameter set) is perfectly valid, and constitutes - in my personal opinion at least - good parameter design.

That being said, you might want to use Write-Output -NoEnumerate to avoid the clunky ,@(...) unwrapped-wrapped-array unpacking trick:

if ($PSCmdlet.ParameterSetName -ieq 'Name') {
    # Handle when the parameter is NOT passed by the pipeline...
    $Queues = foreach ($n in $Name) {
        $queue = [Messaging.MessageQueue]::new($n)
        Write-Output $queue -NoEnumerate
    }
}
else {
    # Input is already [MessageQueue[]], avoid pipeline boundaries entirely
    $Queues = $InputObject 
}
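The effect of -NoEnumerate is easy to see with any enumerable type; a generic List[int] here stands in for [Messaging.MessageQueue]:

```powershell
$list = [System.Collections.Generic.List[int]] (1, 2, 3)

# Default behavior: Write-Output enumerates, so three [int]s come out
# and the assignment collects them into a new [object[]]:
$enumerated = Write-Output $list
# $enumerated.Count is 3

# With -NoEnumerate, the list itself comes out as a single object:
$whole = Write-Output $list -NoEnumerate
# $whole is the original List[int]
```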
Mathias R. Jessen
  • Thanks for the confirmation that what I'm doing is good design! But, oooo, I didn't know about this `Write-Output -NoEnumerate`! I'm going to go do research on this right now. (Mind you, I've had to remain PSv4 compatible for a while now due to Windows Server 2012R2 still being deployed with no update to WMF 5--but that is changing, so I will now need to be PSv5+ compatible.) – fourpastmidnight Jun 29 '23 at 14:33
  • @fourpastmidnight Automatic enumeration on pipeline boundaries is facilitated by the "pipeline processor" that executes the given pipeline - [`Write-Output -NoEnumerate`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/write-output) signals the processor to _skip_ enumeration immediately "downstream" from it. This behavior goes back to 2.0 and `Write-Output -NoEnumerate` is available at least by 3.0, possibly 2.0 too (can't remember) – Mathias R. Jessen Jun 29 '23 at 14:35
  • Wow, I have never come across that. I've read _Windows PowerShell in Action, 3rd edition_, and this is not mentioned in that book _anywhere_. A _HUGE_ glaring _MISS_ in that book, in my opinion, which otherwise is a fairly decent resource. Thanks so much for showing me this--you just saved me from a LOT of headaches! – fourpastmidnight Jun 29 '23 at 14:37
  • @MathiasRJessen Hmm, the biggest problem with `Write-Output $Queues -NoEnumerate` is that a `System.Object[]` is now returned, and not a `[Messaging.MessageQueue[]]`. This means, for example: `$qs = Get-MsmqQueue; $qs` will print _nothing_ because there's no _sensible_ output for a generic `object`, whereas I was expecting something like a "stringified" version of the object to be output, i.e. the name of the queue. Oh well. That's still a handy little trick to have up your sleeve. Thanks again! – fourpastmidnight Jun 29 '23 at 16:17
  • @fourpastmidnight I think you might have misunderstood PowerShell's typing semantics - output formatting is based on the reflected type of the _enumerated_ item at runtime. It sounds like you might have a different issue. Does `$qs |Select Path,QueueName` show sensible output? – Mathias R. Jessen Jun 29 '23 at 16:33
  • @MathiasRJessen That works just fine. I think it's because `Write-Output` literally returns an `object[]`, even though the objects themselves are a `MessageQueue[]`, because as we know, all objects inherit from `object`. I was thinking about trying to use `-as [MessageQueue[]]` on the output (I already tried a cast and that had no effect either). – fourpastmidnight Jun 29 '23 at 17:28
  • And, because the type has a `Path` and `QueueName` property, `Select-Object` "just works" as expected. But simply doing `$qs` does not. It _also_ might be because the type `System.Messaging.MessageQueue` does not have any information (in a `Formats.ps1xml` file) that says what should be output for a `MessageQueue`. (I think this is most likely the right answer, and as you said, was a misunderstanding on my part.) – fourpastmidnight Jun 29 '23 at 17:30
  • While this answer is valid and helpful in answering my question, the answer provided by @mklement0 is very complete and thorough, explaining what's happening "under the hood" and the trade-offs in the design. I appreciate both answers to my question and encourage everybody to keep contributing to SO, even if their answer is not selected as "the answer" (or once was). Thank you for your contribution and effort! I really appreciate it! – fourpastmidnight Jun 29 '23 at 19:31