Theo's helpful answer shows a fast, in-place way to sort an array that's already in memory and stored in a variable, using a .NET API.
If you want to sort and also eliminate duplicates (the equivalent of Sort-Object -Unique), you can use a System.Collections.Generic.SortedSet<T> instance:
PS> [System.Collections.Generic.SortedSet[string]]::new(
[string[]] ('foo', 'bar', 'baz', 'foo'),
[System.StringComparer]::InvariantCultureIgnoreCase
)
bar
baz
foo
Note: If you use [object] as SortedSet's generic type argument, you don't need the [string[]] cast, but it's generally preferable to use specific types.
Note: The result isn't an array, but it can be enumerated like one with a foreach statement and in the pipeline. To copy the results to an array, such as when you need to apply indexing (e.g., [0]), preallocate a target array of the same type and use the .CopyTo() method to fill it ($arr = [string[]]::new($sortedSet.Count); $sortedSet.CopyTo($arr)).
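A minimal, runnable sketch of the copy-to-array approach from the note above; the variable names $sortedSet and $arr are only illustrative:

```powershell
# Build a sorted, duplicate-free set of strings, compared case-insensitively.
$sortedSet = [System.Collections.Generic.SortedSet[string]]::new(
  [string[]] ('foo', 'bar', 'baz', 'foo'),
  [System.StringComparer]::InvariantCultureIgnoreCase
)

# SortedSet<T> has no indexer, so copy its elements into a
# preallocated array of the same element type.
$arr = [string[]]::new($sortedSet.Count)
$sortedSet.CopyTo($arr)

$arr[0]  # -> bar
```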
An alternative is to sort the array with duplicates first and then apply System.Linq.Enumerable.Distinct():
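[string[]] $arr = 'foo', 'bar', 'baz', 'foo'
# Sort in place, case-insensitively.
[Array]::Sort($arr, [System.StringComparer]::InvariantCultureIgnoreCase)
# Remove duplicates, using the same case-insensitive comparer for consistency.
[Linq.Enumerable]::Distinct($arr, [System.StringComparer]::InvariantCultureIgnoreCase)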
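Note: The result is a lazy enumerable, not an array, but it can be enumerated like one with a foreach statement and in the pipeline. To create an array explicitly, such as when you need to apply indexing (e.g., [0]), pass the result to [Linq.Enumerable]::ToArray(); since PowerShell doesn't support extension-method syntax, this static call is the reliable way to do so.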
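A minimal sketch that materializes the Distinct() result as an array with the static [Linq.Enumerable]::ToArray() method, so that indexing works; the name $unique is just illustrative:

```powershell
[string[]] $arr = 'foo', 'bar', 'baz', 'foo'
[Array]::Sort($arr, [System.StringComparer]::InvariantCultureIgnoreCase)

# Distinct() yields a lazy enumerable; ToArray() turns it into a [string[]].
$unique = [Linq.Enumerable]::ToArray(
  [Linq.Enumerable]::Distinct($arr, [System.StringComparer]::InvariantCultureIgnoreCase)
)

$unique[0]  # -> bar
```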
As for your question:
So, what on earth is -InputObject for then?
For the majority of cmdlets, unfortunately, the -InputObject parameter is just an implementation detail: its purpose is to enable input via the pipeline, and using it directly with arrays (collections) is pointless, as is the case with Sort-Object.
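A quick way to see this with Sort-Object: passed directly, the whole array is bound to -InputObject as a single object, so there is nothing to sort and the array comes back unchanged:

```powershell
# Pipeline input: the elements are sent one by one and get sorted.
$viaPipeline = 3, 1, 2 | Sort-Object
# Direct argument: the array is a single input object and is output as-is.
$direct = Sort-Object -InputObject 3, 1, 2

$viaPipeline -join ','  # -> 1,2,3
$direct -join ','       # -> 3,1,2
```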
GitHub issue #4242 asks that -InputObject be clearly documented as such, and that the documentation also list the cmdlets that do meaningfully support direct use of -InputObject. Those cmdlets support it not as an alternative to pipeline input, but with different semantics: when -InputObject is used, they operate on an array (a collection) as a whole. E.g., 1, 'foo' | Get-Member works (meaningfully) differently from Get-Member -InputObject (1, 'foo'): the former reports the types of the array's elements, the latter the type of the array itself.
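The Get-Member difference can be checked by extracting just the type names from its output (the TypeName property of the member-definition objects it emits):

```powershell
# Pipeline: Get-Member sees each element, so it reports the element types.
(1, 'foo' | Get-Member).TypeName | Select-Object -Unique
# -> System.Int32 and System.String

# Direct argument: Get-Member sees the array itself.
(Get-Member -InputObject (1, 'foo')).TypeName | Select-Object -Unique
# -> System.Object[]
```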
Among data-processing cmdlets (as opposed to formatting cmdlets), it is effectively only Write-Output and Out-String (which is partly a formatting cmdlet as well) that support direct -InputObject use with arrays; e.g.:
# Both commands produce the same output.
1, 2 | Write-Output
Write-Output -InputObject 1, 2
Even there, however, the behavior differs with nested arrays, because the enumeration depths differ between the two methods:
# NOT the same, due to nesting.
1, (2, (3, 4)) | Write-Output # -> 1, 2, (3, 4)
Write-Output -InputObject 1, (2, (3, 4)) # -> 1, (2, (3, 4))
This lack of support for direct, array-valued -InputObject arguments is unfortunate, because such support can greatly speed things up, especially with data that is already fully in memory: it bypasses the one-by-one streaming that invariably occurs in the pipeline, which requires a handshake of sorts between the sending and the receiving command for each object.
An example is the -Value parameter of the Set-Content cmdlet, which does accept direct array-valued arguments in lieu of pipeline input; using -Value directly greatly speeds up the operation.
# Write 100,000 (1e5) numbers to a file:
# Via the pipeline.
1..1e5 | Set-Content temp.txt
# Via -Value - this is much, much faster.
# E.g. on my macOS machine with PowerShell 7.1, about 100(!) times faster.
Set-Content temp.txt -Value (1..1e5)
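To check the difference on your own machine, a sketch using Measure-Command; the temp-file name setcontent-test.txt is an arbitrary choice, and timings will vary by system and PowerShell version:

```powershell
# Arbitrary test file in the temp directory.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'setcontent-test.txt'
$numbers = 1..1e5

# Time the pipeline approach, then the direct -Value approach.
$pipelineMs = (Measure-Command { $numbers | Set-Content $file }).TotalMilliseconds
$pipelineText = Get-Content -Raw $file
$valueMs = (Measure-Command { Set-Content $file -Value $numbers }).TotalMilliseconds
$valueText = Get-Content -Raw $file

# Both approaches write the same content; only the timing differs.
'Pipeline: {0:N0} ms; -Value: {1:N0} ms' -f $pipelineMs, $valueMs
Remove-Item $file
```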
Potential improvements:
Note:
- A cmdlet is, in theory, free to implement its own array support for -InputObject, but doing so (a) is cumbersome, because it requires extra logic, and (b) requires either declaring the parameter as an array type (which is inefficient, because even single objects received via the pipeline are then wrapped in arrays) or declaring it as [object], which potentially forfeits type safety.
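For illustration, a minimal sketch of such a do-it-yourself implementation using the array-typed parameter variant; ConvertTo-LongCustom is a hypothetical name, and the foreach loop in the process block is the extra logic this approach requires:

```powershell
# Hypothetical function: declares [int[]] so that direct array arguments work too.
function ConvertTo-LongCustom {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)]
    [int[]] $InputObject
  )
  process {
    # Via the pipeline, $InputObject is a one-element array per input object;
    # as a direct argument, it is the whole array. Enumerate it in both cases.
    foreach ($item in $InputObject) { [long] $item }
  }
}

# Both calls output the same [long] values.
1, 2, 3 | ConvertTo-LongCustom
ConvertTo-LongCustom -InputObject 1, 2, 3
```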
Ideally, PowerShell itself should provide this support, along the following lines:
Extend the [Parameter] attribute with a new, Boolean EnumerateArgument property that can be combined with the existing ValueFromPipeline and ValueFromPipelineByPropertyName properties and that, when set to $true, would instruct PowerShell:
  - to implicitly accept arrays of the specified (scalar) parameter type with direct use of the parameter, e.g., [int[]] for an [int]-typed parameter,
  - and to enumerate those arrays just like in the pipeline, calling the process block (advanced functions, i.e., cmdlets implemented in PowerShell) / the .ProcessRecord() method (binary cmdlets) for each enumerated object.
A hypothetical (contrived) example:
function ConvertTo-Long {
[CmdletBinding()]
param(
# WISHFUL THINKING: implicit array support for direct -InputObject arguments
[Parameter(ValueFromPipeline, EnumerateArgument)]
[int] $InputObject
)
process {
[long] $InputObject
}
}
# The following calls would then be equivalent:
1, 2, 3 | ConvertTo-Long
ConvertTo-Long -InputObject 1, 2, 3
Note:
- The improvement would be available to any pipeline-binding parameter (not just -InputObject).
- Arguably, EnumerateArgument should be $true by default, and those rare cmdlets where passing an array as an argument has a different meaning, such as Get-Member, should opt out; however, that would be a backward-compatibility concern.
- Since the proposed enhancement would still involve calling the process block / the .ProcessRecord() method for each enumerated object, the speed-up won't be as dramatic as with a custom implementation that performs the enumeration itself, in a single call. However, to me the prospect of unifying the behavior between the pipeline and direct -InputObject use alone makes this improvement worthwhile.
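The per-object cost behind that trade-off can be made visible by counting process-block invocations. A sketch with a hypothetical function, Test-ProcessCount, which uses today's array-typed parameter approach so that both calls are valid:

```powershell
# Hypothetical function: reports how often its process block ran
# and how many items it enumerated in total.
function Test-ProcessCount {
  [CmdletBinding()]
  param(
    [Parameter(ValueFromPipeline)]
    [int[]] $InputObject
  )
  begin { $calls = 0; $items = 0 }
  process {
    $calls++
    # Custom enumeration covers both pipeline and direct input.
    foreach ($i in $InputObject) { $items++ }
  }
  end { [pscustomobject] @{ ProcessCalls = $calls; Items = $items } }
}

1..5 | Test-ProcessCount               # ProcessCalls: 5, Items: 5
Test-ProcessCount -InputObject (1..5)  # ProcessCalls: 1, Items: 5
```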