
Please, observe:

C:\> $x = @(1)
C:\> $x = @($x,2)
C:\> $x = @($x,3)
C:\> $x = @($x,4)
C:\> $x = @($x,5)
C:\> $x.Length
2
C:\> @($x |% { $_ }).Length
3
C:\> $x


Length         : 2
LongLength     : 2
Rank           : 1
SyncRoot       : {System.Object[] 2, 3}
IsReadOnly     : False
IsFixedSize    : True
IsSynchronized : False
Count          : 2

4
5


C:\>

I expected the pipeline to flatten the list. But it does not happen. What am I doing wrong?

mark
  • Well, right off hand I would say that it is because you are just adding dimensions to the array that is the first element of $x and Powershell probably only goes one dimension deep to flatten. – EBGreen Jun 19 '18 at 17:08
  • PowerShell only flattens arrays with one item: `@(@(@("Test"))).Length`, returns: `1` and `@(@(@("Test")))[0].GetType()`, returns: `String`. In other words: `@(@(@("Test"))) <=> @("Test")` – iRon Jun 19 '18 at 17:21

1 Answer


I expected the pipeline to flatten the list.

  • PowerShell enumerates arrays (lists) and, generally speaking, (most) enumerable data types, i.e. it sends their elements one by one to the success output stream (for which the pipeline acts as a conduit).

    • If an element itself happens to be another enumerable, it is not also enumerated.

    • Conceptualizing output in PowerShell as object streams of indeterminate length is better than thinking of it in terms of arrays (lists) and their flattening - see the bottom section.

  • Separately, during for-display formatting only, enumerables are enumerated two levels deep, so that visually you can't tell the difference between @(1, 2) (an array that is enumerated, resulting in two integers being output) and Write-Output -NoEnumerate @(1, 2) (an array that is output as a single object) - even though in terms of data output these commands differ.

    • This for-display-only nested enumeration stops at the second level, so that any element at that level is formatted as itself, even if it happens to be yet another enumerable. A nested array at that level is therefore formatted according to the usual PowerShell rules for a single object: because the .NET Array type has more than 4 properties, Format-List formatting is implicitly applied, resulting in a line-by-line display of the Array instance's properties, as shown in the following example:

      # The 2nd element - a nested array - is formatted as a whole.
      PS> Write-Output -NoEnumerate @(1, @(2))
      1
      
      Length         : 1
      LongLength     : 1
      Rank           : 1
      SyncRoot       : {2}
      IsReadOnly     : False
      IsFixedSize    : True
      IsSynchronized : False
      Count          : 1
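
The one-level-only enumeration described in the first bullet can be checked directly. The following is a minimal sketch, not taken from the original answer; the variable names ($nested, $count, $types) are chosen purely for illustration.

```powershell
# An array whose second element is itself an array.
$nested = @(1, @(2, 3))

# The pipeline enumerates only the outer array, so two objects are sent.
$count = ($nested | Measure-Object).Count

# The second object arrives intact, still typed as an array.
$types = $nested | ForEach-Object { $_.GetType().Name }

"Objects sent through the pipeline: $count"   # 2
"Their types: $($types -join ', ')"           # Int32, Object[]
```

Measure-Object and GetType() look at the data that was actually output, so they are unaffected by the extra enumeration that happens only during for-display formatting.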
      

Applying the above to your specific example:

# Your nested array.
$x = @(1); $x = @($x,2); $x = @($x,3); $x = @($x,4); $x = @($x,5)

To visualize the resulting nested array, pass it to ConvertTo-Json:

PS> ConvertTo-Json -Depth 4 $x

[
  [
    [
      [
        [
          1
        ],
        2
      ],
      3
    ],
    4
  ],
  5
]

This tells us that you've created a two-element array whose first element is a nested array, itself comprising two elements, the first of which is yet another nested array, and so on down to the innermost single-element array containing 1.

  • Therefore, $x.Length outputs 2.

  • @($x | % { $_ }).Length outputs 3 for the following reason:

    • Sending $x to the pipeline operator | enumerates its elements, and outputting each element (via $_ in the % (ForEach-Object) script block) causes each element to also be enumerated, if it happens to be an array.
    • Thus, the two elements of the nested array stored in the first element of $x (a further nested array and 4) are output individually, followed by 5, the second element of $x, resulting in a total of three output objects.
    • Capturing these output objects in an array (@(...)) creates a three-element array whose .Length reports 3.
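
The counts above can be verified step by step. The sketch below rebuilds the array from the question and inspects what each stage yields. The Expand-NestedArray function is a hypothetical helper written for this illustration, not a built-in cmdlet; it shows one possible way to get the fully flattened list the question expected, assuming only arrays (not other collection types) need to be unwrapped.

```powershell
# Rebuild the nested array from the question.
$x = @(1); $x = @($x, 2); $x = @($x, 3); $x = @($x, 4); $x = @($x, 5)

$outerLength = $x.Length       # 2: a nested array and 5
$firstLength = $x[0].Length    # 2: a further nested array and 4

# Piping enumerates $x; emitting $_ from the script block enumerates
# each array element one more level, but no deeper.
$result      = @($x | ForEach-Object { $_ })
$resultTypes = $result | ForEach-Object { $_.GetType().Name }

# Hypothetical helper (not part of PowerShell): recursively emit leaf values.
function Expand-NestedArray {
    param([object[]] $InputArray)
    foreach ($item in $InputArray) {
        if ($item -is [array]) { Expand-NestedArray -InputArray $item }
        else { $item }
    }
}
$flat = @(Expand-NestedArray -InputArray $x)

"Result length: $($result.Length)"                 # 3
"Result types:  $($resultTypes -join ', ')"        # Object[], Int32, Int32
"Flattened:     $($flat -join ', ')"               # 1, 2, 3, 4, 5
```

The first of the three output objects is still an array, which is why the pipeline alone never produces the five integers.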

The for-display representation resulting from outputting $x itself follows from the explanation above.


Background information:

  • The PowerShell pipeline is a conduit for the success output stream, which is a stream of objects of indeterminate length, and only when that stream is captured - in the context of assigning to a variable ($var = ...) or making command output participate in an expression (e.g. (...).Foo) - does the concept of arrays enter the picture:

    • If the output stream happens to contain just one object, that object is captured as-is.
    • Otherwise, the PowerShell engine - of necessity - captures the multiple objects in a collection, which is an [object[]]-typed array.
  • Thus, the output stream has no concept of arrays, and discussing it in terms of arrays and their flattening ultimately leads to confusion.

    • Note that the output stream (a pipeline) isn't used only when you explicitly pipe data between commands with the pipeline operator (|), ...

    • ... it is also used implicitly by any single command to send its output to (a command is any cmdlet, function, script, script block, or external program).

    • However, it is not used in an expression (alone), such as 1 + 2 or [int[]] (1, 2, 3), or when passing arguments to commands.

      • That said, if you send the result of an expression to a command via | (which only works if the expression is the first pipeline segment), a pipeline is again involved, and the usual enumeration behavior (discussed below) applies to the expression's result; e.g. [int[]] (1, 2, 3) | ForEach-Object { 1 + $_ }

      • Perhaps surprisingly, use of the @(...) and $(...) operators invariably involves a pipeline, so that even enclosing stand-alone expressions in them results in enumeration (and re-collection); e.g., @([int[]] (1, 2, 3)).GetType().Name reports Object[] ([object[]]), because the strongly typed [int[]] array was enumerated, and the results were collected in a regular PowerShell array.
        The only exceptions are array literals such as @(1, 2, 3), where (in PowerShell version 5 and above) this behavior is optimized away.
        By contrast, the (...) operator does not enumerate expression results.

  • By default, outputting an array (more generally, an instance of most .NET types that implement the IEnumerable interface[1]) to the success output stream causes it to be enumerated, i.e. its elements are sent one by one.

    • As such, there is no guaranteed relationship between outputting an array and whether or not capturing the output also results in a (new) array.

    • Notably, outputting a single object is indistinguishable from outputting that single object wrapped in a (single-element) array.

  • Sending arrays (collections) as a whole to the pipeline requires additional effort:

    • Either use Write-Output -NoEnumerate:

      # -> 1, because only *one* object was output, the array *as a whole*
      (Write-Output -NoEnumerate @(1, 2) | Measure-Object).Count
      
    • Or - more obscurely, but more efficiently - use the unary form of ,, the array constructor operator, in order to wrap an array in an auxiliary, transitory array whose enumeration then outputs the original array as a single object:

      (, @(1, 2) | Measure-Object).Count # -> 1
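
The capture rules and the difference between @(...) and (...) described in the bullets above can be observed with a few type checks. This is a minimal sketch with illustrative variable names; the script blocks invoked with & stand in for any command that writes to the success output stream.

```powershell
# One output object is captured as-is; several are collected in [object[]].
$single = & { 42 }
$multi  = & { 1; 2 }

# A single-element array is enumerated on output, so the wrapper is lost
# and the capture is indistinguishable from outputting 42 directly.
$unwrapped = & { @(42) }

# @(...) enumerates a strongly typed array and re-collects it as [object[]];
# (...) passes the expression's value through unchanged.
$viaArraySubexpr = @([int[]] (1, 2, 3)).GetType().Name
$viaParens       = ([int[]] (1, 2, 3)).GetType().Name

"Single capture:    $($single.GetType().Name)"      # Int32
"Multiple capture:  $($multi.GetType().Name)"       # Object[]
"Unwrapped capture: $($unwrapped.GetType().Name)"   # Int32
"@(...) result:     $viaArraySubexpr"               # Object[]
"(...) result:      $viaParens"                     # Int32[]
```

If a result must be an array regardless of how many objects were output, wrapping the command in @(...) at the point of capture is the usual way to guarantee that.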
      

[1] For a summary of which types PowerShell considers enumerable - which both excludes select types that do implement IEnumerable and includes one that doesn't - see the bottom section of this answer.

mklement0