
I have a console application which can take standard input. It buffers up the data until the execute command, at which point it executes it all, and sends the output to standard output.

At the moment, I am running this application from Powershell, piping commands into it, and then parsing the output. The data piped in is relatively small; however, this application is being called about 1000 times. Each time it is executed, it has to load and create network connections. I am wondering whether it might be more efficient to pipeline all the commands into a single instantiation of the console application.

I have tried this by putting all the Powershell code that manufactures the standard input for the console application into a single function, then piping that function to the console application. This seems to work at first, but you eventually realise that Powershell is buffering up all the data until the function has finished, and only then sending it to the console application's StdIn. You can see this because all my Write-Host statements flash by, and only then do you see the output.

e.g.

Function Run-Command1
{
    Write-Host "Run-Command1"
    "GET nethost xxxx COLS id,name"
    "EXEC"
}

Function Run-Command2
{
    Write-Host "Run-Command2"
    "GET nethost yyyy COLS id,name"
    "GET users yyyy COLS id,name"
    "EXEC"
}

...

Function Run-CommandX 
{
...
}

Previously, I would use this as:

Run-Command1 | netapp.exe -connect QQQQ -U user -P password
Run-Command2 | netapp.exe -connect QQQQ -U user -P password
...
Run-CommandX | netapp.exe -connect QQQQ -U user -P password

But now I would like to do:

Function Run-Commands
{
    Run-Command1
    Run-Command2
    ...
    Run-CommandX
}

Run-Commands |
netapp.exe -connect QQQQ -U user -P password

Ideally, I would like the Powershell pipeline behaviour to be extended to an external application. Is this possible?

Mark Bertenshaw
    I guess it all depends on the data you want to pipeline to that function. You could have it accept an array of values, call it using Splatting maybe? Can we see some of the code? – Theo Nov 03 '20 at 12:39
  • So did you try your example function? That's what I'd start with. – Doug Maurer Nov 03 '20 at 14:21
  • Part of the mystery is solved: The buffering behavior on the PowerShell side changed between Windows PowerShell and PowerShell [Core] v6+ (which now streams, and no longer collects up front). The function in marsze's answer is a workaround to eliminate PowerShell-side buffering and make it work like in PowerShell [Core] v6+, but I would not expect that to solve your problem. – mklement0 Nov 03 '20 at 18:16

2 Answers


I would like the Powershell pipeline behaviour to be extended to an external application.
I have a whole load of Write-Host statements that flash by, and only then do you see the output.

Tip of the hat to marsze.

  • PowerShell [Core] v6+ performs no buffering at all, and sends (stringified) output as it is being produced by a command to an external program, in the same manner that output is streamed between PowerShell commands.[1]

  • PowerShell's legacy edition (versions up to 5.1), Windows PowerShell, buffers in that it collects all output from a command first before sending its stringification to an external program.

However, I think even Windows PowerShell's behavior isn't the problem here: Your Run-Commands function executes very quickly - given that the functions it calls merely output string literals - and the resulting array of lines is then sent all at once to netapp.exe; further processing, including when to produce output, is then up to netapp.exe. In PowerShell [Core] v6+, with PowerShell-side buffering out of the picture, the individual Run-Command<n> functions' output would be sent to netapp.exe ever so slightly earlier, but I wouldn't expect that to make a difference.

The upshot is that unless netapp.exe offers a way to adjust its input and output buffering, you won't be able to control the timing of its input processing and output production.
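A quick way to observe this PowerShell-side difference yourself (a minimal sketch; findstr serves only as a convenient pass-through external program on Windows):

```powershell
# Emit one line per second and pipe the lines to an external program.
# In PowerShell (Core) v6+, findstr's output appears line by line, as each
# line is produced; in Windows PowerShell 5.1, all five lines appear
# together after ~5 seconds, because PowerShell collects all output first.
1..5 | ForEach-Object { Start-Sleep -Seconds 1; "line $_" } | findstr "line"
```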


How PowerShell sends objects to an external program (native utility) via the pipeline:

  • It sends a stringified representation of each object:
    • in PowerShell [Core] v6+: as the object becomes available.
    • in Windows PowerShell: after having collected all output objects in memory first.

In other words: on the PowerShell side, from v6 onward, there is no buffering.[1]

  • However, receiving external programs typically do buffer the stdin (standard input) data they receive via the pipeline[2].

    • Similarly, external programs typically do buffer their stdout (standard output) streams (but PowerShell performs no additional buffering before passing the output on, such as to the terminal (console)).

    • PowerShell has no control over this behavior; either the external program itself offers an option to adjust buffering or, in limited cases on Linux, you can call the external program via the stdbuf utility.


Optional reading: How PowerShell stringifies objects when piping to external programs:

  • PowerShell, as of v7.1, knows only text when communicating with external programs; that is, data sent to such programs is converted to text, and output from such programs is interpreted as text - even though the underlying system IPC features are simply byte conduits.

  • The UTF-16-based .NET strings PowerShell uses are converted to byte streams for external programs based on the character encoding specified in the $OutputEncoding preference variable, which, regrettably, defaults to ASCII(!) in Windows PowerShell, and now sensibly to (BOM-less) UTF-8 in PowerShell [Core] v6+.

    • In other words: The encoding specified via $OutputEncoding must match the character encoding that the external program expects.

    • Conversely, it is the encoding specified in [Console]::OutputEncoding that determines how PowerShell interprets text received from an external program, i.e. how it converts the bytes received to .NET strings, line by line, with newlines stripped (which, when captured in a variable, amounts to either a single string, if only one line was output, or an array of strings).

  • The for-display representations you see in the PowerShell console (terminal) are also what is sent to external programs via the pipeline, as lines of text, specifically:

    • If an object (already) is a string (or [char] instance), PowerShell sends it as-is to the pipe, but with a platform-appropriate newline invariably appended.

      • That is, a CRLF newline is appended on Windows, and a LF-only newline on Unix-like platforms.

      • This behavior can be problematic, as there are situations where you do not want that, and there's no way to prevent it - see GitHub issue #5974, GitHub issue #13579, and this answer for a workaround.

    • If an object is, loosely speaking, a primitive type - something that is conceptually a single value, notably the various number types - it is stringified in a culture-sensitive manner, where available[3], and a platform-appropriate newline is again invariably appended.

      • E.g., with a French culture in effect (as reflected in Get-Culture), decimal fraction 1.2 - which PowerShell parses as a [double] value - is sent as 1,2<newline>.

      • Note that [bool] instances are not culture-sensitive and are always converted to strings True or False.

    • All other (complex) types are subject to PowerShell's rich for-display output formatting, and whatever you would see in the terminal (console) is also what is sent to external programs - which not only again potentially contains culture-sensitive representations, but is generally problematic in that these representations are designed for the human observer, not for programmatic processing.

The upshot:

  • Beware encoding problems - make sure $OutputEncoding and [Console]::OutputEncoding are set correctly.

  • To avoid unexpected culture-sensitivity and unexpected for-display formatting, it is best to deliberately construct the string representation you want to send.
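Both points can be sketched as follows (tool.exe is a hypothetical stand-in for the receiving external program):

```powershell
# 1) Align encodings with what the external program expects (UTF-8 here);
#    particularly important in Windows PowerShell, where $OutputEncoding
#    defaults to ASCII.
$OutputEncoding = [System.Text.UTF8Encoding]::new($false)    # what PowerShell sends
# [Console]::OutputEncoding = $OutputEncoding                # how PowerShell decodes the program's output

# 2) Construct the exact strings to send, rather than relying on implicit,
#    culture-sensitive stringification or for-display formatting:
$n = 1.2
$n.ToString([System.Globalization.CultureInfo]::InvariantCulture)    # always "1.2", e.g.: ... | tool.exe
Get-ChildItem | ForEach-Object { '{0},{1}' -f $_.Name, $_.Length }   # explicit lines, e.g.: ... | tool.exe
```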


[1] By default; however, you can explicitly request buffering - expressed as an object count - via the common -OutBuffer parameter.

[2] On recent macOS and Linux platforms, the stdin buffer size is 64KB. On Unix-like platforms, utilities typically switch to line-buffering in interactive invocations, i.e. when the stream in question is connected to a terminal.

[3] The behavior is delegated to the .ToString() method of a type at hand, i.e. whether or not that method outputs a culture-sensitive representation.

mklement0

EDIT: As @mklement0 pointed out, this is different in PowerShell Core.

In PowerShell 5.1 (and lower), I think you would have to manually write each pipeline item to the external application's input stream.

Here's an attempt to build a function for that:

function Invoke-Pipeline {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory, Position = 0)]
        [string]$FileName,

        [Parameter(Position = 1)]
        [string[]]$ArgumentList,

        [int]$TimeoutMilliseconds = -1,

        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin {
        $process = [System.Diagnostics.Process]::Start((New-Object System.Diagnostics.ProcessStartInfo -Property @{
            FileName = $FileName
            Arguments = $ArgumentList
            UseShellExecute = $false
            RedirectStandardInput = $true
            RedirectStandardOutput = $true
        }))
        $output = [System.Collections.Concurrent.ConcurrentQueue[string]]::new()
        $event = Register-ObjectEvent -InputObject $process -EventName 'OutputDataReceived' -Action {
            $Event.MessageData.TryAdd($EventArgs.Data)
        } -MessageData $output
        $process.BeginOutputReadLine()
    }
    process {
        $process.StandardInput.WriteLine($InputObject)
        [string]$line = ""
        while (-not ($output.TryDequeue([ref]$line))) {
            start-sleep -Milliseconds 1
        }
        do {
            $line
        } while ($output.TryDequeue([ref]$line))
    }
    end {
        if ($TimeoutMilliseconds -lt 0) {
            # WaitForExit() without a timeout returns void, so set the flag explicitly
            $process.WaitForExit()
            $exited = $true
        }
        else {
            $exited = $process.WaitForExit($TimeoutMilliseconds)
        }
        if ($exited) {
            $process.Close()
        }
        else {
            try {$process.Kill()} catch {}
        }
    }
}

Run-Commands | Invoke-Pipeline netapp.exe "-connect QQQQ -U user -P password"

The problem is, that there is no perfect solution, because by definition, you cannot know when the external program will write something to its output stream, or how much.

Note: This function doesn't redirect the error stream. The approach would be the same though.
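For illustration, here is a minimal self-contained sketch of the same asynchronous pattern applied to stderr (it assumes cmd.exe is available, i.e. Windows; in the function above you would additionally set RedirectStandardError = $true and drain the error queue alongside the output queue):

```powershell
# Launch an external program with stderr redirected and collect its error
# lines asynchronously via an event handler, mirroring the stdout handling.
$psi = New-Object System.Diagnostics.ProcessStartInfo -Property @{
    FileName              = 'cmd.exe'
    Arguments             = '/c echo error message 1>&2'
    UseShellExecute       = $false
    RedirectStandardError = $true
}
$process = [System.Diagnostics.Process]::Start($psi)
$errOutput = [System.Collections.Concurrent.ConcurrentQueue[string]]::new()
$null = Register-ObjectEvent -InputObject $process -EventName 'ErrorDataReceived' -Action {
    $Event.MessageData.TryAdd($EventArgs.Data)
} -MessageData $errOutput
$process.BeginErrorReadLine()
$process.WaitForExit()

[string]$line = ''
while ($errOutput.TryDequeue([ref]$line)) {
    if ($line) { Write-Error $line }   # skip the trailing $null the event delivers
}
```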

marsze
  • I tried this earlier but couldn't get it working. Your code is sending the correct information to the EXE, but as is, doesn't return anything to Powershell. I have now tried adding RedirectStandardOutput = $true, and a test for $process.StandardOutput.EndOfStream, - if so, it does a ReadLine(). Sadly the last ReadLine() never returns after all the data has been returned from the first command. But basically, we are getting there! – Mark Bertenshaw Nov 03 '20 at 14:06
  • Well, that was the whole point of the question. I *am* able to pipeline data into the console EXE, but Powershell buffers it up, then sends the data. I would like the loose way that a Powershell pipeline works. – Mark Bertenshaw Nov 03 '20 at 14:23
  • Very cool. It just shows me that I've got to figure out Powershell events. I guess that synchronously streaming stdinput doesn't quite work with Powershell. – Mark Bertenshaw Nov 03 '20 at 15:00