Ways to detect an external program in the pipeline has exited

Question

In a pipeline like

GenerateLotsOfText | external-program | ConsumeExternalProgramOutput

when external-program exits, the pipeline keeps running until GenerateLotsOfText completes. If external-program generates only one line of output, then

GenerateLotsOfText | external-program | Select-Object -First 1 | ConsumeExternalProgramOutput

will stop the whole pipeline as soon as external-program generates output. This is the behavior I'm looking for, but the caveat is that when external-program generates no output and exits prematurely (because of Ctrl-C, for instance), the pipeline still keeps running. So I'm now looking for a nice way to detect when that happens and have the pipeline terminate when it does.
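
The early-exit behavior of Select-Object -First can be observed without any external program. The following is a minimal sketch, assuming PowerShell 3 or later; the `$counter` hashtable and the script-block generator are hypothetical stand-ins for GenerateLotsOfText and are not part of the original pipeline:

```powershell
# Hypothetical stand-in for GenerateLotsOfText: counts how many items it emits.
$counter = @{ Produced = 0 }

$first = & {
  foreach ($i in 1..1000) {
    $counter.Produced++   # mutate the shared hashtable from the child scope
    $i                    # emit one item to the pipeline
  }
} | Select-Object -First 1

# Select-Object stops the upstream script block once it has received its item,
# so the generator does not run through all 1000 iterations.
"First item: $first; items produced upstream: $($counter.Produced)"
```

The point of the sketch is only that the upstream command is cut short by a downstream cmdlet; an external program that exits does not trigger the same mechanism.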

It seems it's possible to write a cmdlet that uses System.Diagnostics.Process and then use Register-ObjectEvent to listen for the 'Exited' event, but that's quite involved because of the handling of I/O streams etc., so I'd rather find another way.

I figured pretty much all other shells have this 'produce output when program exits' builtin via || and && and indeed this works:

GenerateLotsOfText | cmd /c '(external-program && echo FOO) || echo FOO' | Select-Object -First 1 | FilterOutFooString | ConsumeExternalProgramOutput

so no matter what external-program does, a line FOO will always be produced when it exits, so the pipeline will stop immediately (and FilterOutFooString then takes care of passing on only actual output). This isn't particularly 'nice' and has extra overhead because everything needs to be piped through cmd (any other shell would presumably work as well). I was hoping pipeline chain operators would allow this natively, but they don't seem to: trying the same syntax results in the error "Expressions are only allowed as the first element of a pipeline." The chaining itself works as expected, just not inside a pipeline; e.g. this yields the aforementioned ParserError:

GenerateLotsOfText | ((external-program && echo FOO) || echo FOO) | Select-Object -First 1
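
The sentinel idea behind the cmd example further up can be sketched in isolation. FilterOutFooString is left undefined in the question, so the `Remove-Sentinel` function below is a hypothetical implementation of it, and the string arrays merely simulate what the cmd wrapper would emit (real output followed by FOO, or FOO alone):

```powershell
# Hypothetical sentinel; it must be a line the external program never prints itself.
$sentinel = 'FOO'

# Hypothetical implementation of FilterOutFooString: drop the sentinel line.
function Remove-Sentinel {
  param([Parameter(ValueFromPipeline)] [string] $Line)
  process { if ($Line -ne $sentinel) { $Line } }
}

# Simulated case 1: the program printed one line before exiting.
$withOutput = 'real line', $sentinel | Select-Object -First 1 | Remove-Sentinel

# Simulated case 2: the program exited without printing anything.
$withoutOutput = @($sentinel | Select-Object -First 1 | Remove-Sentinel)

"With output: '$withOutput'; without output: $($withoutOutput.Count) line(s)"
```

In both cases Select-Object -First 1 receives a line and can stop the pipeline; the filter only decides whether that line is passed downstream.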

Is there another way to accomplish this?

Update: other possible approaches:

- Use a separate runspace that polls $LASTEXITCODE in the runspace that runs the external command. I didn't find a thread-safe way to do that, though (e.g. $otherRunspace.SessionStateProxy.GetVariable('LASTEXITCODE') cannot be called from another thread while the pipeline is running).
- Same idea, but polling something else: the NativeCommandProcessor source shows that it sets the ExecutionFailed flag on the parent pipeline once the external process exits, but I didn't find a way to access that flag, let alone in a thread-safe way.
- I might be onto something using a SteppablePipeline. If I understand it correctly, it gives you an object that allows manually executing pipeline iterations, which would allow checking $LASTEXITCODE between iterations.

Having looked into this after implementing the last idea (which is really straightforward): none of the above is an option, because the process exit code only gets determined once the end block of the pipeline runs, i.e. after the upstream pipeline element has produced all its output.
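
For readers unfamiliar with steppable pipelines, the following is a minimal sketch of the mechanism only. It uses ForEach-Object as a stand-in for the external program (an assumption made so the sketch runs anywhere) and does not work around the exit-code timing described above:

```powershell
# Stand-in command; in the scenario above this would be the external program.
$steppable = { ForEach-Object { $_ * 2 } }.GetSteppablePipeline()

$steppable.Begin($true)          # $true: the pipeline expects input
$results = foreach ($n in 1..3) {
  # Arbitrary checks could run here, between iterations.
  $steppable.Process($n)         # feed one input object and collect its output
}
$null = $steppable.End()         # run the end block; discard any remaining output
$steppable.Dispose()

$results -join ','
```

Each call to Process() pushes a single object through the wrapped command, which is what makes it possible to interleave other code with the pipeline's iterations.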

Please add a concrete code sample of what you have tried with pipeline chain operators. You can use expressions further down the chain by using `ForEach-Object`. — zett42, Nov 03 '21 at 10:47
@zett42 just trying the exact same thing in PS as in other shells, edited — stijn, Nov 03 '21 at 11:52
Try `GenerateLotsOfText | %{ (external-program && echo FOO) || echo FOO } | Select-Object -First 1`. The `%` is short form of `ForEach-Object`. — zett42, Nov 03 '21 at 11:57
@zett42 unless I'm missing something that will launch external-program once for each line of output produced, not even piping those lines into external-program, which is very different from what I'm trying to achieve? — stijn, Nov 03 '21 at 12:49
Correct. Input redirection of external programs is not supported by PowerShell. You would have to do this using `cmd`. — zett42, Nov 03 '21 at 14:27
I can't help but feel like I've seen a similar question answered by mklement0. Will try to search for it and post back! — Abraham Zinala, Nov 10 '21 at 04:21
I should probably not write answers before bed but... If I understand correctly, you can do what you want using the ProcessStartInfo / Process .NET combo. This will work as long as you set the 'RedirectStandardOutput' property to '$true'. From there, the output stream will be live in your process variable for you to manipulate. — Sage Pourpre, Nov 11 '21 at 12:04
@js2010 sure, try something like `ls c:\ -rec -file -force | fzf` then quit fzf with Esc. You'll have to wait for the ls to complete before the prompt appears again. Or something like `& {while($True){ 'foo'}} | cmd /c 'oopsthiscommandfails 2>NUL'` — stijn, Nov 11 '21 at 19:11

mklement0 · Accepted Answer · 2021-11-19T20:10:56.307

To be clear: As originally reported for Unix and then for Windows, PowerShell's current behavior (as of v7.2) should be considered a bug that will hopefully be fixed: PowerShell should detect when a native (external) program has exited and should stop all upstream commands in that event.

The following shows a generic workaround, which, however, invariably slows things down and has a limitation:

As of v7.2, PowerShell unfortunately doesn't offer a public feature to stop upstream commands (see GitHub feature request #); internally, it uses a private exception, as used by Select-Object -First, for instance. The function below uses reflection and ad-hoc-compiled C# code to throw this exception (as also demonstrated in this answer). This means that a performance penalty for the compilation is paid on first invocation of the function in a session.
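
The private exception type can at least be located via reflection, which the following sketch illustrates. It assumes the type name used in the function's source code below; since the type is an internal implementation detail, it may be renamed or removed in future PowerShell versions:

```powershell
# Look up the non-public exception type in the System.Management.Automation assembly.
$stopType = [psobject].Assembly.GetType(
  'System.Management.Automation.StopUpstreamCommandsException'
)

if ($null -eq $stopType) {
  'The type was not found in this PowerShell version.'
}
else {
  # IsPublic is expected to be False, which is why reflection is needed.
  "Found $($stopType.FullName); public: $($stopType.IsPublic)"
}
```

Locating the type is the easy part; throwing an instance of it without producing a PowerShell error is what requires the compiled C# helper.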

Define the function nw ("native wrapper"), whose source code is below, and then invoke it as follows, for instance:

On Windows (assumes WSL; omit -Verbose to suppress verbose output):
```
1..1e8 | nw wsl head -n 10 -Verbose
```
On Unix-like platforms (omit -Verbose to suppress verbose output):
```
1..1e8 | nw head -n 10 -Verbose
```

You'll see that the upstream command - the enumeration of the large input array - is terminated when head terminates (for the reasons stated, this termination of the upstream command incurs a performance penalty on first invocation in a session).

Implementation note:

nw is implemented as a proxy (wrapper) function, using a steppable pipeline - see this answer for more information.

function nw {
  [CmdletBinding(PositionalBinding = $false)]
  param(
    [Parameter(Mandatory, ValueFromRemainingArguments)]
    [string[]] $ExeAndArgs
    ,
    [Parameter(ValueFromPipeline)]
    $InputObject
  )
  
  begin {

    # Split the arguments into executable name and its arguments.
    $exe, $exeArgs = $ExeAndArgs

    # Determine the process name matching the executable.
    $exeProcessName = [IO.Path]::GetFileNameWithoutExtension($exe)

    # The script block to use with the steppable pipeline.
    # Simply invoke the external program, and PowerShell will pipe input
    # to it when `.Process($_)` is called on the steppable pipeline in the `process` block,
    # including automatic formatting (implicit Out-String -Stream) for non-string input objects.
    # Also, $LASTEXITCODE is set as usual.
    $scriptCmd = { & $exe $exeArgs }

    # Create the steppable pipeline.
    try {
      $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
      # Record the time just before the external process is created.
      $beforeProcessCreation = [datetime]::Now
      # Start the pipeline, which creates the process for the external program
      # (launches it) - but note that when this call returns,
      # the process is *not* guaranteed to exist yet.
      $steppablePipeline.Begin($PSCmdlet)
    }
    catch {
      throw # Launching the external program failed.
    }
    # Get a reference to the newly launched process by its name and start time.
    # Note: This isn't foolproof, but probably good enough in practice.
    # Ideally, we'd also filter by having the current process as the parent process,
    # but that would require extra effort (a Get-CimInstance call on Windows, a `ps` call on Unix)
    $i = 0
    while (-not ($ps = (Get-Process -ErrorAction Ignore $exeProcessName | Where-Object StartTime -GT $beforeProcessCreation | Select-Object -First 1))) {
      if (++$i -eq 61) { throw "A (new) process named '$exeProcessName' unexpectedly did not appear within the timeout period or exited right away." }
      Start-Sleep -Milliseconds 50
    }
  }
  
  process {
    
    # Check if the process has exited prematurely.
    if ($ps.HasExited) {

      # Note: $ps.ExitCode isn't available yet (even $ps.WaitForExit() wouldn't help), 
      #       so we cannot use it in the verbose message.
      #       However, $LASTEXITCODE should be set correctly afterwards.
      Write-Verbose "Process '$exeProcessName' has exited prematurely; terminating upstream commands (performance penalty on first call in a session)..."
      
      $steppablePipeline.End()

      # Throw the private exception that stops the upstream pipeline
      # !! Even though the exception type can be obtained and instantiated in
      # !! PowerShell code as follows:
      # !!   [System.Activator]::CreateInstance([psobject].assembly.GetType('System.Management.Automation.StopUpstreamCommandsException'), $PSCmdlet)
      # !! it cannot be *thrown* in a manner that *doesn't* 
      # !! result in a *PowerShell error*. Hence, unfortunately, ad-hoc compilation
      # !! of C# code is required, which incurs a performance penalty on first call in a given session.
      (Add-Type -PassThru -TypeDefinition '
        using System.Management.Automation;
        namespace net.same2u.PowerShell {
          public static class CustomPipelineStopper {
            public static void Stop(Cmdlet cmdlet) {
              throw (System.Exception) System.Activator.CreateInstance(typeof(Cmdlet).Assembly.GetType("System.Management.Automation.StopUpstreamCommandsException"), cmdlet);
            }
          }
        }')::Stop($PSCmdlet)
    
    }

    # Pass the current pipeline input object to the target process
    # via the steppable pipeline.
    $steppablePipeline.Process($_)

  }
  
  end {
    $steppablePipeline.End()
  }
}

Thanks for the elaborate (as usual) reply, definitely worth a bounty, but as I mentioned in my question: I already know it's possible to do this via System.Diagnostics.Process (in fact I found an implementation in PSFzf which does this) and I was really looking for another, possibly simpler way, preferably akin to what I'm doing via cmd. Which probably doesn't exist. — stijn, Nov 13 '21 at 19:50
Thank you, @stijn. I knew you didn't want a `Process` implementation, but I thought a _generic wrapper function_ that hides the details might help. However, I've since realized that a _steppable pipeline_ - as mentioned in your question - _can_ be used, after all (I thought it wouldn't relay pipeline input to an _external program_, but it does). Please see my updated function, which is both much simpler and faster. — mklement0, Nov 13 '21 at 22:53
Ah, great, can't believe I didn't think of just using Get-Process, but this is exactly what I was looking for. Might want to lookup parent process ID to be more sure it's the correct process. One question: what's the rationale for using try/catch/throw instead of just letting the exception bubble up? — stijn, Nov 15 '21 at 09:24
Glad to hear it, @stijn. Yes, additionally matching by parent PID would make sense, but due to the extra cost (`Get-CimInstance` call on Windows, call to the `ps` utility on Unix) I decided not to implement it. (Alternative P/Invoke solutions would again incur compilation cost). — mklement0, Nov 15 '21 at 15:10
As for the exception: Exceptions from .NET methods surface as _statement_-terminating errors in PowerShell, unfortunately, so you'll have to translate them into _script_-terminating (runspace-terminating) ones to actually stop execution of your function and its callers. You could also use `$PSCmdlet.ThrowTerminatingError()` to abort, which would amount to a statement-terminating error in the _caller_'s scope. — mklement0, Nov 15 '21 at 15:11
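
The difference between the two error kinds can be sketched as follows. The function names are hypothetical, and the sketch assumes the default `$ErrorActionPreference` of 'Continue', which it sets explicitly:

```powershell
$ErrorActionPreference = 'Continue'   # the default; set explicitly for the sketch

function Test-StatementTerminating {
  [int]::Parse('not a number')   # .NET exception: only this statement is aborted
  'still running'                # execution continues with the next statement
}

function Test-ScriptTerminating {
  try { [int]::Parse('not a number') } catch { throw }   # re-throw: aborts the function
  'never reached'
}

# The error is reported (suppressed here with 2>$null), but the function carries on.
$a = Test-StatementTerminating 2>$null

# The outer try/catch is only here so that the sketch itself keeps running.
$b = try { Test-ScriptTerminating } catch { 'caught by caller' }

"a = $a; b = $b"
```

Without the inner try/catch/throw, a failed launch in the `begin` block would be reported and the function would then continue into the polling loop.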
Currently trying this out 'for real' on Windows. Observations for completeness: usually the Get-Process call is too soon; a wait/poll loop fixes this and the delay is not a real issue. A bit more of an issue is that HasExited is kind of slow; e.g. in a stripped-down version of this function with only a HasExited call, a pipeline with about 30k items completes in about 500 ms; without that call it takes 100 ms. P/Invoking GetExitCodeProcess on a cached $ps.Handle is better at 200 ms but doesn't have the robustness of HasExited (which I'm not sure is really needed here) — stijn, Nov 18 '21 at 18:17
And while I like the solution more, it's still slower than the `cmd /c '(external-program && echo FOO) || echo FOO'` trick posted in the question, didn't really expect that. — stijn, Nov 18 '21 at 18:23
Thanks, @stijn. I had wondered if a polling loop was required for the `Get-Process` call, but never saw a problem in my informal tests, and - seemingly erroneously - assumed that the process creation would occur synchronously in the `.Begin($PSCmdlet)` call. Very curious that `.HasExited` is slower than access to `.Handle` _plus_ a P/Invoke call. What makes the latter less robust? It doesn't surprise me that `cmd /c ...` is faster, but it won't terminate the upstream command on exit, right? — mklement0, Nov 18 '21 at 18:35
I've added a polling loop around `Get-Process` with a 3+-second timeout to the code in the answer. — mklement0, Nov 18 '21 at 18:43
Note I'm caching the value returned by `.Handle`; see the [source](https://referencesource.microsoft.com/#System/services/monitoring/system/diagnosticts/Process.cs,4ee55b27a25daffb) of `HasExited` for why it's slower: it just makes some more calls for getting the handle etc. On second thought, for this particular implementation it's probably almost the same code path, so not more robust than directly calling `GetExitCodeProcess`. With the `cmd /c .. || ... &&` approach, no matter what, it eventually outputs FOO, so checking for that and then using `CustomPipelineStopper` does terminate the pipeline. — stijn, Nov 19 '21 at 20:02
@stijn, thanks; good to know re exit-code checking. Note that, because PowerShell itself doesn't stop the upstream cmdlets when the native program exits, all upstream cmdlets have by definition already completed when the pipeline returns. In the extreme case, with an _infinite_ output-producing upstream command, the pipeline _never_ returns; consider this Unix example which runs forever: `& { while ($true) { (++$i) } } | sh -c 'head -n 10 && echo foo || echo bar'` — mklement0, Nov 19 '21 at 20:16
