I am fairly familiar with bash and know how to do some basic scripting involving pipes, which I use as a 'back end' to run Python scripts in sequence.

However, for a new project I've been tasked with, I can only use PowerShell. I've found that I can rewrite my previous shell scripts fine, but I hear that you can pipe non-text data in PowerShell too.

My question is:

Is it possible to pipe non-text output (primarily a pandas dataframe) from a python script into another python script via PowerShell?

Something similar to:

script1.py | script2.py

If so, what are the logistics with regard to the Python script? I.e., can you still write to sys.stdout?

EDIT:

To better explain the use case, in line with the comments I've received:

I have two python scripts, test1.py:

#test1.py
import pandas as pd
import sys


def main():
    columns = ['A', 'B', 'C']
    data = [
        ['hello', 0,  3.14],
        ['world', 1,  2.71],
        ['foo',   2,  0.577],
        ['bar',   3,  1.61]

    ]

    df = pd.DataFrame(data, columns=columns)
    return df


if __name__ == "__main__":
    main().to_csv(sys.stdout, index_label=False)

and test2.py:

#test2.py
import pandas as pd
import sys


def main():
    df = pd.read_csv(sys.stdin)
    print(df.dtypes)


if __name__ == "__main__":
    main()

I'm using PowerShell to do some automation, and need to pipe the output of one script to the other; python test1.py | python test2.py works perfectly fine.

My question is: I have heard that you can pipe non-text data in PowerShell, which you can't do in Bash (I think), so is it possible to pipe the DataFrame as it is (without having to convert it to CSV or some other string encoding)?

phuclv
  • why on EARTH would you need to do that? [*frown*] you are running 2 python instances one-after-the-other ... so just use python coding methods to get info from one to the other. powershell has no place whatsoever in the situation. – Lee_Dailey Aug 04 '21 at 18:55
  • Perhaps the author only has control of one or none of the python source files. Perhaps the process has worked on linux for years and they want to port it to windows without rewriting the logic. It seems like a reasonable question, though I do encourage the original poster to show more of what they have attempted so far. – JonSG Aug 04 '21 at 19:21
  • @Lee_Dailey We are using shell scripts as a backend to orchestrate many scripts (not just python) in a small ML platform where tooling options are limited. So, although I agree with you that it doesn't _NEED_ to be done this way at all, I am exploring what can be done this way. –  Aug 04 '21 at 19:38
  • @JonSG you hit the nail on the head - I do not have control of all the scripts, just the output. The process was originally built in bash, but there is no WSL/Bash for windows available in the workspace. –  Aug 04 '21 at 19:39
  • It looks like powershell may not support piping binary data : https://stackoverflow.com/questions/24708859/output-binary-data-on-powershell-pipeline – JonSG Aug 04 '21 at 20:17
  • thank all of y'all for the feedback. [*grin*] it still seems odd to need to feed one python script output to a 2nd one ... but not use python to do that. – Lee_Dailey Aug 05 '21 at 00:02
  • @JonSG this link confuses me, but that's probably because I am a mere Python dev and not knowledgeable in C (or much of PowerShell). –  Aug 05 '21 at 08:08
  • @Lee_Dailey no worries! –  Aug 05 '21 at 08:15
  • in the guts of that post there is some discussion about powershell `|` always doing an encoding and thus making piping of binary data impossible. I don't know enough powershell to really help. – JonSG Aug 05 '21 at 13:46

1 Answer

Unfortunately, as of PowerShell 7.2, there is no support for binary data (raw bytes) in PowerShell's pipeline.

The workaround is to use cmd.exe /c (on Windows; on Unix-like platforms, use /bin/sh -c):

cmd /c 'script1.py | script2.py'
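On the Python side, a byte-preserving pipe such as the cmd /c one above means the scripts are not limited to CSV text: the first script can write a binary serialization to sys.stdout.buffer and the second can read it from sys.stdin.buffer. The following is only a minimal sketch of that idea using the standard-library pickle module. The helper names send and receive are hypothetical, a plain dict stands in for the DataFrame (pickle is generally expected to handle a DataFrame the same way, but that is not shown here), and an in-memory buffer stands in for the pipe.

```python
import io
import pickle
import sys


def send(obj, stream=None):
    """Pickle obj and write the raw bytes to a binary stream (stdout by default)."""
    if stream is None:
        stream = sys.stdout.buffer  # byte-level view of stdout, bypasses text encoding
    stream.write(pickle.dumps(obj))
    stream.flush()


def receive(stream=None):
    """Read all bytes from a binary stream (stdin by default) and unpickle them."""
    if stream is None:
        stream = sys.stdin.buffer  # byte-level view of stdin
    return pickle.loads(stream.read())


# In-memory demonstration: io.BytesIO plays the role of the pipe.
table = {"A": ["hello", "world"], "B": [0, 1], "C": [3.14, 2.71]}
pipe = io.BytesIO()
send(table, pipe)
pipe.seek(0)
restored = receive(pipe)
```

In a real pair of scripts, the producer would call send(df) and the consumer would call receive() with no stream argument. Unlike the CSV round trip, the column types come back exactly as they were sent. Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.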

Note:

  • If you additionally want to capture the raw byte output in PowerShell:

    • Include an output redirection (>) in the cmd /c command string; e.g.:

      cmd /c 'script1.py | script2.py > out.bin'
      
    • Then read that file as bytes with Get-Content -Encoding Byte (Windows PowerShell) / Get-Content -AsByteStream (PowerShell (Core) 7+)

  • If, by contrast, you want to capture the output from the cmd /c call as text (strings):

    • You have to (temporarily) set [Console]::OutputEncoding to the system's active ANSI code page, which Python defaults to when outputting to something other than the console (deviating from the usual behavior of using the active OEM code page).

      • In Windows PowerShell (versions up to 5.1), you can do this as follows:

        [Console]::OutputEncoding = [System.Text.Encoding]::Default
        
        • Note: In PowerShell (Core) 7+, more work is needed:

          [Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP))
          
    • Note that you can also configure Python to output UTF-8 by default: see this answer; in that case, use the following:

       [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
      
    • See this answer for more information.
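To illustrate why the encoding used for decoding has to match the one Python used for writing, here is a small Python sketch. The code pages are assumptions chosen for illustration (cp1252 as the ANSI code page and cp437 as the OEM code page, a common pairing on US-English Windows); the actual values depend on the system's locale settings.

```python
text = "café"

# Bytes as Python would emit them under the assumed ANSI code page (cp1252).
raw = text.encode("cp1252")

# Decoding with the assumed OEM code page (cp437) misreads the non-ASCII byte.
garbled = raw.decode("cp437")

# Decoding with the same code page that produced the bytes restores the text.
correct = raw.decode("cp1252")

print(repr(garbled), repr(correct))
```

Pure ASCII text survives either way, which is why a mismatch often goes unnoticed until accented or other non-ASCII characters appear in the output.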

mklement0