I'm trying to pass piped data through a Python script and also store that data in a separate variable for further processing.

The main purpose of the script is to be a layer between pipes, to record parsing errors and what leads to them.

Application:

echo '{"user": "Basic dGVzdDp0ZXN0"}' | script.py | rotatelogs {....}

I made a script:

cmd = ["/usr/bin/jq -c \'.user |= if test(\"^[Bb]asic \") then .[6:] | @base64d | gsub (\":.*$\"; \"\")  else . end \'"]

with open('/dev/stdin') as f: 
        try:
            subprocess.run(cmd, check = True, shell=True)
        except subprocess.CalledProcessError:
            with open('/path/to/parseerror.log', 'w') as pfile:
                pfile.write(f.read())        

The command in subprocess.run executes successfully and produces its output, but f.read() then returns nothing. If I move the f.read() call before subprocess.run, I get the data in the variable, but the command in subprocess.run is effectively not executed (it receives empty pipe input).

with open('/dev/stdin') as f: 
        line=(f.read())
        try:
            subprocess.run(cmd, check = True, shell=True)
        except subprocess.CalledProcessError:
                ....

How can I combine running the command on the piped input with recording that input? The main goal is to pass the data through the command and also write the received pipe contents to a separate file.

Dargod
  • 2
    `/dev/stdin` is already open: it's `sys.stdin`. – chepner Apr 05 '23 at 16:59
  • 1
    It's not clear what you intend the words in the question to mean when combined in the manner given. Can you provide example values of `cmd` and `something_else` that will produce desired output only if script.py is written exactly the way you want? That way someone can verify whether they're interpreting the question as you intend it, and whether their answers work as you want. (Effectively, what I'm asking for here is a [mre]). – Charles Duffy Apr 05 '23 at 17:20
  • Added information to the question – Dargod Apr 05 '23 at 18:40
  • The only part of the question I'm still not sure I understand is exactly what content is expected to be on your Python script's stdout for consumption by `rotatelogs`. – Charles Duffy Apr 06 '23 at 03:57
  • Well, the other thing here is that a given byte from stdin can be read _only by one thing_. If it's read by your Python script, then it's not available for jq to read. If it's read by jq, it's not available for your Python script to read. If you want both processes to have some input, then you should read it into Python _first_, then make Python write it back out into jq's stdin. – Charles Duffy Apr 06 '23 at 03:59
  • 1
    And note that using `shell=True` is a bad idea in general. Is it a firm requirement that you need to use it? `cmd = ['jq', '-c', r'''.user |= if test("^[Bb]asic ") then .[6:] | @base64d | gsub (":.*$"; "") else . end''']` has fewer moving parts, is easier to reason about, and lets Python get better information about exactly what went wrong when there's a failure (with `shell=True` Python only gets an exit status relayed from the copy of `/bin/sh` it starts; with the default `shell=False` Python gets any execve() errno, any child signal data, etc etc). – Charles Duffy Apr 06 '23 at 04:02
  • Passing `cmd` as a single long string _in a list_ is ... just weird. It happens to work for crazy reasons, but betrays a misunderstanding of the relationship between the first argument and `shell=True`. – tripleee Apr 06 '23 at 06:42

2 Answers

2

Your approach reads until EOF, which is what read() does, leaving no input left for the subprocess to read. Try readline() instead.

Opening a second stdin also seems weird. I'd imagine you want something like

import sys
...
print(sys.stdin.readline(), end='')
subprocess.run(cmd, stdin=sys.stdin, shell=True, check=True)

Tangentially, you can probably get rid of shell=True if you only have a single command you want to run.

import shlex
...
subprocess.run(shlex.split(cmd), stdin=sys.stdin, check=True)

The stdin=sys.stdin is not strictly speaking crucial (the subprocess will inherit the standard input of its parent process, and consume as much or as little of it as it likes to), but it documents your intent.
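
As a small illustration of the shlex.split() step, assuming cmd is a single command string rather than the one-element list from the question: the sketch below builds such a string with shlex.quote() and splits it back into an argument list. The names JQ_FILTER and command are introduced here only for illustration; the filter text is the one from the question.

```python
import shlex

# The jq filter from the question, kept as one Python string.
JQ_FILTER = r'''.user |= if test("^[Bb]asic ") then .[6:] | @base64d | gsub (":.*$"; "") else . end'''

# A single shell-style command string; shlex.quote() protects the filter.
command = f"jq -c {shlex.quote(JQ_FILTER)}"

# shlex.split() turns the string into the argv list that
# subprocess.run() expects when shell=True is not used.
argv = shlex.split(command)
```

Here argv has three elements: the program name, the -c option, and the whole filter as one argument.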

Weirdly, even with a single readline() I get the same symptoms you describe. I can work around it with an explicit read():

subprocess.run(shlex.split(cmd), input=sys.stdin.read(), text=True, check=True)

... but this is obviously unattractive if you need to handle potentially large amounts of input.
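
The likely reason for the readline() symptom is that Python's sys.stdin is buffered: reading one line pulls a whole block from the pipe into Python's buffer, so the child finds the pipe already drained. A minimal sketch of the explicit-read workaround, extended to save the input when the command fails; run_and_record and its parameters are hypothetical names, not part of the original script:

```python
import subprocess
import sys


def run_and_record(cmd, data, error_log):
    """Feed `data` to `cmd` on stdin; on failure, save `data` to `error_log`.

    Returns the command's stdout, or None if the command exited non-zero.
    """
    try:
        result = subprocess.run(
            cmd, input=data, stdout=subprocess.PIPE, text=True, check=True
        )
    except subprocess.CalledProcessError:
        # Append, so that earlier failures are not overwritten.
        with open(error_log, "a") as log:
            log.write(data)
        return None
    return result.stdout


# Hypothetical use inside script.py (the jq arguments are placeholders):
#   data = sys.stdin.read()
#   out = run_and_record(["jq", "-c", "."], data, "/path/to/parseerror.log")
#   if out is not None:
#       sys.stdout.write(out)
```

Because the input is held in a Python variable, it stays available after the command has run, whether or not the command succeeded.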

The opposite case does not work; in the general case, the command can read all the available input up to EOF (though some commands, like head, will not) and then there will be nothing left for your Python script to read after the subprocess finishes.
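
If the input is one JSON record per line, as in the echo example from the question, one possible way around both limitations is to let Python read the stream line by line and hand each line to its own process. This is only a sketch: filter_lines and its parameters are hypothetical names, and starting one process per line is slow for large streams.

```python
import subprocess
import sys


def filter_lines(lines, cmd, error_log):
    """Run `cmd` once per input line; yield its output, log lines that fail."""
    for line in lines:
        result = subprocess.run(
            cmd, input=line, stdout=subprocess.PIPE, text=True
        )
        if result.returncode != 0:
            # Record the exact input that the command rejected.
            with open(error_log, "a") as log:
                log.write(line)
            continue
        yield result.stdout


# Hypothetical use inside script.py (the jq arguments are placeholders):
#   for out in filter_lines(sys.stdin, ["jq", "-c", "."], "/path/to/parseerror.log"):
#       sys.stdout.write(out)
#       sys.stdout.flush()
```

Each failing line ends up in the log file, while the output for the remaining lines is passed on to the next command in the pipe.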

tripleee
  • For the first way I got error: AttributeError: module 'subprocess' has no attribute 'STDIN' – Dargod Apr 05 '23 at 14:37
  • Also, in your example, after print(sys.stdin.readline().rstrip('\n')), the subprocess command is ignored. If you swap them, the subprocess runs and the print is ignored. The same happens if I try to assign the value of sys.stdin.readline() to a variable – Dargod Apr 05 '23 at 14:55
  • Sorry, I misremembered the symbols defined in `subprocess`; updated to simply use `sys.stdin` – tripleee Apr 05 '23 at 16:57
  • It does not seem that it helps to pass the pipe to the main command and write the incoming pipe to a variable somewhere separately (it is not necessary to display it on the screen, this is more for debugging). Only what is first is executed, either reading and writing to a variable (display on the screen), or executing a command. I added a question, maybe this way the general goal will be clearer – Dargod Apr 05 '23 at 18:48
  • See also now https://stackoverflow.com/questions/76001787/why-does-reading-from-stdin-prevent-the-subprocess-from-accessing-it – tripleee Apr 13 '23 at 05:04
  • It doesn't seem to solve the problem of the input being emptied after reading it with `print(unbuffered_stdin.readline())`. It cannot be reused later, as it will be empty after the first readline(). – Dargod Apr 17 '23 at 10:50
  • @Dargod Do you mean the answer to the linked question? I think it answers the fundamental question quite well. – tripleee Apr 18 '23 at 05:42
-1

You are making this more difficult than necessary. In order to read from a pipe such as echo "some data with spaces" | script.py, you can just read from stdin with any standard input method.

Similarly, to pass data to another process, just write to stdout with any standard output method. The OS will take care of the redirection for you.

For example:

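# script.py
import sys

# Iterate over stdin line by line; input() would return only a single line.
for line in sys.stdin:
    print(line, end='')

Now do

printf 'my first line\nmy second line\n' | python script.py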

And you will get this output:

my first line
my second line
Code-Apprentice
  • 2
    But this doesn't demonstrate how to let a subprocess consume most of the input. – tripleee Apr 05 '23 at 14:15
  • @tripleee Maybe I misunderstand. As I say in the first paragraph, you can just pipe to another process to consume the output of your script just in the same way that you pipe input into your script. – Code-Apprentice Apr 05 '23 at 14:31
  • 2
    @Code-Apprentice, basically: if this is what the OP wants, why do they have a `subprocess` invocation _inside their Python code_ at all? They already _know_ they can use a shell pipe; they show that in the question! So if this were a valid interpretation of their intent, a whole lot of the question text would be completely unnecessary. – Charles Duffy Apr 05 '23 at 17:22
  • @CharlesDuffy The OP clearly knows how to use a pipe in a command line with existing commands, but may not know how to implement their own script that can be used with pipes. I agree this may not be what the OP needs, but I felt it was worth showing a possible way to simplify their solution if it works for them. – Code-Apprentice Apr 05 '23 at 17:38
  • 1
    I need to execute the prescribed command, as well as separately write down what piped to the script into a separate variable for further analysis (I need to catch what data leads to parsing errors in the stream) – Dargod Apr 05 '23 at 18:58