Catch universal newlines but preserve original

Question

So this is my problem,

I'm trying to do a simple program that runs another process using Python's subprocess module, and I want to catch real-time output of the process.

I know this can be done as such:

import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# stdout is a byte stream here, so the end-of-stream sentinel is b""
for line in iter(proc.stdout.readline, b""):
    line = line.rstrip()
    if line:
        print(line)

The issue is, the process might generate output with a carriage return \r, and I want to simulate that behavior in my program.

If I use the universal_newlines flag in Popen, then I could catch the output that is generated with a carriage return, but I wouldn't know it was as such, and I could only print it "regularly" with a newline. I want to avoid that, as this could be a lot of output.
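The loss of information described above can be reproduced with a small sketch. It assumes Python 3, and it uses a hypothetical child process (a Python one-liner started through sys.executable) in place of the real command; the names CHILD and read_translated are made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the real command: writes two progress updates
# terminated by a bare \r, then a final line terminated by \n.
CHILD = "import sys; sys.stdout.buffer.write(b'10%\\r50%\\rdone\\n')"


def read_translated(cmd):
    """Collect stdout lines in universal newlines (text) mode."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    with proc.stdout:
        lines = proc.stdout.readlines()
    proc.wait()
    return lines


if __name__ == "__main__":
    # Every returned line ends in "\n", so the original "\r" endings
    # can no longer be told apart from real newlines.
    print(read_translated([sys.executable, "-c", CHILD]))
```

In this mode the two progress updates and the final line all come back ending in "\n", which is exactly the distinction the program needs to keep.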

My question is basically whether I can catch the \r output as if it were a \n, but still differentiate it from actual \n output.

EDIT

Here is some simplified code of what I tried:

File download.py:

import subprocess

try:
    subprocess.check_call(
        [
            "aws",
            "s3",
            "cp",
            "S3_LINK",
            "TARGET",
        ]
    )

except subprocess.CalledProcessError as err:
    print(err)
    raise SystemExit(1)

File process_runner.py:

import os
import sys

import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

for char in iter(lambda: proc.stdout.read(1), ""):
    sys.stdout.write(char)

The code in download uses aws s3 cp, which gives carriage returns of the download progress. I want to simulate this behavior of output in my program process_runner which receives download's output.

At first I tried to iter readline instead of read(1). That did not work due to the CR being overlooked.

IMO, if you want to treat newlines as interesting data, then you shouldn't be using line-based input. You should be reading and buffering blocks of data, and then scanning for and processing `\r` and `\n` characters yourself. It's quite possible that you can get what you want using `readline()`, but I think that's going to over-complicate your solution. — CryptoFool, Jun 23 '19 at 06:24
@JohnHennig the question you linked to is exactly what I wrote I'm doing right now. The problem is that indeed, I don't want universal newlines. I want my program to be able to use carriage returns like the program that I run does. The problem is that `readline` reads until it encounters a newline character — Zionsof, Jun 23 '19 at 14:27
@Steve that's what I tried to do, reading character by character. It ran okay when running from PyCharm, but for some reason it did not work correctly while running from the terminal. Couldn't figure out what the problem was though — Zionsof, Jun 23 '19 at 14:28
@Zionsof, if you get different behavior in PyCharm and the terminal, that's a distinct problem that I would address or at least want to understand if I were you. Console input is one of the few areas where a difference in behavior might have to be tolerated. Maybe you can elaborate here with some code and we can help you with that. — CryptoFool, Jun 23 '19 at 17:46
@JohnHennig their solution did not work for me really. I also don't want to capture stderr that way. What they did does work in real time, but not with carriage returns. As I said in a previous comment, I tried Nadia's solution reading character by character and it worked partly fine. — Zionsof, Jun 24 '19 at 05:31
@Steve I will address that of course. I'll update my question with some test code that I tried — Zionsof, Jun 24 '19 at 05:31
Are you concerned that the subprocess will generate a CR that is *not* immediately followed by an LF? That’s very rare in modern text (since OS X). — Davis Herring, Jun 24 '19 at 06:19
@Davis I guess there is a progress indicator on stderr which updates by overwriting the previous line by printing a carriage return before the updated status. There must be no line feed in the output in order for this to work. — tripleee, Jun 24 '19 at 06:37
@tripleee: Very good point—but then you don’t want any kind of translation for them, of course. — Davis Herring, Jun 24 '19 at 12:56
@tripleee never thought the progress indicator could be writing to stderr instead of stdout... I'll check it out — Zionsof, Jun 24 '19 at 14:20

score 3 · Accepted Answer · answered Jun 24 '19 at 15:13

One option is to use the binary interface of Popen, that is, to pass none of encoding, errors, or universal_newlines. The binary stream can then be wrapped in a TextIOWrapper created with newline=''. The documentation for TextIOWrapper says:

... if newline is None... If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated

(which is conformant with PEP 3116)

Your original code could be changed to:

import io
import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = io.TextIOWrapper(proc.stdout, newline='')

for line in out:
    # line is delimited with the universal newline convention and still contains
    # the original end of line, be it a raw \r, a raw \n, or the pair \r\n
    ...
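As a self-contained illustration of the same idea, here is a minimal sketch assuming Python 3. The child process is a hypothetical stand-in for cmd (a Python one-liner run through sys.executable) that writes progress-style text with mixed line endings; the names CHILD_CODE and read_lines_keep_endings are invented for this example, and the explicit encoding="utf-8" is an assumption about the child's output.

```python
import io
import subprocess
import sys

# Hypothetical stand-in for `cmd`: writes raw bytes with mixed line endings.
CHILD_CODE = (
    "import sys; "
    "sys.stdout.buffer.write(b'10%\\r50%\\r100%\\r\\ndone\\n')"
)


def read_lines_keep_endings(cmd):
    """Run cmd and return its stdout lines with the original endings intact."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    # newline='' splits on \r, \n and \r\n but does not translate them.
    with io.TextIOWrapper(proc.stdout, encoding="utf-8", newline="") as out:
        lines = list(out)
    proc.wait()
    return lines


if __name__ == "__main__":
    for line in read_lines_keep_endings([sys.executable, "-c", CHILD_CODE]):
        # repr() makes the preserved line ending visible.
        print(repr(line))
    # prints '10%\r', '50%\r', '100%\r\n', 'done\n' (one per line)
```

A caller can then check line.endswith("\r") to decide whether to overwrite the current terminal line or start a new one. One thing to watch in real-time use: when a bare \r is the last character received so far, the text layer holds it back until more data arrives, because it could still turn out to be the first half of a \r\n pair.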

Serge Ballesta

Not compatible with Python2.7, so I had to adjust your answer: `out = io.open(proc.stdout.fileno(), mode='r', encoding="utf-8", newline='')` but it works out quite good. So maybe just update it in the answer – Zionsof Jun 25 '19 at 06:29
@Zionsof: I now seldom use Python 2.7, and as your code contains a `print(err)` (with parentheses), I assumed you were using Python 3. Beware if using Python2 because a `TextIOWrapper` will give you a unicode stream which may or not be desirable. – Serge Ballesta Jun 25 '19 at 06:35
You're right, just thought I'd note that it doesn't work with 2.7 like that, needs the alteration as per my comment – Zionsof Jun 25 '19 at 06:53
