How do I obtain the output from a program that uses screen redrawing for use in a terminal screen scraper?

Question

I am trying to obtain the output of a full-screen terminal program that uses redrawing escape codes to present data, and which requires a tty (or pty) to run.

The basic procedure a human would follow is:

1. Start the program in a terminal.
2. The program uses redrawing to display and update various fields of data.
3. The human waits until the display is consistent (possibly using cues such as "it's not flickering" or "it's been 0.5s since the last update").
4. The human looks at the fields in certain positions and remembers or records the data.
5. The human exits the program.
6. The human then performs actions outside the program based on that data.

I would like to automate this process. Steps 4 and 5 can be done in either order. While the perfectionist in me is worried about self-consistency of the screen state, I admit I'm not really sure how to properly define this (except perhaps to use "it's been more than a certain timeout period since the last update").

It seems that using pty and subprocess followed by some sort of screen scraper is one possible way to do this, but I'm unclear on exactly how to use them all together, and what hazards exist with some of the lower level objects I'm using.

Consider this program:

#!/usr/bin/env python2
import os
import pty
import subprocess
import time

import pexpect.ANSI

# Pseudo-terminal FDs
fd_master, fd_slave = pty.openpty()

# Start 'the_program'
the_proc = subprocess.Popen(['the_program'], stdin=fd_master, stdout=fd_slave, stderr=fd_slave)

# Just kill it after a couple of seconds
time.sleep(2)
the_proc.terminate()

# Read output into a buffer
output_buffer = b''
read_size = None

while (read_size is None) or (read_size > 0):
    chunk = os.read(fd_master, 1024)
    output_buffer += chunk
    read_size = len(chunk)

print("output buffer size: {:d}".format(len(output_buffer)))

# Feed output to screen scraper
ansi_term = pexpect.ANSI.ANSI(24, 80)
ansi_term.write(output_buffer)

# Parse presented data...

One problem is that the os.read() call blocks, always. I am also wondering if there's a better way to obtain the pty output for further use. Specifically:

1. Is there a way to do this (or parts of it) with higher-level code? I can't just use subprocess.PIPE for my Popen call, because then the target program won't work. But can I wrap those file descriptors in something with more convenient methods for I/O?
2. If not, how do I avoid always blocking on the os.read call? I'm more used to file-like objects where read() always returns, and just returns an empty string if the end of the stream is reached. Here, os.read eventually blocks no matter what.
3. I'm wary of getting this script to "just work" without being aware of potential hazards (e.g. race conditions that show up one time in a thousand). What else do I need to be aware of?

I'm also open to the idea that using pty and subprocess in the first place is not the best way to do this.

*"to do this"* -- what is *this*? if you forget about `pty`, `subprocess` then what is the problem that you are trying to solve? btw, `top` is not a simple example. A simple example would be a program that just changes its behavior slightly if it is redirected e.g., it changes its buffering mode or it stops using ANSI escapes (e.g., for colors). Or I can understand if you need `pty` to pass a password outside of normal stdin/stdout. Controlling `top` -- a full-screen program might be a different issue. — jfs, Mar 15 '15 at 13:11
@J.F.Sebastian - what I'm trying to do is obtain the output of a program that uses redrawing escape sequences, so I can observe the data it presents (possibly by feeding it to an ANSI screen scraping utility, or some other way). `top` is the simplest example I could find of a program that does this no matter how it's run (or more specifically, won't run unless it thinks it's under a `tty`). There may be others, but if they change their operation when redirected, they don't illustrate the concept. — detly, Mar 15 '15 at 20:12
@J.F.Sebastian I should point out that I originally just asked how to control a program that uses redrawing codes to display, but it was O/T. This question is an attempt to make that more specific. Hence my reluctance to make it more general. — detly, Mar 15 '15 at 20:20
I've asked because your question looks like [XY problem](http://meta.stackexchange.com/q/66377/137096) to me. [You don't need `pty` to get output from `top` program](http://stackoverflow.com/q/4417962/4279). Let's say you've written your script: how would you define its observed behavior without looking at its source code? — jfs, Mar 15 '15 at 20:43
@J.F.Sebastian I agree, but asking the non-XY version of the question basically reduces to a library recommendation request, which is O/T. The only observable behaviour is that given the output of an in-house program that displays certain data, it will produce the values of certain fields in that output. — detly, Mar 15 '15 at 21:01
I mean, the only part of the process I'm having any difficulty with is the glue code between "program that uses redrawing" and "ANSI terminal screen scraper library." That's all this is. — detly, Mar 15 '15 at 21:14
Do any of the solutions from the link I've provided work for you? If not, what do you expect to happen? (*be specific*) What happens instead? — jfs, Mar 15 '15 at 21:15
@J.F.Sebastian No, they do not. Opening the process with `stdout=subprocess.PIPE` will fail in a program specific manner if the program requires a `tty`. `top` itself fails with `top: failed tty get` for example. — detly, Mar 15 '15 at 21:30
While it's true that not every program that uses redrawing requires a `pty` and will happily dump escape sequences to a pipe or change its behaviour, it is a premise of the question that this is not the case. — detly, Mar 15 '15 at 21:37
@J.F.Sebastian BTW, if you can see deleted questions, my original one [was here](http://stackoverflow.com/questions/28977360/how-can-i-control-a-terminal-application-that-uses-screen-redrawing-with-python). — detly, Mar 15 '15 at 21:51
At the very least, `top` works for me (Ubuntu) and the OP from the question I've linked. One more attempt: describe the interaction e.g., start program, read output using pty, terminate program using SIGTERM in 2 seconds. — jfs, Mar 15 '15 at 21:53
@J.F.Sebastian ...that is described, by the code. I have no idea what kind of information you're asking for now. I've edited the question as best I can without making it meaningless. — detly, Mar 15 '15 at 21:56
your code is broken. Use words. Does the description from my comment suit your needs? — jfs, Mar 15 '15 at 21:57
@J.F.Sebastian By the way, I'm curious now. If you run a simplified version of [my `top` example](https://gist.github.com/detly/22284275762bbf9c86d6), do you get the `top: failed tty get` error that I do? — detly, Mar 15 '15 at 22:08
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/73041/discussion-between-detly-and-j-f-sebastian). — detly, Mar 15 '15 at 22:17

score 1 · Answer 1 · edited Mar 16 '15 at 22:44

You can use pexpect to do this. Use the run() function to obtain the data, and see the included VT100 emulator (or pyte) for rendering it.

Using the utility top as an example:

import pexpect
import pexpect.ANSI

# Start 'top' and quit after a couple of seconds
output_buffer = pexpect.run('top', timeout=2)

# For continuous reading/interaction, you would need to use the "events"
# arg, threading, or a framework for asynchronous communication.

ansi_term = pexpect.ANSI.ANSI(24, 80)
ansi_term.write(output_buffer)
print(str(ansi_term))

(Note that there is a bug resulting in extra line spacings sometimes.)

answered Mar 15 '15 at 23:20 by David K. Hess · edited Mar 16 '15 at 22:44 by detly

(Post edit) How do I obtain the output from my program to pass to the `write()` method of the `VT100` emulator? – detly Mar 15 '15 at 23:27
Unfortunately, the author admits he doesn't have an example. So, start with this example: https://github.com/pexpect/pexpect/blob/master/examples/uptime.py then add to it allocating the virtual ANSI terminal and writing the incoming data to it. – David K. Hess Mar 15 '15 at 23:30
But that example uses regex matching, which isn't possible in general with redrawing. It also uses `spawn`, which doesn't provide access to the incoming data. – detly Mar 15 '15 at 23:34
The API overview will help you understand more about how pexpect works: https://pexpect.readthedocs.org/en/latest/overview.html – David K. Hess Mar 15 '15 at 23:34
A program that uses redrawing is quite different from one that simply dumps output or has a linear sequence of prompts. – detly Mar 15 '15 at 23:35
You will need to just copy all incoming data to the ANSI class and then use get_region(rs, cs, re, ce) to see what strings were drawn on the virtual terminal's screen buffer. – David K. Hess Mar 15 '15 at 23:36
Right, but "copy all incoming data" is precisely the sticking point here (the word "just" is far from warranted!). That's why it's the focus of the question. How do I do that for a program that wants to run in a `tty`/`pty`? – detly Mar 15 '15 at 23:37
Use `.read` on the spawned process and `.write` on the virtual terminal class. If that doesn't make sense, spend some time learning about what pexpect is and how it works first. – David K. Hess Mar 15 '15 at 23:39
Right! It's the `read()` method from `pexpect.spawn` that I was missing. I can implement the required logic using this. Do you object to me editing your answer with a sample of working code? – detly Mar 15 '15 at 23:45
I and I bet a lot of others would really appreciate seeing a working example. Feel free! – David K. Hess Mar 15 '15 at 23:46
Feel free to edit that further if it needs work. I was in a bit of a rush. – detly Mar 15 '15 at 23:53
@detly: the child process may hang if it generates enough output in 2 seconds (it is wrong to call `.wait()` before `.read()`). If you don't care about it, use `output_buffer = pexpect.run('top', timeout=2)` that truncates the output instead of hanging forever. – jfs Mar 16 '15 at 18:48
@J.F.Sebastian - my only concern was that terminating the process might result in different output to quitting gracefully, but of course you're correct about it hanging. I've changed the code to use `timeout` instead and added a note about more sophisticated reading. – detly Mar 16 '15 at 20:17

score 1 · Accepted Answer · edited May 23 '17 at 11:51

If the program does not generate much output, the simplest way is to use pexpect.run() to get its output via a pty:

import pexpect # $ pip install pexpect

output, status = pexpect.run('top', timeout=2, withexitstatus=1)

You could detect whether the output is "settled down" by comparing it with the previous output:

import pexpect # $ pip install pexpect

def every_second(d, last=[None]):
    current = d['child'].before
    if last[0] == current: # "settled down"
        raise pexpect.TIMEOUT(None) # exit run
    last[0] = current

output, status =  pexpect.run('top', timeout=1, withexitstatus=1,
                              events={pexpect.TIMEOUT: every_second})

You could use a regex that matches a recurrent pattern in the output instead of the timeout. The intent is to determine when the output is "settled down".
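As a rough illustration of the regex idea, independent of pexpect: once a marker that appears once per redraw has been seen twice, everything before the second occurrence contains at least one complete redraw and can be handed to the terminal emulator. The helper name `settled_prefix` is hypothetical, and the marker pattern assumes a `top` header line of the form `top - HH:MM:SS`, which may differ between versions and locales.

```python
import re

# Assumption: each redraw of top starts with a header such as "top - 10:00:01".
MARKER = re.compile(rb"top - \d{2}:\d{2}:\d{2}")


def settled_prefix(buffer, marker=MARKER):
    """Return the bytes before the most recent marker, or None.

    None means the marker has been seen fewer than two times, so the
    buffer may not yet contain a complete redraw.
    """
    starts = [m.start() for m in marker.finditer(buffer)]
    if len(starts) < 2:
        return None
    # Everything before the latest marker includes at least one full redraw.
    return buffer[:starts[-1]]


# Synthetic sample data: two redraws, the second still in progress.
sample = (b"\x1b[Htop - 10:00:01 up 1 day\r\nTasks: 5"
          b"\x1b[Htop - 10:00:04 up 1 day\r\n")
prefix = settled_prefix(sample)
```

The returned prefix keeps all earlier escape sequences, so it can be written to the emulator as-is. A marker-based check like this only works if the program really does emit the marker on every redraw, which is worth verifying against captured output first.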

For comparison, here is code that uses the subprocess and pty modules directly:

#!/usr/bin/env python
"""Start process; wait 2 seconds; kill the process; print all process output."""
import errno
import os
import pty
import select
from subprocess import Popen, STDOUT
try:
    from time import monotonic as timer
except ImportError:
    from time import time as timer

output = []
master_fd, slave_fd = pty.openpty() #XXX add cleanup on exception
p = Popen(["top"], stdin=slave_fd, stdout=slave_fd, stderr=STDOUT,
          close_fds=True)
os.close(slave_fd)
endtime = timer() + 2 # stop in 2 seconds
while True:
    delay = endtime - timer()
    if delay <= 0: # timeout
        break
    if select.select([master_fd], [], [], delay)[0]:
        try:
            data = os.read(master_fd, 1024)
        except OSError as e: #NOTE: no need for IOError here
            if e.errno != errno.EIO:
                raise
            break # EIO means EOF on some systems
        else:
            if not data: # EOF
                break
            output.append(data)
os.close(master_fd)
p.terminate()
returncode = p.wait()
print([returncode, b''.join(output)])

Note:

- All three standard streams in the child process use slave_fd, unlike the code in your answer, which uses master_fd for stdin.
- The code reads output while the process is still running, which lets it accept large output (more than the size of a single kernel buffer).
- The code does not lose data on an EIO error (which means EOF here).

Based on "Python subprocess readlines() hangs".
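The "settled down" idea can also be combined with the pty and select approach: instead of a fixed 2-second window, stop reading once no new output has arrived for a short quiet period. The following is a minimal sketch under that assumption; the names `read_until_quiet` and `capture` and the default timings are hypothetical, not part of any library.

```python
import errno
import os
import pty
import select
import subprocess
import time


def read_until_quiet(fd, quiet=0.5, deadline=5.0):
    """Read from fd until no data arrives for `quiet` seconds.

    Also stops at EOF, or once `deadline` seconds have passed in total.
    """
    chunks = []
    end = time.monotonic() + deadline
    while True:
        remaining = end - time.monotonic()
        if remaining <= 0:  # overall deadline reached
            break
        # Wait up to the deadline for the first chunk, then only `quiet`.
        timeout = min(quiet, remaining) if chunks else remaining
        if not select.select([fd], [], [], timeout)[0]:
            break  # nothing new arrived in time: treat as settled
        try:
            data = os.read(fd, 4096)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            break  # EIO means EOF on some systems
        if not data:  # EOF
            break
        chunks.append(data)
    return b"".join(chunks)


def capture(argv, quiet=0.5, deadline=5.0):
    """Run argv on a fresh pty and return its output once it goes quiet."""
    master_fd, slave_fd = pty.openpty()
    try:
        try:
            proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd,
                                    stderr=subprocess.STDOUT, close_fds=True)
        finally:
            os.close(slave_fd)  # the parent only needs the master side
        try:
            return read_until_quiet(master_fd, quiet, deadline)
        finally:
            proc.terminate()
            proc.wait()
    finally:
        os.close(master_fd)
```

Something like `capture(["top"], quiet=0.5)` would then return the bytes to feed to the terminal emulator. A quiet period is only a heuristic: a program that pauses in the middle of a redraw for longer than `quiet` will still be captured in an inconsistent state.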

What does the `select` call gain? Won't that block for as long as `os.read()` would have? — detly, Mar 16 '15 at 20:24
@detly: no, the whole point is that `select` won't wait more than `delay` (+OS process scheduling). `os.read()` may return less than `1024` bytes but it shouldn't block immediately after `select` here. — jfs, Mar 16 '15 at 20:35
Fantastic answer by the way. It's good to see what's being abstracted, and I had not noticed the `events` parameter for `pexpect.run()`, which obviates the need for a framework like Twisted. — detly, Mar 16 '15 at 22:43
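The point about `select` made in the comments above can be checked with a small experiment. This sketch uses an ordinary pipe as a stand-in for the pty master (an assumption made to keep it self-contained), and `wait_readable` is a hypothetical helper name.

```python
import os
import select
import time


def wait_readable(fd, timeout):
    """Return True if fd becomes readable (data or EOF) within timeout seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)


read_fd, write_fd = os.pipe()

# Nothing has been written yet: select gives up after about 0.2 s,
# whereas a bare os.read() here would block indefinitely.
start = time.monotonic()
idle = wait_readable(read_fd, 0.2)
elapsed = time.monotonic() - start

# With data pending, select returns at once and os.read() does not block.
os.write(write_fd, b"x")
busy = wait_readable(read_fd, 0.2)
data = os.read(read_fd, 1024)

os.close(read_fd)
os.close(write_fd)
```

Here `idle` is false and `busy` is true, and `os.read()` is only called once `select` has reported the descriptor as readable, which is what keeps the read loop from hanging.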
