
For example, I want to read the first line printed by "tcpdump":

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

using "ptyprocess" (context: a local process with a terminal involved) and select() to wait for new data with a timeout:

import logging
from ptyprocess import PtyProcess
from select import select

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(message)s")

pty_process = PtyProcess.spawn(
    argv=["sudo", "tcpdump", "-w", "capture.pcap", "-i", "enp0s3"],
    echo=True)
while True:
    rlist, _, _ = select([pty_process.fd], [], [], 1)
    if pty_process.fd in rlist:
        try:
            data = pty_process.read(1)
        except EOFError:
            logging.debug("EOF")
            break
        logging.debug("read: %r", data)
    else:
        logging.debug("timeout")

For Python 3.x (tested with 3.6.10 and 3.8.1), this code reads the line mentioned above.

For Python 2.x (tested with 2.7.17), this code reads only the first character "t", and after that select() times out. I also observed that on the first run more than one character was read, but not all of them.

Tested on Debian 10.

How can I use select() with a timeout (or something similar) with "ptyprocess" to wait for new data before reading the next character in Python 2?

Update 1:

strace shows the following difference:

Python 2:

select(6, [5], [], [], {tv_sec=1, tv_usec=0}) = 1 (in [5], left {tv_sec=0, tv_usec=999993})
read(5, "tcpdump: listening on enp0s3, li"..., 8192) = 86

Python 3:

select(6, [5], [], [], {tv_sec=1, tv_usec=0}) = 1 (in [5], left {tv_sec=0, tv_usec=999994})
read(5, "t", 1)                         = 1

That is, Python 2 calls read(..., 8192), whereas Python 3 calls read(..., 1). How can I make Python 2 call read(..., 1) as well?
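One way to get exactly one read(..., size) system call on both versions is to bypass PtyProcess.read() and its BufferedRWPair and call os.read() on pty_process.fd directly, so no Python-level buffer sits between select() and the data. A minimal sketch, assuming a hypothetical helper read_with_timeout() that returns None on timeout and b"" on EOF; the demo uses an os.pipe() in place of the pty so it runs without sudo or tcpdump:

```python
import errno
import os
import select


def read_with_timeout(fd, size=1, timeout=1.0):
    """Wait up to `timeout` seconds for `fd` to become readable, then read.

    Returns the bytes read (at most `size`), None on timeout, or b"" on EOF.
    os.read() performs a single read() system call of at most `size` bytes,
    so no Python-level buffer can take bytes away from what select() sees.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if fd not in ready:
        return None  # nothing arrived within `timeout` seconds
    try:
        return os.read(fd, size)
    except OSError as err:
        # On Linux the pty master raises EIO once the child has exited;
        # ptyprocess treats this as EOF as well.
        if err.errno == errno.EIO:
            return b""
        raise


if __name__ == "__main__":
    # Demo on a pipe; with ptyprocess you would pass pty_process.fd instead.
    r, w = os.pipe()
    os.write(w, b"123\r\n")
    print(read_with_timeout(r, 1, 1))    # b'1'
    print(read_with_timeout(r, 4, 1))    # b'23\r\n'
    print(read_with_timeout(r, 1, 0.1))  # None (nothing arrived in time)
    os.close(w)
    print(read_with_timeout(r, 1, 0.1))  # b'' (writer closed: EOF)
    os.close(r)
```

In the loop from the question, `data = read_with_timeout(pty_process.fd, 1, 1)` would take the place of the select()/pty_process.read(1) pair.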

Update 2:

The problem is independent of "tcpdump" and can also be reproduced like this:

import logging
from ptyprocess import PtyProcess
from select import select

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(message)s")

pty_process = PtyProcess.spawn(
    argv=["bash", "-c", "echo 123 ; sleep 3"],
    echo=True)
while True:
    rlist, _, _ = select([pty_process.fd], [], [], 1)
    if pty_process.fd in rlist:
        try:
            data = pty_process.read(1)
        except EOFError:
            logging.debug("EOF")
            break
        logging.debug("read: %r", data)
    else:
        logging.debug("timeout")

Python 2 output:

2020-04-23 12:51:27,126 root read: '1'
2020-04-23 12:51:28,193 root timeout
2020-04-23 12:51:29,204 root timeout
2020-04-23 12:51:30,129 root read: '2'
2020-04-23 12:51:30,129 root read: '3'
2020-04-23 12:51:30,129 root read: '\r'
2020-04-23 12:51:30,130 root read: '\n'
2020-04-23 12:51:30,130 root EOF

Python 3 output:

2020-04-23 12:51:23,106 root read: b'1'
2020-04-23 12:51:23,107 root read: b'2'
2020-04-23 12:51:23,107 root read: b'3'
2020-04-23 12:51:23,107 root read: b'\r'
2020-04-23 12:51:23,107 root read: b'\n'
2020-04-23 12:51:24,109 root timeout
2020-04-23 12:51:25,109 root timeout
2020-04-23 12:51:26,109 root EOF
  • Does this answer your question? [Handling tcpdump output in python](https://stackoverflow.com/questions/17904231/handling-tcpdump-output-in-python) – Ross Jacobs Apr 23 '20 at 01:06
  • Regardless of whether that question answers this one, using `Popen` with `tcpdump -l` works on both Python versions, and then you can terminate it if it takes longer than a second to get to the next iteration in the loop (see the answer). As with many things in python, there are multiple ways to do things, and I think Popen is preferable in this case. – Ross Jacobs Apr 23 '20 at 01:09
    @RossJacobs I want to read character by character with a timeout, e.g. to be able to do something like "read until a certain string occurs, with a timeout, and continue reading later". In this question I want to focus on "ptyprocess" (i.e. on cases where a terminal is involved, although one might not be needed for tcpdump here) and not on "subprocess", which I use for other cases. – python_user_1234 Apr 23 '20 at 07:13
  • Can you go into more detail in your question as to why ptyprocess is necessary here and the broader environmental context (e.g. this is on an SSH connection and has to be run through a pseudo-terminal)? – Ross Jacobs Apr 23 '20 at 07:19
    @RossJacobs in fact, this problem occurred while developing a library with classes for the following 4 cases: 1) local process, no terminal, 2) local process, terminal, 3) SSH, no terminal, 4) SSH, terminal. Here we are talking about case 2). – python_user_1234 Apr 23 '20 at 07:25
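The use case from the comments ("read until a certain string occurs with a timeout, continue reading later") can be sketched by calling select() and os.read(fd, 1) on the file descriptor directly instead of going through ptyprocess's buffered read(). The name read_until, its pending parameter and the "found"/"timeout"/"eof" status strings are assumptions made for this example; the demo uses an os.pipe() in place of the pty, and with ptyprocess you would pass pty_process.fd:

```python
import errno
import os
import select
import time

# time.monotonic() exists only on Python 3.3+; fall back to time.time() on Python 2
_clock = getattr(time, "monotonic", time.time)


def read_until(fd, marker, timeout, pending=b""):
    """Read `fd` byte by byte until `marker` is seen or `timeout` seconds pass.

    Returns (data, status) where status is "found", "timeout" or "eof".
    `data` starts with `pending`, so after a timeout the caller can pass the
    partial data back in and continue reading later. Reading one byte per
    system call means nothing after the marker is consumed.
    """
    deadline = _clock() + timeout
    data = pending
    while marker not in data:
        remaining = deadline - _clock()
        if remaining <= 0:
            return data, "timeout"
        ready, _, _ = select.select([fd], [], [], remaining)
        if fd not in ready:
            return data, "timeout"
        try:
            chunk = os.read(fd, 1)
        except OSError as err:
            if err.errno == errno.EIO:  # Linux pty master after the child exited
                return data, "eof"
            raise
        if not chunk:  # pipes (and BSD-style ptys) report EOF as b""
            return data, "eof"
        data += chunk
    return data, "found"


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"login: rest")
    print(read_until(r, b": ", 1))    # (b'login: ', 'found')
    print(read_until(r, b"$ ", 0.1))  # (b'rest', 'timeout')
    os.close(w)
    print(read_until(r, b"$ ", 0.1))  # (b'', 'eof')
    os.close(r)
```

Reading one byte per system call is slow for large outputs, but it guarantees that bytes after the marker stay in the kernel for the next call.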

1 Answer


PtyProcess.read() calls self.fileobj.read1(). PtyProcess.fileobj is a BufferedRWPair, and BufferedRWPair.read1() delegates to BufferedRWPair.reader.read1(). The BufferedRWPair constructor wraps its reader parameter in a BufferedReader object.

In Python 2.7.16, buffered_read1() in Modules/_io/bufferedio.c calls _bufferedreader_fill_buffer(self), which does:

len = self->buffer_size - start;
n = _bufferedreader_raw_read(self, self->buffer + start, len);

In Python 3.8.1, _io__Buffered_read1_impl() in Modules/_io/bufferedio.c calls:

r = _bufferedreader_raw_read(self, PyBytes_AS_STRING(res), n);

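In other words, when the buffer is empty, Python 3's BufferedReader.read1(n) does a single raw read of at most n bytes, whereas Python 2's read1(n) does a raw read of up to buffer_size bytes (8192 by default, matching the strace output) to fill the buffer and then returns only n of them.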
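The difference can be observed from Python 3 with a raw stream that records the size of each request it receives. In this sketch RecordingRaw is a hypothetical in-memory stand-in for the pty: read1(1) triggers a single 1-byte raw read, while read(1), which goes through the buffer-filling path, asks for the whole 8192-byte buffer, the same request size the Python 2 strace shows for read1():

```python
import io


class RecordingRaw(io.RawIOBase):
    """In-memory raw stream (a stand-in for the pty file descriptor) that
    records how many bytes the buffered layer asks for in each readinto()."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.requested = []  # size of every readinto() request

    def readable(self):
        return True

    def readinto(self, buf):
        self.requested.append(len(buf))
        chunk = self._data[self._pos:self._pos + len(buf)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


# buffer_size=8192 matches the read(5, ..., 8192) seen in the Python 2 strace
raw1 = RecordingRaw(b"123\r\n")
via_read1 = io.BufferedReader(raw1, buffer_size=8192).read1(1)
print(via_read1, raw1.requested)  # b'1' [1]     one raw read of n bytes

raw2 = RecordingRaw(b"123\r\n")
via_read = io.BufferedReader(raw2, buffer_size=8192).read(1)
print(via_read, raw2.requested)   # b'1' [8192]  raw read that tries to fill the buffer
```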

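Therefore read(1), which works on the buffer, cannot be combined with select(), which works on the underlying file descriptor, in the way the code in the question does: once Python 2 has moved the remaining bytes from the file descriptor into the buffer, select() no longer reports the descriptor as readable and times out, even though the data is already waiting in the buffer.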
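The effect on select() can be reproduced on Python 3 with an os.pipe() standing in for the pty (an assumption of this sketch). BufferedReader.read(1) is used because, like Python 2's read1(1), it pulls everything available into the buffer; afterwards the file descriptor is no longer readable although four bytes are still buffered:

```python
import io
import os
import select

r, w = os.pipe()
os.write(w, b"123\r\n")  # all five bytes are now waiting in the kernel pipe

# Buffered reader over the raw fd, similar to PtyProcess.fileobj over the pty fd
reader = io.BufferedReader(io.FileIO(r, "r"), buffer_size=8192)

first = reader.read(1)                       # raw read of up to 8192 bytes returns all 5
ready, _, _ = select.select([r], [], [], 0)  # poll the fd without waiting
rest = reader.read1(4)                       # served from the buffer, no system call

print(first, ready, rest)  # b'1' [] b'23\r\n'

reader.close()  # also closes the read end of the pipe
os.close(w)
```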

The following code, which uses pexpect instead of ptyprocess, reads with a timeout:

import logging
import pexpect

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(message)s")

child = pexpect.spawn("bash -c 'echo 123 ; sleep 3'")
while True:
    try:
        data = child.read_nonblocking(size=1, timeout=1)
        logging.debug("read: %r", data)
    except pexpect.TIMEOUT:
        logging.debug("timeout")
    except pexpect.EOF:
        logging.debug("EOF")
        break

Output:

2020-04-26 14:54:56,006 root read: '1'
2020-04-26 14:54:56,007 root read: '2'
2020-04-26 14:54:56,007 root read: '3'
2020-04-26 14:54:56,007 root read: '\r'
2020-04-26 14:54:56,007 root read: '\n'
2020-04-26 14:54:57,009 root timeout
2020-04-26 14:54:58,010 root timeout
2020-04-26 14:54:59,008 root EOF
Błażej Michalik