Can I use pexpect in a way that ignores ANSI escape codes (especially colors) in the output? I am trying to do this:

expect('foo 3 bar 5')

...but sometimes I get output with ANSI-colored numbers. The problem is I don't know which numbers will have ANSI colors and which won't.

Is there a way to use pexpect but have it ignore ANSI sequences in the response from the child process?

Alex Shroyer

3 Answers

Here's a not entirely satisfying proposal: subclass pexpect.Expecter and pexpect.spawn and override one method in each, so that escape sequences are removed from incoming data before it is added to the buffer and tested for a pattern match. It is a lazy implementation in that it assumes every escape sequence is read atomically, that is, arrives whole in a single read; coping with sequences split across reads is more difficult.

# https://stackoverflow.com/a/59413525/5008284
import re, pexpect
from pexpect.expect import searcher_re

# regex for vt100 from https://stackoverflow.com/a/14693789/5008284  
class MyExpecter(pexpect.Expecter):
    ansi_escape = re.compile(rb'\x1B[@-_][0-?]*[ -/]*[@-~]')

    def new_data(self, data):
        data = self.ansi_escape.sub(b'', data)
        return pexpect.Expecter.new_data(self, data)

class Myspawn(pexpect.spawn):
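    # 'async' is a reserved word from Python 3.7 on, so the parameter
    # is spelled async_ here, as in current pexpect
    def expect_list(self, pattern_list, timeout=-1, searchwindowsize=-1,
                    async_=False):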
        if timeout == -1:
            timeout = self.timeout
        exp = MyExpecter(self, searcher_re(pattern_list), searchwindowsize)
        return exp.expect_loop(timeout)

This assumes you call expect() with a list of patterns, for example:

child = Myspawn("...")
rc = child.expect(['pat1'])

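I had to use bytes rather than strings because the data reaches new_data() before it is decoded. That is probably because spawn is created here without an encoding argument, in which case pexpect works on bytes, though it may also be down to a currently incorrect locale environment on my side.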
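
One way to cope with split reads, as a rough sketch rather than a tested extension of the code above: hold back any unfinished escape sequence at the end of a chunk and prepend it to the next chunk. The names AnsiStripper, feed and PARTIAL_ESCAPE are made up for this illustration and are not part of pexpect; the complete-sequence regex is the same one used above.

```python
import re

# Complete escape sequence (same regex as in the answer above).
ANSI_ESCAPE = re.compile(rb'\x1B[@-_][0-?]*[ -/]*[@-~]')
# A sequence that has started but whose final byte has not arrived yet,
# sitting at the very end of the data.
PARTIAL_ESCAPE = re.compile(rb'\x1B(?:[@-_][0-?]*[ -/]*)?\Z')


class AnsiStripper:
    """Hypothetical helper: strips escape sequences across chunk boundaries."""

    def __init__(self):
        self._pending = b''  # unfinished sequence carried over from the last chunk

    def feed(self, data):
        data = self._pending + data
        self._pending = b''
        tail = PARTIAL_ESCAPE.search(data)
        if tail:
            # Keep the unfinished sequence for the next call.
            data, self._pending = data[:tail.start()], data[tail.start():]
        return ANSI_ESCAPE.sub(b'', data)


stripper = AnsiStripper()
chunks = [b'foo \x1b[3', b'1m3\x1b[0m bar 5']  # colour code split across two reads
cleaned = b''.join(stripper.feed(chunk) for chunk in chunks)
print(cleaned)  # b'foo 3 bar 5'
```

MyExpecter.new_data() could then call feed() instead of ansi_escape.sub(). Since a new MyExpecter is created for every expect call, the stripper would probably have to live on the Myspawn object to keep its state between calls. I have not tried this against a live child process.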

meuh

This workaround partially defeats the purpose of using pexpect but it satisfies my requirements.

The idea is:

  1. expect anything at all (regex .*) followed by the next prompt, which in my case is "xsh $ " (note the backslash before the $ in the prompt regex)
  2. get the after property
  3. trim off the prompt: [1:]
  4. remove ANSI escape codes from that
  5. compare the filtered text with my "expected" response regex

import re
import pexpect

# XINU_CMD (the command to spawn) and tests are not shown in this snippet
with pexpect.spawn(XINU_CMD, timeout=3, encoding='utf-8') as c:
    # from https://stackoverflow.com/a/14693789/5008284
    ansi_escape = re.compile(r"\x1B[@-_][0-?]*[ -/]*[@-~]")
    system_prompt_wildcard = r".*xsh \$ "  # backslash because prompt is "xsh $ "

    # tests is a list of {"cmd": str, "responses": [str]}
    for test in tests:
        c.sendline(test["cmd"])
        response = c.expect([system_prompt_wildcard, pexpect.EOF, pexpect.TIMEOUT]) #=> (0|1|2)

        if response != 0: # any error
            continue

        response_text = c.after.split('\n')[1:]
        for expected, actual in zip(test['responses'], response_text):
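            # norm_input is a compiled regex that is not shown in this snippet
            norm_a = ansi_escape.sub('', norm_input.sub('', actual.strip()))
            # expected is the regex; search for it in the cleaned-up output
            if not re.search(expected, norm_a):
                print('NO MATCH FOUND')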
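
The filtering and comparison steps (4 and 5) can be tried without a child process. This is a minimal sketch with a hard-coded sample line; strip_ansi and line_matches are names invented for the example, and it leaves out the norm_input step.

```python
import re

# Same regex as in the snippet above.
ANSI_ESCAPE = re.compile(r"\x1B[@-_][0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove ANSI escape sequences from a decoded string."""
    return ANSI_ESCAPE.sub("", text)


def line_matches(expected_regex, actual_line):
    """True if expected_regex is found in the line once colours are removed."""
    return re.search(expected_regex, strip_ansi(actual_line).strip()) is not None


# Sample output line in which both numbers are coloured.
raw = "foo \x1b[31m3\x1b[0m bar \x1b[1;32m5\x1b[0m\r\n"
print(line_matches(r"foo 3 bar 5", raw))  # True
print(line_matches(r"foo 3 bar 6", raw))  # False
```
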
Alex Shroyer

Something like:

import re
import sys
import pexpect

#see https://web.archive.org/web/20200805075926/http://ascii-table.com/ansi-escape-sequences.php
class AsciiDecoder(object):
    def __init__(self):
        self.buf = b''
    def encode(self, b, final=False):
        return b
    def decode(self, b, final=False):
        # escape sequences can be split across reads, so
        # only work on complete lines
        self.buf = self.buf + b
        # use the last newline so every complete line is handed back at once
        i = self.buf.rfind(b'\n')
        if i >= 0:
            c = self.buf[0:i+1]
            self.buf = self.buf[i+1:]
            d = re.sub(rb'\x1b\[[0-9;=?]*[HfABCDsuJKmhlr]', b'*', c)
            e = re.sub(rb'\x1b', b'<ESC>', d)
            if e != c:  # only show data that was changed
                print(">", e, "<")
            return e
        return b''

child = pexpect.spawn(command=command[0],
                      args=command[1:],
                      logfile=sys.stdout.buffer,
                      echo=False)

# two ways to manipulate the output from command:
# wrap child.read_nonblocking(), or replace the decoder it uses (done here)
child._decoder = AsciiDecoder() # used by SpawnBase.read_nonblocking()
  • change b'*' to b'' (the '*' is only there so I can see it working)
  • the b'<ESC>' substitution is to catch anything the pattern misses, so drop that as well
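
With those two changes applied, the decoder might look like the sketch below. AnsiStrippingDecoder is a name made up for this example. Flushing the buffer when final=True is my assumption about how a decoder should behave; as far as I can tell pexpect always passes final=False. The decode() method can be exercised on its own, without spawning anything.

```python
import re

# Same CSI pattern as above, now replaced with nothing instead of b'*'.
CSI = re.compile(rb'\x1b\[[0-9;=?]*[HfABCDsuJKmhlr]')


class AnsiStrippingDecoder:
    """Hypothetical cleaned-up variant of the decoder above."""

    def __init__(self):
        self.buf = b''

    def encode(self, b, final=False):
        return b

    def decode(self, b, final=False):
        self.buf += b
        if final:
            cut = len(self.buf)                # flush whatever is left
        else:
            cut = self.buf.rfind(b'\n') + 1    # 0 when there is no complete line yet
        chunk, self.buf = self.buf[:cut], self.buf[cut:]
        return CSI.sub(b'', chunk)


decoder = AnsiStrippingDecoder()
print(decoder.decode(b'foo \x1b[3'))                  # b''
print(decoder.decode(b'1m3\x1b[0m bar 5\nxsh $ '))    # b'foo 3 bar 5\n'
```

Because only complete lines are released, a prompt that does not end in a newline (such as "xsh $ " here) stays in the buffer, so expect() will not see it until more output arrives.
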
cagney