
I'm writing a Python program that logs terminal interaction (similar to the script program), and I'd like to filter out the VT100 escape sequences before writing to disk. I'd like to use a function like this:

def strip_escapes(buf):
    escape_regex = re.compile(???) # <--- this is what I'm looking for
    return escape_regex.sub('', buf)

What should go in escape_regex?

Lorin Hochstein
  • It's a bit complicated: http://en.wikipedia.org/wiki/ANSI_escape_sequences – sarnold Oct 22 '11 at 04:25
  • Check http://www.webdeveloper.com/forum/showthread.php?t=186004 for a PHP version. It should be simple to convert it to python. – Mansour Oct 22 '11 at 05:02
  • In the spirit of these other comments, here is also a TCL process that does exactly the same thing... http://wiki.tcl.tk/9673 – Niall Byrne Oct 27 '11 at 02:13
  • Here's one that worked for me: `sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g"` ([source](http://www.commandlinefu.com/commands/view/3584/remove-color-codes-special-characters-with-sed)) – Adam Monsen Mar 21 '13 at 17:27

3 Answers


The combined expression for escape sequences can be something generic like this:

(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]

It should be compiled with re.I, so that the final-character class [@-_] also matches lowercase letters such as m or h.

This incorporates:

  1. Two-byte sequences, i.e. \x1b followed by a character in the range of @ until _.
  2. One-byte CSI, i.e. \x9b as opposed to \x1b + "[".

However, this will not work for sequences that define key mappings or otherwise include strings wrapped in quotes.
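As a minimal sketch, the expression can be dropped into the `strip_escapes` function from the question like this. The module-level name `ESCAPE_RE` is only an illustrative choice, and the limitations mentioned above still apply:

```python
import re

# The expression from this answer, compiled with re.I so that the
# final-character class [@-_] also covers lowercase letters such as "m".
# ESCAPE_RE is an illustrative name, not part of any library.
ESCAPE_RE = re.compile(r'(\x1b\[|\x9b)[^@-_]*[@-_]|\x1b[@-_]', re.I)

def strip_escapes(buf):
    """Return buf with the escape sequences matched by ESCAPE_RE removed."""
    return ESCAPE_RE.sub('', buf)
```

For example, `strip_escapes('\x1b[31mred\x1b[0m plain')` returns `'red plain'`. Compiling the pattern once at module level avoids recompiling it on every call.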

Ja͢ck

VT100 codes are already grouped (mostly) according to similar patterns here:

http://ascii-table.com/ansi-escape-sequences-vt-100.php

I think the simplest approach would be to use a tool like RegexBuddy to define a regex for each group of VT100 codes.
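A rough sketch of what the per-group approach could look like in Python. The three groups below and their names are my own illustrative assumptions, not taken from the linked table, and they are far from complete:

```python
import re

# Hypothetical grouping for illustration only; a real table would need
# more groups (for example ESC # sequences) and careful checking.
GROUPS = {
    'csi': r'\x1b\[[0-9;?]*[A-Za-z]',   # cursor movement, colors, set/reset mode
    'charset': r'\x1b[()][AB0-2]',      # character set selection
    'single': r'\x1b[=>78DEMHc]',       # keypad modes, save/restore cursor, etc.
}

# Join the per-group patterns into one alternation.
VT100_RE = re.compile('|'.join('(?:%s)' % p for p in GROUPS.values()))

def strip_vt100_groups(buf):
    """Remove every sequence matched by one of the groups above."""
    return VT100_RE.sub('', buf)
```

Keeping one pattern per group makes it easier to test each group separately and to see which kinds of sequences are not covered yet.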

Alex Brooks

The following solution parsed VT100 color codes and removed the non-printable escape sequences for me. The code snippet found here removed all codes when running a telnet session using telnetlib:

def __processReadLine(self, line_p):
    '''
    remove non-printable characters from line <line_p>
    return a printable string.
    '''

    line, i, imax = '', 0, len(line_p)
    while i < imax:
        ac = ord(line_p[i])
        if (32<=ac<127) or ac in (9,10): # printable, \t, \n
            line += line_p[i]
        elif ac == 27:                   # remove coded sequences
            i += 1
            while i<imax and line_p[i].lower() not in 'abcdhsujkm':
                i += 1
        elif ac == 8 or (ac==13 and line and line[-1] == ' '): # backspace or EOL spacing
            if line:
                line = line[:-1]
        i += 1

    return line

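A standalone variation of the same loop, as a sketch only: instead of a fixed list of terminating letters, it assumes a sequence starting with ESC [ ends at the first character in the range @ to ~, and treats any other ESC as a two-character escape. The name `strip_vt100` is made up for this example, and the backspace handling from the method above is left out to keep it short:

```python
def strip_vt100(text):
    """Drop escape sequences and other control characters from text."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        code = ord(ch)
        if code == 27:                      # ESC starts a sequence
            i += 1
            if i < n and text[i] == '[':    # ESC [ ... : skip to the final character
                i += 1
                while i < n and not (0x40 <= ord(text[i]) <= 0x7E):
                    i += 1
            # Otherwise assume a two-character escape such as ESC = or ESC 7;
            # text[i] is its second character and is skipped by "i += 1" below.
        elif 32 <= code < 127 or code in (9, 10):   # printable, \t, \n
            out.append(ch)
        i += 1
    return ''.join(out)
```

Three-character escapes such as ESC ( B are not handled here: the last character would be left in the output.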
Steph
  • That won't work for some common initialization sequences, such as ESC =, ESC >, ESC 7 and ESC 8, as well as any *reset mode* control (ends with "l"). Those are listed in xterm's documentation: http://invisible-island.net/xterm/ctlseqs/ctlseqs.html – Thomas Dickey Mar 11 '15 at 20:36