
I use colorama to add ANSI codes to text, and I need to split the ANSI color codes from the text so that the text can be printed in columns. The following expression separates a single color code from the text, but not two consecutive codes.

import re

# adapted from https://stackoverflow.com/questions/2186919
split_ANSI_escape_sequences = re.compile(r"""
    (?P<col>
    \x1b     # literal ESC
    \[       # literal [
    [;\d]*   # zero or more digits or semicolons
    [A-Za-z] # a letter
    )*
    (?P<text>.*)
    """, re.VERBOSE).fullmatch

def split_ANSI(s):
    return split_ANSI_escape_sequences(s).groupdict()

This is the result:

>>> split_ANSI('\x1b[31m\x1b[1mtext')
{'col': '\x1b[1m', 'text': 'text'}

The text is split off correctly, but the first escape sequence (\x1b[31m) is lost, and with it part of the formatting information. I'm expecting

{'col': '\x1b[31m\x1b[1m', 'text': 'text'}

as the result.

How can I get all the potential escape sequences in the first group?

Josh English

1 Answer


I found the answer in the question "Python RegEx multiple groups" after rephrasing my search a few different ways.

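A repeated capturing group keeps only its last repetition, so each escape sequence overwrites the previous contents of the named group col. Wrapping the whole repetition inside the named group captures every sequence. This version works: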

import re

split_ANSI_escape_sequences = re.compile(r"""
    (?P<col>     # named group wraps the whole repetition
    (?:\x1b      # literal ESC
    \[           # literal [
    [;\d]*       # zero or more digits or semicolons
    [A-Za-z]     # a letter
    )*)
    (?P<text>.*) # everything after the escape sequences
    """, re.VERBOSE).match

def split_ANSI(s):
    return split_ANSI_escape_sequences(s).groupdict()
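
To see the difference side by side, here is a minimal sketch that compares the two patterns on the input from the question. The names ANSI, repeated and wrapped are only illustrative and do not come from the original code; the sub-pattern is the same one used above, written without re.VERBOSE.

```python
import re

# Illustrative name: one ANSI escape sequence (ESC, "[", digits/semicolons, a letter).
ANSI = r"\x1b\[[;\d]*[A-Za-z]"

# The named group itself is repeated, so only the last repetition is kept.
repeated = re.compile(rf"(?P<col>{ANSI})*(?P<text>.*)")

# The named group wraps a repeated non-capturing group, so every sequence is kept.
wrapped = re.compile(rf"(?P<col>(?:{ANSI})*)(?P<text>.*)")

s = "\x1b[31m\x1b[1mtext"

print(repeated.match(s).groupdict())  # {'col': '\x1b[1m', 'text': 'text'}
print(wrapped.match(s).groupdict())   # {'col': '\x1b[31m\x1b[1m', 'text': 'text'}
```

One side effect of the wrapped form: for a string with no escape sequences at all, col is an empty string rather than None, because the outer group always takes part in the match.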
Josh English