Regular expression for separating ANSI escape characters from text

Question

I use colorama to add ANSI codes to text, and I need to split the ANSI color codes from the text so that the text can be printed in columns. The following expression separates a single color code from the text, but not two consecutive codes.

import re

# adapted from https://stackoverflow.com/questions/2186919
split_ANSI_escape_sequences = re.compile(r"""
    (?P<col>
    \x1b     # literal ESC
    \[       # literal [
    [;\d]*   # zero or more digits or semicolons
    [A-Za-z] # a letter
    )*
    (?P<text>.*)
    """, re.VERBOSE).fullmatch

def split_ANSI(s):
    return split_ANSI_escape_sequences(s).groupdict()

This is the result:

>>> split_ANSI('\x1b[31m\x1b[1mtext')
{'col': '\x1b[1m', 'text': 'text'}

It splits the text off correctly, but keeps only the last escape sequence and loses the rest of the formatting information. I'm expecting

{'col': '\x1b[31m\x1b[1m', 'text': 'text'}

as the result.

How can I get all the potential escape sequences in the first group?

Should that lone `*` not be inside the first named group? Without the named group expression, it seems to repeat just that last `A-Za-z`. — Jongware, Jan 17 '17 at 23:53
@RadLexus, I wasn't sure what you were saying, but I did find the answer. The lone * was overwriting the named group, so it needs another level of grouping. — Josh English, Jan 18 '17 at 00:28

score 4 · Accepted Answer · answered Jan 18 '17 at 00:12

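I found the answer in "Python RegEx multiple groups" after asking the question in different ways.

A repeated capturing group keeps only its last repetition, so each escape sequence overwrote the previous one in `col`. Wrapping the repeated group inside the named group captures the whole run. This version works: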

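import re

split_ANSI_escape_sequences = re.compile(r"""
    (?P<col>         # named group around the whole run of escape sequences
    (\x1b            # literal ESC
    \[               # literal [
    [;\d]*           # zero or more digits or semicolons
    [A-Za-z]         # a letter
    )*)              # the inner group repeats; the outer group keeps every repetition
    (?P<text>.*)     # the remaining text
    """, re.VERBOSE).match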

def split_ANSI(s):
    return split_ANSI_escape_sequences(s).groupdict()
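
To see the difference between the two patterns side by side, here is a minimal sketch. The names `SEQ`, `repeated`, `wrapped` and `pad_cell` are mine and not part of the original code, and the inner group is non-capturing here, which behaves the same as far as `groupdict()` is concerned. `pad_cell` is a hypothetical helper for the column use case from the question; it assumes the escape codes come only at the start of the string and that the string contains no newline.

```python
import re

# One escape sequence: ESC, "[", digits/semicolons, then a final letter.
SEQ = r"\x1b\[[;\d]*[A-Za-z]"

# Quantifier applied to the named group itself: only the last repetition is kept.
repeated = re.compile(rf"(?P<col>{SEQ})*(?P<text>.*)")

# Quantifier applied to an inner non-capturing group: the named group keeps the whole run.
wrapped = re.compile(rf"(?P<col>(?:{SEQ})*)(?P<text>.*)")

s = "\x1b[31m\x1b[1mtext"
last_only = repeated.fullmatch(s).groupdict()
whole_run = wrapped.fullmatch(s).groupdict()


def pad_cell(s, width):
    """Hypothetical helper: pad by the visible text width, keeping the leading codes."""
    parts = wrapped.fullmatch(s).groupdict()
    return parts["col"] + parts["text"].ljust(width)
```

Here `last_only["col"]` is only `'\x1b[1m'`, while `whole_run["col"]` is `'\x1b[31m\x1b[1m'`. A trailing reset code such as `'\x1b[0m'` would end up in `text`, so it would need separate handling before padding.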
