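I'm trying to strip the color escape sequences from log lines like the ones below. In each pair, the first line is the raw input and the second is the output I want: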

[0m[ERROR] [1585551547.349979]: Failed to create bragfiles/downtimer/fight100/2020-03-27. Error: 550 Create directory operation failed.
[ERROR] [1585551547.349979]: Failed to create bragfiles/downtimer/fight100/2020-03-27. Error: 550 Create directory operation failed.

and

[32m[INFO] [2020-03-29 23:58:50.607198] TaskManager.poll: system has no current task.[0m
[INFO] [2020-03-29 23:58:50.607198] TaskManager.poll: system has no current task.

Plus the occasional doubled sequence:

"[0m[32m[INFO] [2020-03-29 23:58:34.695268] Polling for updates from the server for fight100...[0m"
"[INFO] [2020-03-29 23:58:34.695268] Polling for updates from the server for fight100..."

I've come across these questions before, but their answers don't seem to work in my case:

  1. How can I remove the ANSI escape sequences from a string in python
  2. Remove all ANSI colors/styles from strings

I've been trying variations of `\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])`, but none of the regexes I've tried so far are generic enough.

cjds
  • Do you want to remove `[0m` and `[32m`? `text = text.replace('[0m','').replace('[32m','')`? – Wiktor Stribiżew Apr 15 '20 at 15:55
  • It's more that I want to do it for all color codes at the beginning of lines: Black 0;30, Dark Gray 1;30, Red 0;31, Light Red 1;31, Green 0;32, Light Green 1;32, Brown/Orange 0;33, Yellow 1;33, Blue 0;34, Light Blue 1;34, Purple 0;35, Light Purple 1;35, Cyan 0;36, Light Cyan 1;36, Light Gray 0;37, White 1;37. – cjds Apr 15 '20 at 16:08
  • Is the color escape sequence always followed by a `[TEXT]` sequence? Can the string be reliably split so that the color escape sequence is at the **start** of the resulting strings? Is the color sequence either at the start of the string or preceded by a period? Have you considered making multiple passes? – wwii Apr 15 '20 at 17:33

1 Answer

The pattern below matches one or two color escape sequences, but only when they are immediately followed by uppercase letters enclosed in square brackets (checked with a positive lookahead):

pat = r'''((\[\d+m){1,2})(?=\[[A-Z]+\])'''

Works with this string:

s = '''[0m[ERROR] [1585551547.349979]: xyz xyz.
[0m[32m[INFO] [2020-03-29 23:58:34.695268] hjk hjk.[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo'''

The positive lookahead keeps the bracketed tag (for example [INFO]) out of the match, so only the color sequences are removed.


>>> import re
>>> print(re.sub(pat,'',s))
[ERROR] [1585551547.349979]: xyz xyz.
[INFO] [2020-03-29 23:58:34.695268] hjk hjk.[INFO] [2020-03-29 23:58:34.695268] foo bar foo
>>>
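
To see what the lookahead contributes, here is a minimal sketch comparing the pattern with and without it. The names with_lookahead and without_lookahead are mine, for illustration only, and the sample line is a shortened version of the string above.

```python
import re

# Shortened sample line (illustrative; based on the string in the answer).
s = '[0m[32m[INFO] [2020-03-29 23:58:34.695268] hjk hjk.'

# With the lookahead, the [INFO] tag is checked but not consumed.
with_lookahead = r'(\[\d+m){1,2}(?=\[[A-Z]+\])'
# Without it, the tag becomes part of the match and is removed too.
without_lookahead = r'(\[\d+m){1,2}\[[A-Z]+\]'

kept = re.sub(with_lookahead, '', s)
lost = re.sub(without_lookahead, '', s)

print(repr(kept))  # '[INFO] [2020-03-29 23:58:34.695268] hjk hjk.'
print(repr(lost))  # ' [2020-03-29 23:58:34.695268] hjk hjk.'
```

The second result has lost the log level, which is why the tag is matched with (?=...) rather than directly.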

If you need to remove longer runs of sequences (for example several style and color codes in a row) like

[2m[93m[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo

use pat = r'''((\[\d+m){1,})(?=\[[A-Z]+\])''' to match one or more sequences instead of one or two.
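
As a quick check of the difference, this sketch runs both quantifiers against the four-sequence line above. The variable names one_or_two and one_or_more are made up for the comparison.

```python
import re

# Hypothetical names for the two variants discussed in the answer.
one_or_two = r'(\[\d+m){1,2}(?=\[[A-Z]+\])'
one_or_more = r'(\[\d+m){1,}(?=\[[A-Z]+\])'

s = '[2m[93m[0m[32m[INFO] [2020-03-29 23:58:34.695268] foo bar foo'

# {1,2} can only reach back two sequences from the [INFO] tag,
# so the first two sequences are left behind.
partial = re.sub(one_or_two, '', s)
# {1,} consumes the whole run of sequences in front of the tag.
clean = re.sub(one_or_more, '', s)

print(partial)  # [2m[93m[INFO] [2020-03-29 23:58:34.695268] foo bar foo
print(clean)    # [INFO] [2020-03-29 23:58:34.695268] foo bar foo
```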


If there are also sequences with a semicolon-separated attribute, like these

[0m[1;37m[ERROR] [1585551547.349979]: xyz xyz.
[0m[1;37m[0;32m[ERROR] [1585551547.349979]: xyz xyz.

use pat = r'''(\[([01];)?\d+m){1,}(?=\[[A-Z]+\])'''
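
Here is a small sketch of that pattern in use. I have also included a looser variant, any_params, which accepts any number of semicolon-separated numbers; that variant is my own assumption about what other codes might show up (the [38;5;208m sample is invented), so treat it as optional.

```python
import re

# Pattern from the answer: optional "0;" or "1;" prefix before the digits.
pat = r'(\[([01];)?\d+m){1,}(?=\[[A-Z]+\])'
# Assumed generalization (not from the question): any number of
# semicolon-separated numeric parameters, e.g. [38;5;208m.
any_params = r'(?:\[\d+(?:;\d+)*m)+(?=\[[A-Z]+\])'

s1 = '[0m[1;37m[0;32m[ERROR] [1585551547.349979]: xyz xyz.'
s2 = '[38;5;208m[WARN] disk almost full'  # invented sample line

out1 = re.sub(pat, '', s1)
out2_strict = re.sub(pat, '', s2)        # not matched, left unchanged
out2_loose = re.sub(any_params, '', s2)

print(out1)         # [ERROR] [1585551547.349979]: xyz xyz.
print(out2_strict)  # [38;5;208m[WARN] disk almost full
print(out2_loose)   # [WARN] disk almost full
```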


Some of your example strings show color sequences in the middle of the string, and your desired output shows them being removed, contrary to your comment:

all color codes in the beginning of lines.

These patterns also remove sequences from the middle of a string, as long as a bracketed uppercase tag follows them.
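
If you really do want to touch only the codes at the beginning of each line, one option is to anchor the pattern with ^ and pass re.MULTILINE. This is a sketch of that variation, not something shown in your examples; line_start is a name I made up.

```python
import re

# ^ with re.MULTILINE matches at the start of every line, so codes
# in the middle of a line are left alone.
line_start = re.compile(r'^(?:\[\d+m)+(?=\[[A-Z]+\])', re.MULTILINE)

s = ('[0m[ERROR] [1585551547.349979]: xyz xyz.\n'
     '[0m[32m[INFO] [2020-03-29 23:58:34.695268] hjk hjk.[0m[32m[INFO] foo bar foo')

result = line_start.sub('', s)
print(result)
# [ERROR] [1585551547.349979]: xyz xyz.
# [INFO] [2020-03-29 23:58:34.695268] hjk hjk.[0m[32m[INFO] foo bar foo
```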

wwii