
I am trying to remove ANSI escape sequences from a string.

I have tried all the solutions proposed in this post, but none of them worked, so I concluded that my case is a bit different.

I have the following code, which should remove all ANSI escape sequences:

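import re

print("ascii: " + ascii(string))
# str pattern: 7-bit escapes (ESC + one byte in @-_) and CSI sequences (ESC [ ... final byte)
x = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])').sub('', string)
# bytes pattern: same as above, plus the 8-bit C1 control codes
y = re.compile(br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])').sub(b'', string.encode("utf-8"))
print("not escaped X: " + ascii(x))
print("not escaped Y: " + ascii(y))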

However, I got the following output:

ascii: '\x1b[m>....\x1b[?1h\x1b=\x1b[?2004h>....\r\x1b[K\x1b[32m[03:33:57] blabla'
not escaped X: '>....\x1b=>....\r[03:33:57] blabla'
not escaped Y: b'>....\x1b=>....\r[03:33:57] blabla'

How can I replace all the ANSI escape sequences so the expected result would be: [03:33:57] blabla?

Thomas Dickey
Programer Beginner
  • You got the terminology right the first time - I fixed the other attempts. "escape" means to put *more* text (typically backslashes) into a string so that part of its content will be interpreted literally by another process; and there is no such thing as an "ANSI character" because ANSI isn't a *character set*, but a set of rules for giving *meaning to* characters (i.e., so they can be interpreted by another process, such as the terminal). – Karl Knechtel Apr 01 '23 at 03:55
  • As for the question: the code is doing exactly what it is advertised to do. In the example input you show (imply by showing its ascii representation), the `>....` is **not part of** an ANSI escape sequence, and the ESC character `\x1b` would not be followed by `=` in a valid escape sequence. To remove the text you want to remove, you first need a rule that actually describes it. – Karl Knechtel Apr 01 '23 at 03:57
  • Oh I see, so it's not working because `>....\x1b=>....\r` is not ANSI? Do you recognize what it is? Because it does not print out anything. – Programer Beginner Apr 01 '23 at 10:55
  • It doesn't look familiar, no. It is still using `\x1b`, so it might have been intended as some kind of extension to ANSI. I would try to get more information about the data source, and consider using another regex (or expanding the existing one) to clean up this part, regardless of what it's properly called. – Karl Knechtel Apr 01 '23 at 19:07

1 Answer


The following code does correctly remove ANSI escape sequences:

re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])').sub('', string)

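The reason >....\x1b=>....\r is still left in front of the string is that the pattern does not match those characters: >.... and \r are ordinary characters outside any escape sequence, and in \x1b= the ESC is followed by =, which lies outside the [@-Z\\-_] range the pattern accepts after ESC.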
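A quick way to check this is to run the pattern on the input shown in the question. The sketch below reuses only the regex and the sample string from the question; the names ANSI_RE, raw, and cleaned are illustrative and not part of the original code.

```python
import re

# Pattern from the question: ESC followed by one byte in @-_ (minus '['),
# or a CSI sequence (ESC '[' parameters, intermediates, final byte).
ANSI_RE = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

# Sample input, copied from the "ascii:" line of the question's output.
raw = '\x1b[m>....\x1b[?1h\x1b=\x1b[?2004h>....\r\x1b[K\x1b[32m[03:33:57] blabla'

cleaned = ANSI_RE.sub('', raw)
print(ascii(cleaned))

# '=' is 0x3D, which sorts below '@' (0x40), so it is not in [@-Z\\-_].
equals_is_accepted = re.fullmatch(r'[@-Z\\-_]', '=') is not None
print(equals_is_accepted)
```

Every CSI sequence (\x1b[m, \x1b[?1h, \x1b[?2004h, \x1b[K, \x1b[32m) is removed, while \x1b= and the plain text around it are left alone.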

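Thus, to solve the issue, I simply should have called .lstrip(">....\x1b=>....\r") on the result. Note that lstrip treats its argument as a set of characters rather than a literal prefix, so it removes any leading run of >, ., \x1b, = and \r; that works here because the remaining text starts with [.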
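Since the leftover prefix may differ from one line to the next, a less input-specific cleanup might be worth considering. The sketch below is one possible approach, not a definitive fix, and it rests on two assumptions that the question does not confirm: that any ESC followed by a single byte in the range 0x30 to 0x7E (other than [) can safely be dropped, and that text before the last carriage return on a line was meant to be overwritten. WIDE_ANSI_RE and clean_line are hypothetical names. The final lines contrast lstrip with str.removeprefix, which needs Python 3.9 or later.

```python
import re

# Hypothetical widening of the question's pattern: the first alternative also
# accepts 0x30-0x3F and 0x60-0x7E after ESC, so two-character escapes such as
# ESC '=' are removed. '[' (0x5B) stays excluded so that CSI sequences are
# still handled by the second alternative.
WIDE_ANSI_RE = re.compile(r'\x1B(?:[0-Z\\-~]|\[[0-?]*[ -/]*[@-~])')


def clean_line(text: str) -> str:
    """Remove escape sequences, then keep the text after the last carriage return."""
    without_escapes = WIDE_ANSI_RE.sub('', text)
    # Assumption: a carriage return means the line was redrawn from column 0.
    return without_escapes.rsplit('\r', 1)[-1]


raw = '\x1b[m>....\x1b[?1h\x1b=\x1b[?2004h>....\r\x1b[K\x1b[32m[03:33:57] blabla'
result = clean_line(raw)
print(result)

# lstrip() strips any leading characters found in its argument,
# whereas removeprefix() only removes an exact leading match.
junk = ">....\x1b=>....\r"
stripped = '..>=data'.lstrip(junk)
unprefixed = '..>=data'.removeprefix(junk)
print(stripped, unprefixed)
```

With the sample input this yields the line the question asks for, but both assumptions should be checked against the real data source before relying on it.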

Programer Beginner