You are making the basic but common mistake of confusing the representation of a string in Python source code with its actual value.
Python has a number of escape codes which do not represent themselves verbatim in regular strings in source code. For example, "\n" represents a single newline character, even though the Python notation occupies two characters. The backslash introduces this notation. There are dedicated escape codes like \r, \a, etc., and a generalized notation \x01 which lets you write any character code from 0 to 255 as exactly two hex digits (\n is equivalent to \x0a, \r is equivalent to \x0d, etc.). To represent a literal backslash character, you need to escape it with another backslash: "\\".
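As a quick check of these claims, here is a minimal sketch that compares a few escape notations. The variable names are only illustrative.

```python
# Each escape sequence below is several characters of source code
# but a single character in the resulting string.
newline = "\n"
backslash = "\\"

print(len(newline))         # 1
print(newline == "\x0a")    # True
print("\r" == "\x0d")       # True
print(len(backslash))       # 1
print(repr(backslash))      # '\\'  (repr shows the source notation again)
```

Note how repr() gives you back the source-code representation, while len() tells you about the actual value.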
In a "raw string", no backslash escapes are supported, so r"\n" represents a string containing two characters: a literal backslash \ and a literal lowercase n. You could equivalently write "\\n" using non-raw string notation. The r prefix is not part of the string; it just tells Python how to interpret the text between the following quotes (i.e. no interpretation at all; every character represents itself verbatim).
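The equivalence between the raw and non-raw notations can be verified with a short sketch like this one (again, the names are just for illustration):

```python
raw = r"\n"        # raw string: a backslash followed by n
escaped = "\\n"    # regular string: an escaped backslash followed by n

print(len(raw))          # 2
print(raw == escaped)    # True
print(raw == "\n")       # False: "\n" is a single newline character
print(list(raw))         # ['\\', 'n']
```

Both notations produce the same two-character value; the r prefix leaves no trace in the resulting string object.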
It is not clear from your question which of these interpretations you actually need, so I will present solutions for both.
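Here is a literal string containing actual backslashes, and a regex substitution to remove the backslash-x sequences:
import re

# A literal backslash, x or X, then exactly two hex digits. Using + instead of {2}
# would also swallow the "fa" of "family", since f and a are hex digits.
pat = re.compile(r'\\[xX][0-9a-fA-F]{2}')
s = r"we are \xaf\x06OK\x03family, good"
print(s)  # we are \xaf\x06OK\x03family, good
print(re.sub(pat, '', s))  # we are OKfamily, good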
Here is a string containing control characters and non-ASCII characters, and a regex substitution to remove them:
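import re

# Matches the control characters \x00-\x1f and the code points \x80-\xff;
# characters above \xff are not covered by this character class.
pat = re.compile(r'[\x00-\x1f\x80-\xff]+')
s = "we are \xaf\x06OK\x03family, good"
print(s)
print(re.sub(pat, '', s))  # we are OKfamily, good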
An additional complication is that the regex engine has its own internal uses for backslashes; we generally prefer to use raw strings for regexes in order to not have Python and the regex engine both interpreting backslashes (sometimes in incompatible ways).
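One case where the two interpretations clash is \b, which Python reads as a backspace character but the regex engine reads as a word boundary. The following is a small sketch of that difference; the sample text and variable names are made up for the demonstration.

```python
import re

text = "a cat sat"

# In a regular string, Python turns "\b" into a backspace character (\x08)
# before the regex engine ever sees it, so this pattern looks for backspaces.
non_raw_matches = re.findall("\bcat\b", text)

# In a raw string, the backslash reaches the regex engine intact, where
# \b means "word boundary".
raw_matches = re.findall(r"\bcat\b", text)

print(non_raw_matches)   # []
print(raw_matches)       # ['cat']
```

With the raw string, only the regex engine interprets the backslash, which is usually what you want.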