I am trying to remove non-printable characters from some string variables that I get while reading in a text file. If I use the re.sub call below, it doesn't work: the \x.. characters are not removed.
import re
test1 = 'ing record \xac\xd0\x81\xb4\x02\n2018 Apr'
test2 = re.sub('\\\\x(?:\d\d|\w\w|\d\w|\w\d)', '', test1)
But if I take the value from test1 and pass it to re.sub as a "raw" string literal, it works perfectly:
test2 = re.sub('\\\\x(?:\d\d|\w\w|\d\w|\w\d)', '', r'ing record \xac\xd0\x81\xb4\x02\n2018 Apr')
test2 is now 'ing record \n2018 Apr'
I was hoping to easily convert test1 from the first example into a raw string, but from my searching this doesn't seem easy or even possible. I'm looking for a solution that lets me use re.sub to remove these characters from a str variable. Or is there a way to convert my str variable into a raw string first?
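To illustrate why the first call finds nothing, here is a minimal sketch comparing the two literals. The names normal, raw and pattern are made up for this example, and the pattern is a simplified hex-only stand-in for the one above. In the normal literal each \xac is already a single character, while in the raw literal it is four characters (backslash, x, a, c), and only the raw one contains the literal backslash the regex looks for.

```python
import re

# Normal literal: Python turns each \x.. escape into ONE character.
normal = 'ing record \xac\xd0\x81\xb4\x02\n2018 Apr'
# Raw literal: each \x.. stays as four separate characters.
raw = r'ing record \xac\xd0\x81\xb4\x02\n2018 Apr'

# Simplified stand-in pattern: a literal backslash, 'x', two hex digits.
pattern = re.compile(r'\\x[0-9a-fA-F]{2}')

print(len(normal), len(raw))    # the raw version is noticeably longer
print(pattern.findall(normal))  # nothing to match: no backslashes in the data
print(pattern.findall(raw))     # the five escaped sequences, as text
```

So "raw" is a property of how a literal is written in source code, not of a str value, which would explain why there is no direct conversion for an existing variable.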
UPDATE FIX: I ended up having to do several conversions to remove the unwanted hex codes while keeping my newlines. This works, but I'm not sure whether there is a cleaner method out there.
test33 = 'ing record \xac\xd0\x81\xb4\x02\n2018 Apr'
test44 = re.sub('\\\\x(?:\d\d|\w\w|\d\w|\w\d)', '', test33.encode('unicode-escape').decode("utf-8"))
test66 = test44.encode().decode('unicode-escape')
print(test66)
ing record
2018 Apr
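One possibly cleaner alternative, offered as a sketch rather than a definitive answer: match the unwanted characters themselves instead of their escaped text, which avoids the encode/decode round trip. This assumes "unwanted" means anything outside printable ASCII other than a newline. The character class and the name cleaned are assumptions for this example and would need adjusting if tabs or non-ASCII letters should be kept.

```python
import re

test33 = 'ing record \xac\xd0\x81\xb4\x02\n2018 Apr'

# Assumption: keep printable ASCII (space through ~) plus newline,
# and drop every other character.
cleaned = re.sub(r'[^\x20-\x7E\n]', '', test33)
print(cleaned)
```

Here the \x20, \x7E and \n inside the raw pattern are interpreted by the re module itself, so the class matches real characters in the data.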