python: remove stray bytes from string

Question

I have a string that I scraped online that looks like this:

"trackingId":"f<0x85>©9\u0004+L<0x9b><0x91>\u001A<0x87>&\u0013i+T"},{"pendingInvitation":false

How do I remove the stray bytes <0x85>, <0x9b>, <0x91>, and <0x87> from my string?

You could use a "black list" of unwanted byte values (`unwanted = {0x85, ...}`) and a generator expression for filtering: `bytes(b for b in bytestring if b not in unwanted)` – Niklas Mertsch, Dec 09 '18 at 08:47
Is that literally the string `'<0x85>'`? If I look at the source of your question I see lots of funny characters. Please include the actual string as code (e.g. the output of `repr(your_string)`), not as quoted text. Also, that looks pretty unlikely to be a tracking ID string unless it's binary and you messed up the encoding. — mercator, Dec 09 '18 at 09:48
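
The blacklist idea from the first comment could be sketched roughly as follows. This assumes the scraped data is available as a `bytes` object; `raw` is a hypothetical payload built from the byte values shown in the question, and `drop_bytes` is an illustrative helper name, not something from the original post.

```python
# Hypothetical raw payload; the byte values are taken from the question.
raw = b'f\x85\xa99\x04+L\x9b\x91\x1a\x87&\x13i+T'

# Byte values to drop. Iterating over bytes yields ints in Python 3,
# so the blacklist holds ints rather than one-byte bytes objects.
unwanted = {0x85, 0x9B, 0x91, 0x87}


def drop_bytes(data: bytes, blacklist: set) -> bytes:
    """Return data without any byte whose value is in blacklist."""
    return bytes(b for b in data if b not in blacklist)


cleaned = drop_bytes(raw, unwanted)
print(cleaned)
```

A set is used for the blacklist so that each membership check is constant time; a tuple would also work for a handful of values.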

score 2 · Answer 1 · answered Dec 09 '18 at 08:41


You can use regex:

import re

s = '"trackingId":"f<0x85>©9\u0004+L<0x9b><0x91>\u001A<0x87>&\u0013i+T"},{"pendingInvitation":false'
print(s)
print(re.sub(r'<0x\w{2}>', '', s))

with output:

"trackingId":"f<0x85>©9+L<0x9b><0x91><0x87>&i+T"},{"pendingInvitation":false
"trackingId":"f©9+L&i+T"},{"pendingInvitation":false

I searched for the pattern <0x__>, where __ is any two word characters (letters, digits, or underscores).
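
If the string holds the stray values as real characters rather than as the literal text `<0x85>`, a character class may be a closer fit than the pattern above. The following is only a sketch under that assumption: `s` is a hypothetical reconstruction of the tracking ID, and it treats everything in the C1 control range (U+0080 to U+009F) as unwanted, which is broader than the four values named in the question.

```python
import re

# Assumed form of the scraped text: the stray values are actual
# characters such as '\x85', not the six-character text "<0x85>".
s = 'f\x85\xa99\x04+L\x9b\x91\x1a\x87&\x13i+T'

# U+0080..U+009F is the C1 control block; '\xa9' (the copyright sign)
# lies outside it and is kept.
c1_controls = re.compile(r'[\x80-\x9f]')

cleaned = c1_controls.sub('', s)
print(repr(cleaned))
```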

answered Dec 09 '18 at 08:41

Dinari


Yes, but the byte is not actually a string – etayluz Dec 09 '18 at 08:46
I thought you had `<0x9b>` and the like as literal text inside your actual string. You could filter out bytes that do not fit the encoding, as in https://stackoverflow.com/questions/26541968/delete-every-non-utf-8-symbols-from-string, but if that is not it, I am not sure I understand your problem. – Dinari Dec 09 '18 at 08:54
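
The "filter out bytes that do not fit the encoding" suggestion could look something like this. It is a minimal sketch that assumes the scraped data is a `bytes` object meant to be UTF-8; `raw` is a hypothetical payload in which the copyright sign is properly encoded as `\xc2\xa9` and the four stray bytes are not valid UTF-8 on their own.

```python
# Hypothetical payload: valid UTF-8 text with four stray bytes mixed in.
raw = b'f\x85\xc2\xa99\x04+L\x9b\x91\x1a\x87&\x13i+T'

# errors='ignore' silently drops any byte sequence that cannot be
# decoded as UTF-8 instead of raising UnicodeDecodeError.
text = raw.decode('utf-8', errors='ignore')
print(repr(text))
```

Note that this also discards any other undecodable bytes without warning, so it is best suited to data where losing them is acceptable.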
