How to replace `c2a0` with nothing (remove it) in Python 3?

Question

I want to convert b'\xc2\xa0\x38' into b'\x38' in Python 3.

I tried both of these:

b'\xc2\xa0\x38'.replace(u'\xc2\xa0',"")
b'\xc2\xa0\x38'.replace(u'\xc2a0',"")

and each one raises:

TypeError: a bytes-like object is required, not 'str'

On the web page, the bytes c2 a0 represent NO-BREAK SPACE, whose Unicode code point is U+00A0.

Unicode code point | character | UTF-8 (hex.) | name
U+00A0             |           | c2 a0        | NO-BREAK SPACE

Note: U+00A0 has no visible glyph, so the character column is blank here.

(Image: relationship between Unicode code point, character, and UTF-8 encoding)
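To check the relationship in the table, here is a small sketch that uses only the standard library (`str.encode`, `bytes.decode` and `unicodedata.name`). It shows that encoding U+00A0 as UTF-8 yields exactly the two bytes c2 a0. The `hex(" ")` separator argument assumes Python 3.8 or newer.

```python
import unicodedata

nbsp = "\u00a0"  # the NO-BREAK SPACE character, code point U+00A0

# Encoding the character as UTF-8 produces the two bytes c2 a0.
encoded = nbsp.encode("utf-8")
print(encoded)                 # b'\xc2\xa0'
print(encoded.hex(" "))        # c2 a0
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# Decoding the question's bytes turns c2 a0 back into one character.
text = b"\xc2\xa0\x38".decode("utf-8")
print(len(text))               # 2 (U+00A0 followed by '8')
```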

How can I convert b'\xc2\xa0\x38' into b'\x38' with the replace method?

By passing bytes-like objects instead of strings? – Stop harming Monica Aug 02 '18 at 12:40

score 9 · Accepted Answer · answered Aug 05 '18 at 00:34


You were already almost there:

b'\xc2\xa0\x38'.replace(b'\xc2\xa0',b'')
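Here is a slightly expanded sketch of the same one-liner. Both the pattern and the replacement are bytes literals, and the result compares equal to b'\x38'. The variable names are only illustrative.

```python
data = b"\xc2\xa0\x38"  # UTF-8 bytes: NO-BREAK SPACE followed by the digit '8'
nbsp = b"\xc2\xa0"      # UTF-8 encoding of U+00A0

# bytes.replace needs bytes arguments; an empty bytes object deletes the match.
cleaned = data.replace(nbsp, b"")

print(cleaned)              # b'8'
print(cleaned == b"\x38")   # True
```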

answered Aug 05 '18 at 00:34

J. Katzwinkel



How to make `b'8'` displayed as `b'\x38'`? – Aug 05 '18 at 00:46
`print(''.join(r'\x' + format(b, '02x') for b in b'\xc2\xa0\x38'.replace(b'\xc2\xa0', b'')))` – J. Katzwinkel Aug 05 '18 at 00:57

score 4 · Answer 2 · answered Aug 05 '18 at 17:00

b'\xc2\xa0\x38'.replace(u'\xc2\xa0',"")
b'\xc2\xa0\x38'.replace(u'\xc2a0',"")

Since b'\xc2\xa0\x38' is a bytes object, you cannot use string methods on it. So when you call .replace() on it, you are not calling str.replace but bytes.replace. While those two look and behave very similarly, they still operate on different types:

str.replace replaces a substring inside of a string with another string. And bytes.replace replaces a sub-bytestring inside of a bytestring with another bytestring. So the types of all arguments always match:

str.replace(str, str)
bytes.replace(bytes, bytes)
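The type rule can be checked directly with a short sketch. Mixing a bytes receiver with str arguments reproduces the question's TypeError, while matching types work for both classes. The names `data`, `text` and `mixed_ok` are illustrative only.

```python
data = b"\xc2\xa0\x38"  # bytes
text = "\u00a0" + "8"   # str holding the decoded characters

# Mixing types fails: bytes.replace only accepts bytes-like arguments.
try:
    data.replace("\xc2\xa0", "")
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

# Matching types works for both classes.
print(data.replace(b"\xc2\xa0", b""))  # b'8'
print(text.replace("\u00a0", ""))      # 8
```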

So in order to replace something inside of a bytes string, you need to pass bytes objects:

>>> b'\xc2\xa0\x38'.replace(b'\xc2\xa0', b'')
b'8'
>>> b'\xc2\xa0\x38'.replace(b'\xc2a0', b'')
b'\xc2\xa08'

How to make b'8' displayed as b'\x38'?

You generally cannot do that. b'8' and b'\x38' are equal to each other:

>>> b'8' == b'\x38'
True

Both contain the same single byte value, a 0x38. It’s just that there are multiple ways to represent that content as a bytes literal in Python. Just like you can write 10, 0xA, 0b1010 or 0o12 to refer to the same int object with the decimal value of 10, you can describe a bytes object in multiple ways.
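To make the "several spellings, one value" point concrete, here is a sketch. It builds the same one-byte object in several ways and repeats the int analogy from the answer.

```python
# Different ways to describe the same single byte 0x38.
same_byte = [b"8", b"\x38", bytes([0x38]), bytes([56]), bytes.fromhex("38")]
print(all(b == b"8" for b in same_byte))  # True

# The int analogy: four literals, one value.
print(10 == 0xA == 0b1010 == 0o12)        # True
```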

When you type b'\x38' into the interactive Python REPL, Python interprets that bytes literal, creates a bytes object containing the single byte 0x38, and the REPL then prints the repr() of that object. The repr() of a bytes object happens to display printable ASCII bytes as characters whenever possible.

There is no way to change this, but there's also no need to. The b'8' that you see is just one representation of the same bytes object. If you use that object and do something with it (e.g. write it to a file, transform it, or send it over the network), then the actual bytes are used, not some string representation of the bytes object.
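The following sketch separates what repr() displays from what the object contains. Printable ASCII bytes are shown as characters, while a non-printable byte such as 0x07 is shown as a \x escape. The underlying value is the byte 0x38 either way.

```python
value = b"\x38"

# repr() picks the display form: printable ASCII bytes appear as characters.
print(repr(value))        # b'8'
print(repr(b"\x07\x38"))  # b'\x078'

# The content itself is just the byte 0x38, however it is displayed.
print(len(value), value[0], hex(value[0]))  # 1 56 0x38
```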

If you however want to actually print the bytes object, you can deliberately convert it into a string using your favorite representation. For example, if you want a hex representation of your bytes string, you could use one of the many ways to do that:

>>> print(b'8'.hex())
38
>>> print(b'\x38'.hex())
38
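If you want exactly the b'\x38'-style display from the comment, you can build that string yourself. Here is a sketch with a hypothetical helper, `escaped_hex`, which is not part of Python. It pads each byte to two hex digits so that, for example, 0x0a prints as \x0a rather than \xa. The `hex(" ")` separator form assumes Python 3.8 or newer.

```python
def escaped_hex(data: bytes) -> str:
    """Render every byte as a \\xNN escape (hypothetical helper)."""
    # The :02x format spec pads to two lowercase hex digits.
    return "".join(f"\\x{b:02x}" for b in data)

cleaned = b"\xc2\xa0\x38".replace(b"\xc2\xa0", b"")
print(escaped_hex(cleaned))        # \x38
print(escaped_hex(b"\n\xc2\xa0"))  # \x0a\xc2\xa0
print(b"\xc2\xa0\x38".hex(" "))    # c2 a0 38
```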

score 0 · Answer 3 · edited Aug 03 '18 at 00:38


Is that data being read from a file? Maybe you opened the file in binary mode:

with open(fname, 'rb') as f:

This means that the data read from the file is returned as bytes object, not str.

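If that is so, try opening the file as a text file instead by replacing the 'rb' mode with 'r' and passing the file's encoding, e.g. open(fname, 'r', encoding='utf-8'). You then get str objects and can remove the character with str.replace.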
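Here is a sketch of the text-mode route, assuming the file is UTF-8 encoded. The temporary file and its contents are made up for illustration. Passing `encoding` explicitly avoids depending on the platform's default encoding, which would otherwise decode c2 a0 differently on some systems.

```python
import os
import tempfile

# Illustration only: write the question's bytes to a throwaway file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xc2\xa0\x38")
    fname = tmp.name

try:
    # Text mode with an explicit encoding decodes c2 a0 into the single character U+00A0.
    with open(fname, "r", encoding="utf-8") as f:
        text = f.read()
    cleaned = text.replace("\u00a0", "")  # str.replace with str arguments
    print(repr(cleaned))                  # '8'
finally:
    os.remove(fname)
```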

edited Aug 03 '18 at 00:38

Khanal


answered Aug 02 '18 at 12:32

Agustín Clemente

