
In Python 3.8.5 I try to convert some bytes to a string and then the string back to bytes:

>>> a=chr(128)
>>> a
'\x80'
>>> type(a)
<class 'str'>

But when I try to convert it back:

>>> a.encode()
b'\xc2\x80'       

What is the \xc2 byte? Why does it appear? Thanks for any response!

1 Answer


This is UTF-8 encoding at work: the \xc2 comes from the way UTF-8 represents code points above 127.

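In a Python string, \x80 means Unicode code point U+0080 (decimal 128), a control character also known as PADDING CHARACTER. When we encode that code point in UTF-8, it takes two bytes: UTF-8 writes code points from U+0080 to U+07FF as 110xxxxx 10xxxxxx, filling the x bits with the code point's 11-bit value. 128 is 00010 000000 in binary, so the two bytes are 11000010 (\xc2) and 10000000 (\x80).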
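A small sketch of that two-byte rule, written out with bit operations so it can be compared with Python's own encoder. `utf8_two_byte` is a hypothetical helper for illustration only; in real code you would just call `str.encode('utf-8')`.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF the way UTF-8 does.

    The 11 value bits are split into the high 5 bits and the low 6 bits:
      first byte  = 110xxxxx  -> 0xC0 | (code_point >> 6)
      second byte = 10xxxxxx  -> 0x80 | (code_point & 0x3F)
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers U+0080..U+07FF only")
    first = 0xC0 | (code_point >> 6)
    second = 0x80 | (code_point & 0x3F)
    return bytes([first, second])


print(format(128, '011b'))        # the 11 value bits: 00010000000
print(utf8_two_byte(128))         # b'\xc2\x80'
print(chr(128).encode('utf-8'))   # b'\xc2\x80'
```

For 128, `128 >> 6` is 2, giving 0xC0 | 0x02 = 0xC2, and `128 & 0x3F` is 0, giving 0x80, which matches the bytes in the question.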

The original ASCII encoding had only 128 characters, but Unicode has many thousands of code points, and a single byte can represent only 256 different values. A lot of computing is based on ASCII, and we'd like that stuff to keep working, but non-English speakers need to be able to use computers too, so we need to be able to represent their characters.

The answer is UTF-8, a scheme that encodes the first 128 Unicode code points (0-127, the ASCII characters) as a single byte, so text that uses only those characters is completely compatible with ASCII. The next 1920 code points (U+0080 up to U+07FF), which contain the most common non-English characters, are spread across two bytes.
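To see the length classes directly, the sketch below asks Python's UTF-8 encoder how many bytes it uses at the edges of each range. Beyond what the answer states, UTF-8 uses three bytes for U+0800 to U+FFFF and four bytes for U+10000 to U+10FFFF; the `BOUNDARIES` tuple and `utf8_lengths` helper are just illustrative names.

```python
# Code points at the edges of each UTF-8 length class
BOUNDARIES = (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000)


def utf8_lengths(code_points):
    """Return the number of bytes UTF-8 uses for each code point."""
    return [len(chr(cp).encode('utf-8')) for cp in code_points]


for cp, n in zip(BOUNDARIES, utf8_lengths(BOUNDARIES)):
    print(f"U+{cp:04X}: {n} byte(s)")
```

The lengths come out as 1, 2, 2, 3, 3, 4: U+007F is the last one-byte code point and U+0080, the character from the question, is the first two-byte one.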

So, in exchange for being slightly less efficient with some characters that could fit in a one-byte encoding (such as \x80), we gain the ability to represent every character from every written language.

For more reading, try this SO question

For example, if you want to get rid of the \xc2, encode your string as Latin-1, which maps code points 0-255 directly to single bytes:

a = chr(128)
print(repr(a))
# '\x80'

print(a.encode())  # str.encode() defaults to UTF-8
# b'\xc2\x80'

print(a.encode('latin-1'))
# b'\x80'
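Two caveats, shown in a minimal sketch: Latin-1 only helps for code points up to U+00FF, and whichever codec you encode with, you need the same one to decode. The euro sign (U+20AC) is just an arbitrary example of a character outside Latin-1.

```python
a = chr(128)

# Decoding with the same codec you encoded with restores the string
assert a.encode('utf-8').decode('utf-8') == a
assert a.encode('latin-1').decode('latin-1') == a

# Latin-1 has exactly one byte per code point 0..255, so U+20AC
# (EURO SIGN) has no Latin-1 byte and raises UnicodeEncodeError
try:
    '\u20ac'.encode('latin-1')
except UnicodeEncodeError as exc:
    print('latin-1 cannot encode it:', exc)

# Encoding with one codec and decoding with another silently yields a
# different string: each UTF-8 byte becomes its own character
mojibake = a.encode('utf-8').decode('latin-1')
print(ascii(mojibake))  # '\xc2\x80' (U+00C2 is the letter 'Â')
```

So switching to Latin-1 is fine if every character you handle is below U+0100; otherwise UTF-8 is the safer choice, and the \xc2 is simply part of how it works.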
Carlo Zanocco
Hi @EvgeniyNekrasov, if this or any answer has solved your question, please consider accepting it by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this; also take a look at [this](https://meta.stackexchange.com/a/5235/315993) for more details. – Carlo Zanocco Mar 15 '21 at 08:26