This is UTF-8 encoding, and that's where the \xc2 comes from. Take a look here.
In a Python string, \x80 means Unicode code point #128 (U+0080, the Padding Character control code). When we encode that code point in UTF-8, it takes two bytes: \xc2\x80.
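To see exactly where the \xc2 comes from, here is a small sketch that builds the two-byte UTF-8 sequence by hand, using the standard 110xxxxx 10xxxxxx bit pattern for code points U+0080 to U+07FF. The function name `utf8_two_bytes` is made up for illustration; in real code you would just call `str.encode`.

```python
def utf8_two_bytes(cp: int) -> bytes:
    """Hypothetical helper: hand-encode a code point in U+0080..U+07FF."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only U+0080..U+07FF use exactly two bytes")
    lead = 0b11000000 | (cp >> 6)          # 110xxxxx: top 5 of the 11 bits
    cont = 0b10000000 | (cp & 0b00111111)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])

print(utf8_two_bytes(0x80))       # b'\xc2\x80'
print(chr(0x80).encode('utf-8'))  # b'\xc2\x80'
```

For U+0080 the top five bits are 00010, so the lead byte is 11000010, which is 0xC2.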
The original ASCII encoding had only 128 characters, while Unicode has more than a million possible code points (U+0000 to U+10FFFF), and a single byte can represent only 256 different values. A lot of computing is built on ASCII, and we’d like that to keep working, but non-English speakers need to use computers too, so we need a way to represent their characters as well.
The answer is UTF-8, a scheme that encodes the first 128 Unicode code points (0-127, the ASCII characters) as a single byte each, so text that uses only those characters is completely compatible with ASCII. The next 1920 code points (U+0080 up to U+07FF), which include the most common non-English characters, take two bytes each; code points above that take three or four bytes.
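You can check these byte lengths directly in Python. This sketch only uses the built-in `chr` and `str.encode`, and picks code points at the edges of each length range.

```python
# UTF-8 byte length of code points at the boundaries of each range
boundaries = (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF)
lengths = {cp: len(chr(cp).encode('utf-8')) for cp in boundaries}
for cp, n in lengths.items():
    print(f"U+{cp:04X}: {n} byte(s)")

# Pure-ASCII text gives identical bytes in ASCII and UTF-8
text = "hello"
print(text.encode('ascii') == text.encode('utf-8'))  # True

# Number of code points in the two-byte range U+0080..U+07FF
print(0x7FF - 0x80 + 1)  # 1920
```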
So, in exchange for being slightly less efficient with some characters that could fit in a one-byte encoding (such as \x80), we gain the ability to represent every character from every written language.
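A quick way to see that trade-off: a character like é (U+00E9) fits in one byte in Latin-1 but needs two in UTF-8, while characters outside Latin-1's 256 values, such as Japanese text, can only be represented by the UTF-8 side. The strings below are written with \u escapes just to keep the example ASCII-only.

```python
word = "caf\u00e9"                   # "café"; é is U+00E9
latin1_bytes = word.encode('latin-1')
utf8_bytes = word.encode('utf-8')
print(len(latin1_bytes))  # 4
print(len(utf8_bytes))    # 5, because é takes two bytes in UTF-8

japanese = "\u65e5\u672c"            # "日本", two CJK characters
japanese_utf8 = japanese.encode('utf-8')
print(len(japanese_utf8))  # 6, three bytes per character

try:
    japanese.encode('latin-1')       # Latin-1 has no room for these
except UnicodeEncodeError as exc:
    print("Latin-1 cannot encode it:", exc.reason)
```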
For more reading, try this SO question
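For example, if you want to get rid of the \xc2, encode your string as Latin-1 instead. This only works for characters up to U+00FF (chr(255)); anything above that raises a UnicodeEncodeError.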
```python
a = chr(128)                # the single code point U+0080
print(repr(a))              # '\x80'
print(a.encode())           # UTF-8 by default: b'\xc2\x80'
print(a.encode('latin-1'))  # Latin-1: b'\x80'
```
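Whichever encoding you choose, the bytes have to be decoded with the same codec later. This sketch (standard library only) shows the round trip, and what happens when UTF-8 bytes are decoded as Latin-1 by mistake: each byte becomes its own character, which is a common source of a stray Â (U+00C2) in text.

```python
a = chr(128)

utf8_bytes = a.encode('utf-8')      # b'\xc2\x80'
latin1_bytes = a.encode('latin-1')  # b'\x80'

# Decoding with the matching codec gives back the original string
print(utf8_bytes.decode('utf-8') == a)      # True
print(latin1_bytes.decode('latin-1') == a)  # True

# Decoding UTF-8 bytes as Latin-1 yields two characters instead of one
mojibake = utf8_bytes.decode('latin-1')
print(mojibake == '\xc2\x80')  # True
print(len(mojibake))           # 2
```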