0

I am using Python 3 to receive and process text messages from a Telegram channel. I sometimes get messages containing a string like this:

Ехchanges: Віnance Futures

It looks pretty normal. But when I check

if 'Exchanges' in the_string:

I get

False

Trying to track this down:

the_string.encode()

yields

b'\xd0\x95\xd1\x85changes: \xd0\x92\xd1\x96nance Futures'

How can I convert this to a plain ASCII string?

'Exchanges: Binance Futures'
Tom Atix
  • 2
    In your example, it looks like the first character is `U+0415 Cyrillic Capital Letter Ie`. It looks identical to the ASCII character `E`, but the visual similarity is a red herring, and you shouldn't expect Python to treat the characters as equal to each other just because they look the same. – water_ghosts Mar 20 '21 at 21:23
  • 1
    Does this answer your question? [Translate Unicode to ascii (if possible)](https://stackoverflow.com/questions/43367355/translate-unicode-to-ascii-if-possible) or [Where is Python's “best ASCII for this Unicode” database?](https://stackoverflow.com/q/816285/4518341) – wjandrea Mar 20 '21 at 21:34
  • @water_ghosts This makes sense. I will use the non-Russian string for the if condition then. You can add this as an answer and I will mark it as solved. – Tom Atix Mar 20 '21 at 21:37
  • BTW, instead of using encoding for that analysis, you could use `ascii()`, which shows characters instead of bytes: `print(ascii(the_string))` -> `'\u0415\u0445changes: \u0412\u0456nance Futures'` – wjandrea Mar 20 '21 at 21:42
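
To make the point from the comments concrete, here is a minimal sketch that lists every non-ASCII character in the string together with its Unicode name. The helper name `describe_non_ascii` is made up for illustration, and the string is written with escapes so the Cyrillic letters cannot be mistaken for Latin ones.

```python
import unicodedata

# The string from the question, with the look-alike letters written as escapes.
the_string = '\u0415\u0445changes: \u0412\u0456nance Futures'


def describe_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, 'UNKNOWN'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


for index, char, name in describe_non_ascii(the_string):
    print(index, repr(char), name)

# The Latin spelling is not a substring, because the first two letters differ.
print('Exchanges' in the_string)
```

The four characters it reports are Cyrillic letters, which is why the substring test with the Latin spelling fails even though both spellings render alike.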

2 Answers

-1

Try combining the encode() and decode() methods of the str class:

>>> my_string = 'Ехchanges: Віnance Futures'
>>> 'Ехchanges' in my_string
True
>>> my_string.encode()
b'\xd0\x95\xd1\x85changes: \xd0\x92\xd1\x96nance Futures'
>>> 'Ехchanges' in my_string.encode().decode()
True
Funpy97
  • Doesn't work. Ехchanges: Віnance Futures this is the original string. I just wrote it as normal in the above example. The bytes representation is the correct one though. If I do encode and then decode, I get a correct looking string but still a False on the if condition. – Tom Atix Mar 20 '21 at 21:32
  • `'Ехchanges' in my_string` -> `True`??? You missed the whole point of the question. – wjandrea Mar 20 '21 at 21:43
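
As the comments point out, an encode/decode round trip cannot help here. A short sketch, assuming UTF-8 on both sides, shows that the round trip returns the same string and the Latin substring test still fails:

```python
# The string from the question, with the Cyrillic look-alikes written as escapes.
the_string = '\u0415\u0445changes: \u0412\u0456nance Futures'

# Encoding to UTF-8 and decoding again reproduces the original code points.
round_tripped = the_string.encode('utf-8').decode('utf-8')

print(round_tripped == the_string)   # the round trip changes nothing
print('Exchanges' in round_tripped)  # Latin 'E' and 'x' are still absent
```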
-1

It's a UTF-8 encoded string. You need to decode it with decode('utf-8') here.

Solution:

# UTF-8 bytes of the string from the question
encoded_string = b'\xd0\x95\xd1\x85changes: \xd0\x92\xd1\x96nance Futures'
# Decode the bytes back into a str
decoded_string = encoded_string.decode("utf-8")
print(decoded_string)
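
Decoding alone still leaves the Cyrillic letters in place. If the goal is the Latin spelling, one possible approach is an explicit translation table. The sketch below is only an illustration: the `HOMOGLYPHS` map and the `to_latin_lookalikes` helper are hypothetical, and the map covers just the four characters seen in the question. Unicode normalization (for example `unicodedata.normalize('NFKD', ...)`) does not map these Cyrillic letters to Latin ones, so the mapping has to be supplied by hand or by a dedicated library.

```python
# Hypothetical hand-written map: only the four look-alikes from the question.
HOMOGLYPHS = str.maketrans({
    '\u0415': 'E',  # CYRILLIC CAPITAL LETTER IE
    '\u0445': 'x',  # CYRILLIC SMALL LETTER HA
    '\u0412': 'B',  # CYRILLIC CAPITAL LETTER VE
    '\u0456': 'i',  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})


def to_latin_lookalikes(text):
    """Replace the mapped Cyrillic look-alikes with their Latin counterparts."""
    return text.translate(HOMOGLYPHS)


the_string = '\u0415\u0445changes: \u0412\u0456nance Futures'
normalized = to_latin_lookalikes(the_string)

print(normalized)
print('Exchanges' in normalized)
```

A table like this would corrupt genuine Cyrillic text, so it only makes sense when the input is expected to be English with stray look-alike characters.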