The problem is the following.

'Β'=='B'

Out[104]: False

To make things clear the first is a Greek 'Β' and the second a Latin 'B'.
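A quick way to confirm that these really are two different characters is to print their code points and Unicode names. This is only a small sketch; the variable names `greek_b` and `latin_b` are made up for illustration.

```python
import unicodedata

greek_b = "\u0392"  # the Greek capital letter (illustrative variable name)
latin_b = "B"       # the Latin capital letter (illustrative variable name)

# Show the code point and the official Unicode name of each character
for ch in (greek_b, latin_b):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The two look identical on screen but are distinct code points
print(greek_b == latin_b)
```

The names reported by `unicodedata.name` make it easy to tell which alphabet a look-alike character actually belongs to.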

Python is certainly correct to return False here, but for the script I'm working on I need such characters to count as the same. I tried several encoding/decoding manipulations, but they still count as different. Any ideas?

  • How did ```python``` give you the result ```False```? Can you edit your question to say how you ran this line? – KMG Oct 06 '20 at 23:23
  • It is just typed as you see it. The first ''B'' is typed with English on my keyboard, and then I switch to the Greek keyboard for the second ''B''. – Poulos Spyros Oct 06 '20 at 23:26
  • Are you only checking letters, or do you need to literally translate words and check for a match? – PacketLoss Oct 06 '20 at 23:30
  • OK, then you will have to include extra logic in your program, since these two have different Unicode values, which you can't change. Instead, you can use if statements to check for this case. – KMG Oct 06 '20 at 23:30
  • I am trying to compare vehicle numbers. The Greek ones exist in a dataframe column. The other side comes from Selenium reading an HTML table, and Selenium brings those in with Latin characters. Can I change the way Selenium reads the table? – Poulos Spyros Oct 07 '20 at 01:08

1 Answer

Following this other answer,

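data = "UTF-8 DATA".encode("utf-8")  # bytes holding UTF-8 encoded text
udata = data.decode("utf-8")  # decode the bytes into a str
asciidata = udata.encode("ascii", "ignore")  # drop every non-ASCII character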

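This will make you lose data, since you are going from an 8-bit encoding to a 7-bit one (as stated in a comment on the very same answer I am citing). With "ignore", every non-ASCII character is dropped, so a Greek 'Β' is removed rather than converted to a Latin 'B'. It might still work for your problem.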
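If dropping the Greek letters is not acceptable, another option is to map each Greek capital to the Latin capital it resembles before comparing. The following is only a sketch under assumptions: the table `GREEK_TO_LATIN`, the helper `to_latin_lookalikes`, and the sample plates are made up for illustration, and the set of look-alike letters should be checked against your real data.

```python
# Hypothetical lookup table: Greek capitals -> the Latin capitals they resemble.
# Greek: Alpha, Beta, Epsilon, Zeta, Eta, Iota, Kappa, Mu, Nu, Omicron,
# Rho, Tau, Upsilon, Chi (written as escapes so the alphabet is unambiguous).
GREEK_TO_LATIN = str.maketrans(
    "\u0391\u0392\u0395\u0396\u0397\u0399\u039a\u039c\u039d\u039f\u03a1\u03a4\u03a5\u03a7",
    "ABEZHIKMNOPTYX",
)


def to_latin_lookalikes(text):
    """Replace look-alike Greek capitals with Latin ones (hypothetical helper)."""
    return text.translate(GREEK_TO_LATIN)


greek_plate = "\u0392\u039a\u0391 1234"  # made-up plate typed with Greek letters
latin_plate = "BKA 1234"                 # the same plate typed with Latin letters

# Encoding to ASCII with "ignore" simply deletes the Greek letters
print(greek_plate.encode("ascii", "ignore"))

# Direct comparison fails; comparison after mapping succeeds
print(greek_plate == latin_plate)
print(to_latin_lookalikes(greek_plate) == to_latin_lookalikes(latin_plate))
```

Applying the same helper to both sides (for example to the dataframe column and to the strings read by Selenium) keeps the comparison symmetric, whichever alphabet each source happens to use.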

Good luck!

Felipe Whitaker