
I'm trying to compare two strings, but the == operator fails. They appear to have the same value when printed, and even the type is the same (class str). The output of print(repr( )) is the same, .strip() doesn't help either, and comparing with the in operator also fails.

The strings are "Neues Textdokument.txt - Edito" and the Windows window title of the editor.

Thanks to the advice from @Random Davis, it seems there are Cyrillic letters in there that look exactly like the Latin letters. If you check the strings a and b with print([ord(c) for c in a]) and print([ord(c) for c in b]), it shows the Unicode code point of each letter in decimal. The two strings differ at the e and the M.

Blu3bar0n
  • Maybe try a method that shows the exact differences between the strings? Like this one: https://stackoverflow.com/questions/17904097/python-difference-between-two-strings/17904977 – Random Davis Feb 11 '21 at 16:44
  • Also you can try using `print([ord(c) for c in a])` and `print([ord(c) for c in b])` to see the exact difference between strings `a` and `b`. – Random Davis Feb 11 '21 at 17:02
  • Thanks for this advice. With difflib.ndiff() it shows a change like: - m- е+ m+ e but that's even more confusing – Blu3bar0n Feb 11 '21 at 17:04
  • print([ord(c) for c in a]) and print([ord(c) for c in b]) shows a change from 77 to 1052 and 1077 to 101 – Blu3bar0n Feb 11 '21 at 17:11
  • You should put the full output of those commands in your post so others can help debug. The `- m- е+ m+ e` part makes me think you just have a typo somewhere. – Random Davis Feb 11 '21 at 17:11
  • Okay so `chr(1052)` is the Cyrillic "М" which looks just like "M" but is a different character. `chr(1077)` is the Cyrillic "е" which also looks like "e" but is a different character. So, whichever string has those Cyrillic characters in it is the problem. – Random Davis Feb 11 '21 at 17:16
  • `1077` is decimal, not hex, so it corresponds to unicode 0435: http://www.isthisthingon.org/unicode/index.phtml?glyph=0435. Same with `1052`: http://www.isthisthingon.org/unicode/index.phtml?glyph=041C – Random Davis Feb 11 '21 at 17:19
  • It seems like the Cyrillic characters were converted to ASCII when you pasted your code and output into the question, or they were always being outputted that way. But obviously the actual data does contain those characters. I have no idea which string actually contains those - the output from `EnumWindows` or the string in your code - but it appears you'll have to change the string you're checking to include those Cyrillic characters, assuming they're in the actual window title. I also have no idea why they'd be in there, but that's obviously what's happening. – Random Davis Feb 11 '21 at 17:22
  • It could be that the editor you're using is replacing certain characters with their Cyrillic equivalent as a stylistic choice because they look different/better. Or the characters are being deliberately replaced to make it more difficult to search for the title. That seems less likely though but still possible. Your text file's name might also just have those characters in it for some reason. – Random Davis Feb 11 '21 at 17:26
  • Thank you very much, this should be enough advice to solve the problem. I have no clue where these Cyrillic letters came from^^ but I will be able to track it down now – Blu3bar0n Feb 11 '21 at 17:32
  • Okay I'll post the solution as an answer so you can upvote or accept it. – Random Davis Feb 11 '21 at 17:32
  • Thanks, I cut the useless part out of the question – Blu3bar0n Feb 11 '21 at 17:35
  • Here's info on how you can compare strings that are almost identical, which might be helpful: https://stackoverflow.com/questions/31642940/finding-if-two-strings-are-almost-similar – Random Davis Feb 11 '21 at 17:59

1 Answer


So it turns out the strings only looked the same: the actual data contained some unexpected Unicode Cyrillic characters that look identical to ASCII characters. The solution was to run the following code to compare the comparison string with the actual string, character by character:

print([ord(c) for c in a])
print([ord(c) for c in b])

This showed that in the actual data, there were the Cyrillic characters "М" and "е", which caused the string comparison to return False.
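If the raw code points are hard to read, a small helper can report each differing position together with the official Unicode name of both characters. This is only a sketch: `describe_differences` is a hypothetical name, and the sample strings below are made up for illustration rather than taken from the actual window title. It uses `unicodedata.name` from the standard library and only compares up to the length of the shorter string.

```python
import unicodedata


def describe_differences(a, b):
    """Return one tuple per position where a and b differ.

    Each tuple is (index, char_a, name_a, char_b, name_b).
    zip() stops at the shorter string, so a length difference
    has to be checked separately.
    """
    diffs = []
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            diffs.append((
                i,
                ca, unicodedata.name(ca, "UNKNOWN"),
                cb, unicodedata.name(cb, "UNKNOWN"),
            ))
    return diffs


# Hypothetical sample data: the second string uses Cyrillic
# U+041C and U+0435, which render like Latin "M" and "e".
latin = "Me"
mixed = "\u041c\u0435"

print(latin == mixed)  # the strings look alike but are not equal
for row in describe_differences(latin, mixed):
    print(row)
```

The character names (for example a name starting with CYRILLIC instead of LATIN) make it obvious which string contains the look-alike letters, which `ord()` alone does not.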

Random Davis