
I want to compare two strings case-insensitively. For ASCII strings it's fine to use `string1.lower() == string2.lower()`, and it works properly.

But with non-ASCII characters such as the Turkish İ, it doesn't work as expected. For example:

string1 = 'insanlar'
string2 = "İnsanlar"
print(string1.lower() == string2.lower())  # prints False, but they are the same word

How can I do this comparison with non-ASCII characters?

İbrahim Akar
  • Have you tried [`locale.setlocale()`](https://docs.python.org/3/library/locale.html#locale.setlocale)? – Olvin Roght Aug 28 '20 at 06:56
  • This is hard (the so-called ["Turkey Test"](https://stackoverflow.com/q/796986/3001761)) because it's context-dependent. In some languages, the lowercase of `"I"` is `"i"`, but they're not paired the same way in Turkish text. – jonrsharpe Aug 28 '20 at 07:01
  • Many thanks to deceze for pointing out that comparing `casefold()` outputs doesn't work here (I deleted my answer quickly to avoid showing incorrect information, so can't respond there). – alani Aug 28 '20 at 07:04
  • Similar: [Python UTF-8 Lowercase Turkish Specific Letter](https://stackoverflow.com/q/19030948/3890632) – khelwood Aug 28 '20 at 07:05
  • I guess that the reason *why* casefold doesn't work is that there is some strict lower-case equivalent of `İ` (`i̇`) which is not actually used here because it is easier just to use the ASCII `i`. – alani Aug 28 '20 at 07:08
  • I don't know whether it's a reasonable solution, but `unidecode(string2.lower()) == unidecode(string1.lower())` works. – İbrahim Akar Aug 28 '20 at 07:20
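
The points raised in the comments can be illustrated with a minimal sketch. It assumes both strings are known to be Turkish text, and the helper names `turkish_lower` and `turkish_equal` are hypothetical (they are not from the thread or any library). The idea is to apply the Turkish pairs (İ/i and I/ı) by hand before the generic `str.lower()`, since the default mapping turns `İ` into `i` followed by a combining dot (U+0307).

```python
import unicodedata


def turkish_lower(text: str) -> str:
    """Hypothetical helper: lowercase text using the Turkish I pairs."""
    # NFC composes "I" + U+0307 into the single code point "İ" (U+0130),
    # so both spellings of the dotted capital are handled the same way.
    text = unicodedata.normalize("NFC", text)
    # Apply the Turkish-specific pairs before the generic mapping:
    # dotted İ -> i, dotless I -> ı.
    text = text.replace("İ", "i").replace("I", "ı")
    return text.lower()


def turkish_equal(a: str, b: str) -> bool:
    """Hypothetical helper: case-insensitive comparison for Turkish text."""
    return turkish_lower(a) == turkish_lower(b)


# The default mapping keeps the dot as a combining mark, which is why
# the plain lower() comparison in the question fails.
print([hex(ord(c)) for c in "İ".lower()])     # ['0x69', '0x307']
print(turkish_equal("insanlar", "İnsanlar"))  # True
print(turkish_equal("ISI", "ısı"))            # True
print(turkish_equal("I", "i"))                # False under Turkish rules
```

This only makes sense when the language is known: applied to English text, it would lowercase `I` to `ı`. The `unidecode` approach from the comment above is looser still, because it transliterates every character to ASCII and so would also treat letters such as `ö` and `o` as equal.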

0 Answers