
I am trying to match two strings, Serhat Kılıç and serhat kilic. In SQL this is quite easy, as I can do:

select name from main_creditperson where name = 'serhat kılıç'
union all
select name from main_creditperson where name = 'serhat kilic';

===
name
Serhat Kılıç
Serhat Kılıç

In other words, both names return the same result. How would I do an equivalent string comparison in Python to see that these two names are 'the same' in the SQL sense? I am looking to do something like:

if name1 == name2:
    do_something()

I tried going the unicodedata.normalize('NFKD', input_str) way but it wasn't getting me anywhere. How would I solve this?

David542

2 Answers


If you're OK with reducing everything to ASCII, see the question "Where is Python's 'best ASCII for this Unicode' database?". Unidecode is rather good; however, it is GPL-licensed, which might be a problem for some projects. In any case, it works for your example and for quite a few others, and it runs on Python 2 and 3 alike (the examples below are from Python 3, so that it is easier to see what's going on):

>>> from unidecode import unidecode
>>> unidecode('serhat kılıç')
'serhat kilic'
>>> unidecode('serhat kilic')
'serhat kilic'
>>> # as a bonus it does much more, like
>>> unidecode('北亰')
'Bei Jing '
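If the GPL license rules Unidecode out, a standard-library-only approach may be enough for names like the one in the question. The following is a minimal sketch, not a replacement for Unidecode: it case-folds, applies NFKD, drops combining marks, and then maps a few letters that have no Unicode decomposition (such as the dotless "ı") through a small hand-written table. The names `fold`, `same_name` and `EXTRA_FOLDS` are made up for this example, and the table is deliberately not exhaustive.

```python
import unicodedata

# Hypothetical, non-exhaustive table for letters that NFKD leaves untouched
# because they have no canonical or compatibility decomposition.
EXTRA_FOLDS = str.maketrans({"ı": "i", "ø": "o", "ł": "l", "đ": "d"})


def fold(text):
    """Reduce text to a case-insensitive, accent-insensitive comparison key."""
    # casefold() handles case; NFKD splits e.g. 'ç' into 'c' + combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # Drop the combining marks that the decomposition produced.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Finally map the letters that have no decomposition at all.
    return stripped.translate(EXTRA_FOLDS)


def same_name(a, b):
    """Compare two strings by their folded keys."""
    return fold(a) == fold(b)


print(same_name("Serhat Kılıç", "serhat kilic"))
```

With this, `same_name(name1, name2)` takes the place of `name1 == name2` in the question. It only approximates whatever collation the database applies, so letters outside the table will still compare as different.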
Community

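I found this:

from unidecode import unidecode
def compare_words(str_1, str_2):
    # Python 2: decode the UTF-8 byte string before transliterating it to ASCII
    return unidecode(str_1.decode('utf-8')) == str_2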

Tested on Python 2.7:

In [2]: from unidecode import unidecode
In [3]: def compare_words(str_1, str_2):
   ...:     return unidecode(str_1.decode('utf-8')) == str_2
In [4]: print compare_words('serhat kılıç', 'serhat kilic')
True
  • I already tried that approach. It doesn't work: `>>> remove_accents(u'serhat kılıç')==remove_accents(u'serhat kilic') False`. Note that I'm not looking to remove accents; the "i" character is not an accent. – David542 Aug 20 '16 at 05:21
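
The point in this comment can be checked directly with the standard library. The sketch below assumes that `remove_accents` (whose source is not shown here) is the usual "NFKD, then drop combining marks" helper; `strip_accents` is a hypothetical stand-in for it. The "ç" decomposes into a base letter plus a combining cedilla, but the dotless "ı" is a separate letter with no decomposition, so normalization alone leaves it in place.

```python
import unicodedata


def strip_accents(text):
    """Hypothetical stand-in for remove_accents: NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# 'ç' has a decomposition: base letter U+0063 plus combining cedilla U+0327.
print(repr(unicodedata.decomposition("ç")))  # '0063 0327'
# The dotless 'ı' (U+0131) has none, so NFKD cannot turn it into 'i'.
print(repr(unicodedata.decomposition("ı")))  # ''

# The cedilla is removed, but 'ı' survives, so the two strings still differ.
print(strip_accents("serhat kılıç") == strip_accents("serhat kilic"))  # False
```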