Getting Python string equivalence to work like SQL match

Question

I am trying to match two strings, Serhat Kılıç and serhat kilic. In SQL this is quite easy, as I can do:

select name from main_creditperson where name = 'serhat kılıç'
union all
select name from main_creditperson where name = 'serhat kilic';

===
name
Serhat Kılıç
Serhat Kılıç

In other words, both names return the same result. How would I do an equivalent string comparison in Python to see that these two names are 'the same' in the SQL sense? I am looking to do something like:

if name1 == name2:
   do_something()

I tried going the unicodedata.normalize('NFKD', input_str) way but it wasn't getting me anywhere. How would I solve this?

Also, behaviour of the SQL query would be very implementation-dependent. — Antti Haapala -- Слава Україні, Aug 20 '16 at 05:58

score 1 · Accepted Answer · edited May 23 '17 at 11:51

If you're OK with ASCII for everything, you can check Where is Python's "best ASCII for this Unicode" database? Unidecode is rather good; however, it is GPL-licensed, which might be a problem for some projects. Anyway, it would work in your case and in quite a few others, and it works on Python 2 and 3 alike (these examples are from Python 3 so that it is easier to see what's going on):

>>> from unidecode import unidecode
>>> unidecode('serhat kılıç')
'serhat kilic'
>>> unidecode('serhat kilic')
'serhat kilic'
>>> # as a bonus it does much more, like
>>> unidecode('北亰')
'Bei Jing '
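If the GPL license is a problem, here is a rough stdlib-only sketch of the same idea. It also shows why unicodedata.normalize('NFKD', ...) alone gets stuck: the dotless "ı" (U+0131) has no Unicode decomposition, so normalization leaves it untouched, while "ç" splits into "c" plus a combining cedilla. The name sql_like_key and the EXTRA_FOLDS table are made up for illustration, the table is nowhere near exhaustive, and this is not what any particular SQL collation does internally.

```python
import unicodedata

# Hypothetical, non-exhaustive table of characters that have no Unicode
# decomposition but that we still want to fold to an ASCII letter.
EXTRA_FOLDS = str.maketrans({"ı": "i"})


def sql_like_key(text):
    """Return a case- and accent-insensitive comparison key (Python 3)."""
    # Fold the characters that NFKD cannot decompose (e.g. dotless i).
    text = text.translate(EXTRA_FOLDS)
    # Split accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() meant for caseless matching.
    return stripped.casefold()


name1 = "Serhat Kılıç"
name2 = "serhat kilic"
if sql_like_key(name1) == sql_like_key(name2):
    print("same name")
```

Comparing keys rather than the raw strings keeps the originals intact for display. For anything beyond a handful of characters, a maintained transliteration table such as Unidecode's is likely the safer choice.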

Madelyne Velasco Mite · Answer 2 · 2016-08-20T06:06:33.580

0

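I found this approach, which uses unidecode:

from unidecode import unidecode

def compare_words(str_1, str_2):
    # Python 2: decode the UTF-8 byte string, then transliterate it to ASCII.
    # Only str_1 is transliterated, so str_2 is assumed to be plain ASCII already.
    return unidecode(str_1.decode('utf-8')) == str_2

Tested on Python 2.7:

In [2]: from unidecode import unidecode
In [3]: def compare_words(str_1, str_2):
   ...:     return unidecode(str_1.decode('utf-8')) == str_2
In [4]: print compare_words('serhat kılıç', 'serhat kilic')
True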

edited Aug 20 '16 at 06:06

answered Aug 20 '16 at 05:16


I already tried that approach. It doesn't work: `>>> remove_accents(u'serhat kılıç')==remove_accents(u'serhat kilic') False`. Note that I'm not looking to remove accents; the "i" character is not an accent. – David542 Aug 20 '16 at 05:21
