0

I am implementing a search feature for user names. Some names have accented characters, but I want to be able to search for them with the nearest ascii character approximation. For example: Vû Trån would be searchable with Vu Tran.

I found a Python library, called unidecode to handle this conversion. It works as expected and takes my unicode string Vû Trån and returns Vu Tran. Perfect.

The issue arises when I start querying my database – I use SQLAlchemy and Postgres.

Here's my Python query:

Person.query.filter(Person.ascii_name.ilike("%{q}%".format(q=query))).limit(25).all()

ascii_name is the getter for my name column, implemented as such

class Person(Base, PersonUtil):

    """
       My abbreviated Person class
    """     

     __tablename__ = 'person'

    id = Column(BigInteger, ForeignKey('user.id'), primary_key=True)
    first_name = Column(Unicode, nullable=False)
    last_name = Column(Unicode, nullable=False)

    name = column_property(first_name + " " + last_name)
    ascii_name = synonym('name', descriptor=property(fget=PersonUtil._get_ascii_name))

class PersonUtil(object):
    def _get_ascii_name(self):
        return unidecode(unicode(self.name))

My intent behind this code is that because I store the unicode version of the first and last names in my database, I need to have a way to call unidecode(unicode(name)) when I retrieve the person's name. Hence, I use the descriptor=property(fget=...) so that whenever I call Person.ascii_name, I retrieve the "unidecoded" name attribute. That way, I can simply write Person.ascii_name.ilike("%{my_query}%")... and match the nearest ascii_name to the search query, which is also just ascii characters.

This doesn't fully work. The ilike method with ascii_name works when I do not have any converted characters in the query. For example, the ilike query will work for the name "Bob Smith", but it will not work for "Bøb Smíth". It fails when it encounters the first converted character, which in the case of "Bøb Smíth" is the letter "ø".

I am not sure why this is happening. The ascii_name getter returns my expected string of "Bob Smith" or "Vu Tran", but when coupled with the ilike method, it doesn't work.

  1. Why is this happening? I've not been able to find anything about this issue.
  2. How can I either fix my existing code to make this work, or is there a better way to do this that will work? I would prefer not to have to change my DB schema.

Thank you.

Friendly King
  • 2,396
  • 1
  • 23
  • 40
  • There is something similar in PostgreSQL itself: http://stackoverflow.com/questions/11005036/does-postgresql-support-accent-insensitive-collations – pozs Nov 11 '14 at 09:37

1 Answers1

0

What you want to do simply won't work because ilike only works on real columns in the database. The column_property and synonym are just syntactic sugars provided by sqlalchemy to help with making the front end easy. If you want to leverage the backend to query with LIKE in the way you intended you need the actual values there. I am afraid you have to generate/store the ascii full name into the database which means you need to change your schema to include ascii_name as a real column, and make sure they are inserted. To verify this yourself, you should dump out the data in the table, and see if your manually constructed queries can work.

metatoaster
  • 17,419
  • 5
  • 55
  • 66
  • Thanks for your answer. Changing the db schema isn't a big deal, but it would've been nice to have ability to do this without the schema modification. Seems there is not other option. – Friendly King Nov 11 '14 at 00:16