
This question is related to my earlier question, "Accent insensitive search django sqlite".

As mentioned in the response there, there is no direct way to do so. I have come up with a solution, but I am not sure if it is a good one:

Use Case: Assume that the database has a table NewsArticles with one of its columns being ArticleText. As the name implies, ArticleText contains the text of the news articles, which includes several words with accented characters. Let's say one such word, present in the ArticleText of the article with primary key aid123, is Puerto Aisén. Now, a user can search for either Puerto Aisén or Puerto Aisen and should get the article with PK aid123 back, with the found accented word in bold (<b>Puerto Aisén</b>).

Solution: I add one more column to the table, normalizedArticleText, and make it contain the unicodedata.normalize (accent-removed) version of the text. Now whenever a search query comes in, I first determine whether the query contains accented characters by using s.decode('ascii'), and then search in the corresponding column.
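A minimal sketch of the two steps described above, assuming Python 3 (where the ASCII check on a str is encode rather than decode). The helper names strip_accents, is_ascii, and column_for_query are hypothetical, and the actual Django query against the chosen column is left out:

```python
import unicodedata

def strip_accents(text):
    """Return text with combining accent marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFD', text)
    # Keep only characters that are not combining marks
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def is_ascii(text):
    """True if the query contains only ASCII characters."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

def column_for_query(query):
    """Pick which column to search, following the scheme in the question."""
    return 'normalizedArticleText' if is_ascii(query) else 'ArticleText'

print(strip_accents('Puerto Ais\u00e9n'))
print(column_for_query('Puerto Aisen'))
print(column_for_query('Puerto Ais\u00e9n'))
```

The value of strip_accents(article_text) is what would be stored in normalizedArticleText whenever an article is saved.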

Problem: I am duplicating the whole data. Also, there is no way for me to bold the accented keyword if the search query was the non-accented version of the keyword.
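One possible way around the bolding problem, sketched here as an assumption rather than a tested solution: strip the accents character by character while remembering which original index each stripped character came from, search in the stripped text, then map the match back onto the original. The function names are hypothetical, the match is case-sensitive, only the first occurrence is wrapped, and no HTML escaping is done:

```python
import unicodedata

def strip_accents_with_map(text):
    """Return (stripped, index_map); index_map[i] is the index in `text`
    of the character that produced stripped[i]."""
    out_chars = []
    index_map = []
    for i, ch in enumerate(text):
        for d in unicodedata.normalize('NFD', ch):
            if not unicodedata.combining(d):
                out_chars.append(d)
                index_map.append(i)
    return ''.join(out_chars), index_map

def highlight(text, query):
    """Wrap the first accent-insensitive match of query in <b>...</b>."""
    stripped, index_map = strip_accents_with_map(text)
    needle, _ = strip_accents_with_map(query)
    pos = stripped.find(needle)
    if pos == -1 or not needle:
        return text
    start = index_map[pos]
    end = index_map[pos + len(needle) - 1] + 1
    # If the original is stored decomposed, keep trailing combining marks
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return text[:start] + '<b>' + text[start:end] + '</b>' + text[end:]

print(highlight('Visit Puerto Ais\u00e9n today', 'Puerto Aisen'))
```

Because the bold tags are placed on the original text, the accented spelling is preserved whether the user typed Puerto Aisén or Puerto Aisen.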

Any brilliant suggestions? I am using Django with SQLite.

The Wanderer

1 Answer


Try using the unicodedata module from the standard library. Here's an example for Python 3:

import unicodedata

# encode() returns bytes in Python 3, so decode back to str
unicodedata.normalize('NFD', 'répertoire').encode('ascii', 'ignore').decode('ascii')

Or, for Python 2.7:

import unicodedata

unicodedata.normalize('NFD', u'répertoire').encode('ascii', 'ignore')

Either of these will output:

'repertoire'

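Simply replace répertoire with your string. NFD is a normalization form that decomposes each accented character into its base letter followed by a separate combining mark; encoding to ASCII with 'ignore' then drops those marks. You can read more on the different forms of normalization here: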

https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
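To make the decomposition step visible, here is a small Python 3 sketch. The to_ascii wrapper is a hypothetical name, not part of unicodedata. It also shows one caveat: a letter with no canonical decomposition, such as ø, is dropped entirely rather than mapped to o.

```python
import unicodedata

def to_ascii(text):
    """Decompose, then drop everything that is not ASCII (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

word = 'r\u00e9pertoire'  # 'répertoire' with a precomposed é
decomposed = unicodedata.normalize('NFD', word)

print(len(word), len(decomposed))       # 10 11
print(unicodedata.name(decomposed[2]))  # COMBINING ACUTE ACCENT
print(to_ascii(word))                   # repertoire
print(to_ascii('Troms\u00f8'))          # Troms
```

If losing letters like ø matters for your data, you would need an explicit replacement table on top of this.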

Good luck!

FlipperPA