Suppose I have a table named movie
with a field name
, that is a VARCHAR
field.
I want to implement a search bar in my website, so that when a user enters a string I can query movies with that name.
My first approach was quite naive:
select *
from movie
where name like '%user_string%';
The limitations are:
- Special characters. Say the user string is "Lets go", I want it to return the movie with the name "Let's go" even if the apostrophe is missing.
- Accents. Say the user string is "Pokemon", I want it to return the movie with the name "Pokémon" even if the accent is missing.
My idea was to create an additional normalized_name
field, that is computed using the name
field with all special characters and accents stripped. Then the query would become:
select *
from movie
where normalized_name like '%user_string%';
For instance: user searches for pokemon
, the database query return a movie with the normalized_name
= pokemon
, which real name is Pokémon
. Obviously the user string would be first normalized too - in order to allow searching by the movie real name too.
Now, is this a valid approach? What is the most widely used one - that may also make the search even better? Is there any literature on the matter?