This question has been asked for other programming languages, but how would you perform an accent-insensitive regex match in Ruby?
My current code is something like:
    scope :by_registered_name, ->(regex) {
      where(:name => /#{Regexp.escape(regex)}/i)
    }
I thought I could replace every character that is neither alphanumeric nor whitespace with a dot and remove the Regexp.escape call, but is there a better way? I'm afraid I would match weird things if I did that...
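To make the dot idea concrete, here is a minimal sketch of what I have in mind. The helper name `dotted_regex` is made up for illustration, and I am assuming the accented characters are stored as single (precomposed) characters. It shows both the false positives I am worried about and the fact that it only helps when the accents are in the query, not in the stored name.

```ruby
# Hypothetical helper: replace every character that is not an ASCII letter,
# an ASCII digit or whitespace with the regex wildcard ".".
# Regex metacharacters are non-alphanumeric too, so they also become dots
# and no escaping is needed.
def dotted_regex(query)
  pattern = query.gsub(/[^a-zA-Z0-9\s]/, ".")
  Regexp.new(pattern, Regexp::IGNORECASE)
end

dotted_regex("Télécom") =~ "Telecom"  # matches, as wanted
dotted_regex("Télécom") =~ "Talacom"  # also matches: a false positive
dotted_regex("Telecom") =~ "Télécom"  # nil: an unaccented query still misses
```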
I am targeting French right now, but if I could also fix it for other languages that would be cool.
I am using Ruby 2.3 if that can help.
I realize my requirements are actually a bit stronger: I also need to handle things like dashes, etc. I am basically importing a school database (URL here, the tag is <nom>), and I want people to be able to find their school by typing its name. Both the search query and the stored names may contain accents, so I believe the easiest way would be to make both sides accent insensitive.
- "Télécom" should be matched by "Telecom"
- "établissement" should be matched by "etablissement"
- "Institut supérieur national de l'artisanat - Chambre de métiers et de l'Artisanat en Moselle" should be matched by "artisanat chambre de métiers"
- "Ecole hôtelière d'Avignon (CCI du Vaucluse)" should be matched by "Ecole hoteliere d'avignon" (it's okay to skip the parentheses)
- "Ecole française d'hôtesses" should be matched by "ecole francaise d'hot"
I also found some crazy stuff in that DB, so I think I will consider sanitizing this input:
- "Académie internationale de management - Hotel & Tourism Management Academy" should be matched by "Hotel Tourism" (note that the & is actually written &amp; in the XML)
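For reference, the "make both sides insensitive" idea could look roughly like the sketch below. The helpers `searchable` and `name_matches?` are names I made up, not part of Ruby or Mongoid. It relies on `String#unicode_normalize`, which needs Ruby 2.2 or later and UTF-8 strings, and it assumes that stripping combining accents is enough for French; letters that do not decompose, such as "œ", would simply be turned into a space.

```ruby
# Hypothetical helper: reduce a string to lowercase, unaccented words
# separated by single spaces, so that names and queries can be compared.
def searchable(text)
  text
    .gsub("&amp;", "&")        # undo the XML entity found in the import
    .unicode_normalize(:nfd)   # split "é" into "e" + a combining accent
    .gsub(/\p{Mn}/, "")        # drop the combining accents
    .downcase
    .gsub(/[^a-z0-9]+/, " ")   # dashes, quotes, "&", parentheses -> one space
    .strip
end

# Hypothetical helper: true when the normalized query is contained
# in the normalized name.
def name_matches?(name, query)
  searchable(name).include?(searchable(query))
end

name_matches?("Télécom", "Telecom")                             # => true
name_matches?("Ecole française d'hôtesses", "ecole francaise d'hot")  # => true
```

With the current scope this would probably mean storing the normalized name in an extra field at import time (for example a hypothetical `searchable_name`) and running the query against that field instead of `name`, but I have not tried it.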