Using $regex in MongoDB, I want to find the name B&B Hôtel, which contains special characters such as & and ô, by typing BB Hotel.

I tried this code:

db.txt.find({ "name": {'$regex': query, $options:'i'}})

where query can be BB Hotel.

Jason Aller
nassim

1 Answer

You don't want a regex search; you want a diacritic-insensitive text search:

db.txt.find({
  // $text is a top-level operator; it searches the fields covered by the text index
  $text: {
    $search: "\"B&B Hotel\"",
    $caseSensitive: false,
    $diacriticSensitive: false
  }
})

Note that $diacriticSensitive defaults to false, but I never trust the defaults. If you are running with older text indexes (version 2 or lower), the option may have no effect, since diacritic-insensitive matching relies on version 3 text indexes. The escaped quotes (\") around the search string make it a phrase search.
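For what it's worth, here is a minimal sketch of how the pieces fit together, assuming the collection is db.txt (as in the question) and that a text index on name is acceptable. The helper name phraseTextQuery is made up for illustration; it only builds the query document, and stripping embedded double quotes is a simplification of mine, not something the $text operator requires.

```javascript
// Hypothetical helper: wraps user input in quotes so $text treats it as a phrase.
// Embedded double quotes are dropped so they cannot end the phrase early (an assumption).
function phraseTextQuery(phrase) {
  const cleaned = phrase.replace(/"/g, "");
  return {
    $text: {
      $search: `"${cleaned}"`,
      $caseSensitive: false,
      $diacriticSensitive: false,
    },
  };
}

// In the mongo shell, a text index has to exist before $text can be used:
//   db.txt.createIndex({ name: "text" })
//   db.txt.find(phraseTextQuery("B&B Hotel"))
```

Keep in mind the user still has to type the & here; text search ignores diacritics, not missing punctuation inside a phrase.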

Tezra
  • It is a nice one, I could not find it. Do you know how exactly this works: does it apply the regular Unicode normalization (like with [NFC/NFD forms](http://stackoverflow.com/questions/16467479/normalizing-unicode)) or does it only deal with combining marks? I could not find these details [here](https://docs.mongodb.com/manual/reference/operator/query/text/#text-operator-diacritic-sensitivity). – Wiktor Stribiżew May 12 '17 at 17:12
  • @WiktorStribiżew By definition it works with unicode. The ^ above ô is a diacritic (and any letter not A-z is unicode). A diacritic is any mark on/above/below a character that changes the way it is meant to be pronounced. So your question, as you stated it, is how to do diacritic insensitive text search. – Tezra May 12 '17 at 17:17
  • @WiktorStribiżew Your other option is before plugging in "Hotel", do a "Hotel".replaceAll("[oô]","[oô]")... – Tezra May 12 '17 at 17:29
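If you would rather stay with $regex, the character-class idea from the last comment can be sketched roughly like this. Everything here is an assumption for illustration: the helper names are hypothetical, the variant table is nowhere near exhaustive, and letting any non-word characters sit between typed characters is my own way of getting BB to match B&B.

```javascript
// Illustrative, non-exhaustive table of accented variants per base letter.
const VARIANTS = {
  a: "aàáâãäå",
  c: "cç",
  e: "eèéêë",
  i: "iìíîï",
  n: "nñ",
  o: "oòóôõö",
  u: "uùúûü",
  y: "yýÿ",
};

// Escape characters that have a special meaning in a regular expression.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: turns "BB Hotel" into a pattern where each letter also
// matches its accented variants and optional non-word characters (such as &)
// may appear between the typed characters.
function buildLoosePattern(query) {
  return Array.from(query)
    .map((ch) => {
      const variants = VARIANTS[ch.toLowerCase()];
      return variants ? `[${variants}]` : escapeRegex(ch);
    })
    .join("\\W*");
}

// In the mongo shell this would plug into the original query:
//   db.txt.find({ name: { $regex: buildLoosePattern(query), $options: "i" } })
```

A case-insensitive regex like this generally cannot use an index efficiently, so it is probably only reasonable on small collections.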