As the title says, I am trying to figure out how to perform a diacritics-insensitive $regex
search in MongoDB, although at this point I am not sure if that's even possible.
Basically, imagine we have a Teams collection with documents like these:
{ id: 1, name: "FC Bayern München" },
{ id: 2, name: "Atlético Madrid" }
For this collection, I have created a text index for the name field:
db.getCollection('teams').createIndex({name: 'text'});
This allows me to perform a diacritic- and case-insensitive search:
db.getCollection('teams').find({ $text: { $search: "bayern" }});
db.getCollection('teams').find({ $text: { $search: "munchen" }});
// ✅ { id: 1, name: "FC Bayern München" }
However, if the text search doesn't include a full word (Bayern, Munchen), the query produces no results:
db.getCollection('teams').find({ $text: { $search: "bayer" }});
db.getCollection('teams').find({ $text: { $search: "munc" }});
// ❌ (no results)
So to make this work as intended, I need to use a $regex search instead. However, I can't seem to find a way to make it ignore diacritics:
db.getCollection('teams').find({ name: { $regex: "baye", $options: 'i' }});
// ✅ { id: 1, name: "FC Bayern München" }
db.getCollection('teams').find({ name: { $regex: "munchen", $options: 'i' }});
// ❌ (no results)
So my question is: is there any way to achieve a universal search that is diacritic-insensitive and doesn't require matching whole words, via regular expressions or other means?
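
For what it's worth, the only regex-based workaround I can think of is to expand each letter of the search term into a character class that also lists its accented variants, and pass the result to $regex. Below is a rough sketch of that idea, not a confirmed solution. The helper name diacriticInsensitivePattern and the VARIANTS table are hypothetical (they are not part of MongoDB), the table is far from exhaustive, and it assumes the stored names use precomposed (NFC) characters.

```javascript
// Hypothetical lookup table: base letter -> base letter plus accented variants.
// Illustrative only; a real table would need to cover many more characters.
const VARIANTS = {
  a: 'aàáâãäå',
  c: 'cç',
  e: 'eèéêë',
  i: 'iìíîï',
  n: 'nñ',
  o: 'oòóôõö',
  u: 'uùúûü',
  y: 'yýÿ',
};

// Hypothetical helper: turns a search term into a regex source string in which
// every known base letter is replaced by a character class of its variants.
function diacriticInsensitivePattern(term) {
  return term
    // Decompose and drop combining marks so "München" and "munchen"
    // produce the same pattern.
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase()
    .split('')
    .map((ch) => {
      if (VARIANTS[ch]) {
        return '[' + VARIANTS[ch] + ']';
      }
      // Escape regex metacharacters in all other characters.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('');
}

// Intended use in the shell (assumes the teams collection from above):
// db.getCollection('teams').find({
//   name: { $regex: diacriticInsensitivePattern('munc'), $options: 'i' }
// });
```

This feels clumsy, though: the variant table has to be maintained by hand, and as far as I understand an unanchored, case-insensitive $regex cannot make efficient use of an index, so I'd still prefer a built-in way if one exists.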