How to search for accented characters in mongodb collection using nodejs

Question

MongoDB treats É and E as two separate things, so when I search for E it will not find É.

Is there a way to make MongoDB think of them as the same thing?

I am running

var find = Users.find();
var re = new RegExp(name, 'i');
find.where('info.name').equals(re);

How do I match for strings containing accented characters and get the result?

score 1 · Accepted Answer · edited May 23 '17 at 11:43

This feature is not supported in MongoDB (as of this writing), and I doubt it will be in the near future. To work around it, store an additional field in each document that holds the simple (unaccented) form of each name, in lowercase.
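A minimal sketch of computing that simple form in Node.js, assuming `String.prototype.normalize` is available. The helper names `toSearchForm` and `toUserDoc` are hypothetical. NFD normalization splits a letter like "É" into "E" plus a combining accent, and the combining marks (the U+0300 to U+036F block) are then stripped before lowercasing.

```javascript
// Hypothetical helper: derive the plain, lowercase "search" form of a name.
function toSearchForm(name) {
  return name
    .normalize("NFD")                 // "É" -> "E" + U+0301 (combining acute accent)
    .replace(/[\u0300-\u036f]/g, "")  // drop combining diacritical marks
    .toLowerCase();                   // "Eva" -> "eva"
}

// Hypothetical helper: build a document with both the original and the search form.
function toUserDoc(name) {
  return { info: { name: name, search: toSearchForm(name) } };
}

console.log(toUserDoc("Éva")); // { info: { name: 'Éva', search: 'eva' } }
```

Letters such as "ø" or "ß" have no canonical decomposition and pass through unchanged, so extend the helper if your data contains them.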

{
  info: {"name": "Éva", "search": "eva"}
}

{
  info: {"name": "Eva", "search": "eva"}
}

With this document structure, you gain a few advantages.

You can create an index on the search field:

db.user.createIndex({"info.search": 1})

Then run a simple equality query to find matches: when you search for a term, convert it to its simple (unaccented) form and to lowercase, then query the search field.

User.find({"info.search": "eva"});

This query can use the index, whereas case-insensitive regex queries like the one in the question generally cannot use indexes effectively.
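The query side should reuse exactly the same normalization as the write side, so that input in any accent or case produces the same indexed value. A sketch, assuming the field lives at `info.search` as in the example documents; `searchQueryFor` is a hypothetical name.

```javascript
// Same normalization as used when the documents were written (hypothetical helper).
function toSearchForm(term) {
  return term.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Build an exact-match filter on the precomputed field, e.g. for User.find(filter).
function searchQueryFor(term) {
  return { "info.search": toSearchForm(term.trim()) };
}

// "Éva", "EVA" and " eva " all produce the same filter: { 'info.search': 'eva' }
for (const input of ["Éva", "EVA", " eva "]) {
  console.log(searchQueryFor(input));
}
```

The key design point is that reads and writes share one normalization function; if the two drift apart, documents silently stop matching.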

See Also: Mongodb match accented characters as underlying character

If you still want to do it the hard way (not recommended), I am posting it here for the record.

You need a mapping between each plain letter and its possible accented forms. For example:

var map = {"A":"[AÀÁÂÃÄÅ]"};
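Writing the map by hand is error-prone. One way to fill it in, assuming you can list the accented characters that occur in your data, is to group them by their NFD base letter. `buildAccentMap` and the sample character list are hypothetical.

```javascript
// Hypothetical helper: group accented characters under their unaccented base letter.
function buildAccentMap(accented) {
  const groups = {};
  for (const ch of accented) {
    // Base letter = NFD decomposition with combining marks removed, uppercased.
    const base = ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toUpperCase();
    if (!groups[base]) groups[base] = base; // start each class with the plain letter
    if (!groups[base].includes(ch)) groups[base] += ch;
  }
  // Turn each group into a regex character class string.
  const map = {};
  for (const base of Object.keys(groups)) {
    map[base] = "[" + groups[base] + "]";
  }
  return map;
}

const map = buildAccentMap("ÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜ");
console.log(map.A); // [AÀÁÂÃÄÅ]
console.log(map.E); // [EÈÉÊË]
```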

Say the search term is "a", but the document in the database contains an accented form of it. You then need to build the regex dynamically before passing it to the find() query.

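var searchTerm = "a".toUpperCase();
var term = [];
for (var i = 0; i < searchTerm.length; i++) {
    var char = searchTerm.charAt(i);
    // Use the accent class if one exists; otherwise match the character literally (escaped)
    var reg = map[char] || char.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    term.push(reg);
}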

// "i" flag: the term was uppercased, but stored names are mostly lowercase
var regexp = new RegExp(term.join(""), "i");

User.find({"info.name": {$regex: regexp}});

Note that this example also handles search terms longer than one character.
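Putting the steps together, here is a minimal sketch of a reusable builder, assuming the same map shape as above (only A and E filled in). `accentInsensitiveRegex` is a hypothetical name. Unmapped characters are regex-escaped so input like "a.b" cannot inject regex syntax. The checks run the pattern with JavaScript's own engine; MongoDB evaluates `$regex` with PCRE, so verify case-insensitive matching of non-ASCII letters against your server.

```javascript
// Plain letter -> character class of its accented forms (extend as needed).
const ACCENT_MAP = {
  A: "[AÀÁÂÃÄÅ]",
  E: "[EÈÉÊË]",
};

// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build an accent- and case-insensitive RegExp for a term.
function accentInsensitiveRegex(term) {
  let pattern = "";
  for (const ch of term.toUpperCase()) {
    // Use the accent class when one exists, otherwise the escaped literal character.
    pattern += ACCENT_MAP[ch] || escapeRegExp(ch);
  }
  return new RegExp(pattern, "i");
}

const re = accentInsensitiveRegex("eva");
console.log(re.source);      // [EÈÉÊË]V[AÀÁÂÃÄÅ]
console.log(re.test("Éva")); // true
console.log(re.test("Ava")); // false
// With Mongoose this would be passed as: User.find({ "info.name": { $regex: re } })
```

Like the original, the pattern is unanchored, so "eva" also matches "Evan"; wrap it in `^` and `$` if you need whole-name matches.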

Thanks for your response. Can you suggest how to do it using a regex as a quick fix? — , Jan 13 '15 at 07:50
@dreamhigh, I have updated my answer. But that is not how you should handle the situation. You need to restructure and re-index your documents for better performance; you should follow the first approach. — BatScream, Jan 13 '15 at 08:10
