I am searching for the string "JESÚS", but the query returns only the document that contains that exact string. I need the search to ignore accents and letter case.

I am using C# and the MongoDB driver.

I have two documents saved in my MongoDB collection:

_id:5d265f3129ea36365c7ca587
TRABAJADOR:"JESUS HERNANDEZ DIAZ"

_id:5d265f01db86a83148404711
TRABAJADOR:"JESÚS HERNÁNDEZ DÍAZ"

In Visual C# with the MongoDB driver:

var filter = Builders<BsonDocument>.Filter.Regex("TRABAJADOR", new BsonRegularExpression(string.Format(".*{0}.*", "JESÚS"), "i"));

var result = collection.Find(filter, new FindOptions() { Collation = new Collation("es", strength: CollationStrength.Primary, caseLevel:true) }).ToList();

output = JsonConvert.SerializeObject(result);
return output;

If I search for "JESÚS", the actual output is:

_id:5d265f01db86a83148404711
TRABAJADOR:"JESÚS HERNÁNDEZ DÍAZ"

But I am expecting the following output:

_id:5d265f3129ea36365c7ca587
TRABAJADOR:"JESUS HERNANDEZ DIAZ"

_id:5d265f01db86a83148404711
TRABAJADOR:"JESÚS HERNÁNDEZ DÍAZ"
  • You are expecting two results one from WORKER and one from TRABAJADOR, but your filter is looking only in TRABAJADOR... – Sinan Jul 12 '19 at 03:01

2 Answers

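You need to search both fields and combine the two filters with OR (|) to get both documents. Combining them with & would require a single document to match in both fields:

 var filter = Builders<BsonDocument>.Filter;
 // Match documents where either field contains the search term, ignoring case.
 var query = filter.Regex("TRABAJADOR", new BsonRegularExpression(string.Format(".*{0}.*", "JESÚS"), "i")) | filter.Regex("WORKER", new BsonRegularExpression(string.Format(".*{0}.*", "JESÚS"), "i"));

Replace your first line with these two and pass query to your Find call.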

I didn't test it, I hope it works for you!
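
One caveat worth checking: the "i" option makes a regex ignore case, not accents. The sketch below shows this with JavaScript's built-in regex engine, which is an assumption standing in for the engine the server uses. The first two sample strings come from the question and the lower-case one is added for illustration.

```javascript
// Same pattern shape as the filter above: ".*JESÚS.*" with the "i" option.
const pattern = new RegExp(".*JESÚS.*", "i");

const names = [
  "JESUS HERNANDEZ DIAZ", // no accents
  "JESÚS HERNÁNDEZ DÍAZ", // accents, upper case
  "jesús hernández díaz", // accents, lower case (illustrative)
];

// Keep only the names the pattern accepts.
const matched = names.filter((name) => pattern.test(name));
console.log(matched);
```

Only the accented names pass, so a regex filter alone is unlikely to return the unaccented document.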

Sinan

I recommend creating a text index with the default language set to "none" in order to make it diacritic insensitive, and then running a $text search as follows:

db.Project.createIndex(
    {
        "WORKER": "text",
        "TRABAJADOR": "text"
    },
    {
        "background": false,
        "default_language": "none"
    }
)
db.Project.find({
    "$text": {
        "$search": "jesus",
        "$caseSensitive": false
    }
})

Here's the C# code that generated the above queries. I'm using my library MongoDB.Entities for brevity.

using MongoDB.Entities;
using System;
using System.Linq;

namespace StackOverflow
{
    public class Program
    {
        public class Project : Entity
        {
            public string WORKER { get; set; }
            public string TRABAJADOR { get; set; }
        }

        private static void Main(string[] args)
        {
            new DB("test");

            DB.Index<Project>()
              .Key(p => p.WORKER, KeyType.Text)
              .Key(p => p.TRABAJADOR, KeyType.Text)
              .Option(o => o.DefaultLanguage = "none")
              .Option(o => o.Background = false)
              .Create();

            (new[] {
                new Project { WORKER = "JESUS HERNANDEZ DIAZ"},
                new Project { TRABAJADOR = "JESÚS HERNÁNDEZ DÍAZ"}
            }).Save();

            var result = DB.SearchText<Project>("jesus");

            Console.WriteLine($"found: {result.Count()}");
            Console.Read();
        }
    }
}
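
For intuition about what a diacritic- and case-insensitive word match does with the sample names, here is a minimal sketch in plain JavaScript that runs in memory with no MongoDB involved. It is an illustration under assumptions, not how the server implements text indexes: `foldText` and `matchesTerm` are hypothetical helper names, and the real index does its own tokenization on the server.

```javascript
// Hypothetical helper: lower-case the text and strip combining accent marks.
function foldText(text) {
  return text
    .normalize("NFD")                // split "Ú" into "U" plus a combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}

// Hypothetical helper: true when any whitespace-separated word of `value`
// equals `term` once both have been folded.
function matchesTerm(value, term) {
  const wanted = foldText(term);
  return foldText(value).split(/\s+/).includes(wanted);
}

// The two sample names from the question.
const docs = [
  { TRABAJADOR: "JESUS HERNANDEZ DIAZ" },
  { TRABAJADOR: "JESÚS HERNÁNDEZ DÍAZ" },
];

const found = docs.filter((d) => matchesTerm(d.TRABAJADOR, "JESÚS"));
console.log(found.length);
```

Both sample names fold to the same words, which is why a search for "jesus" or "JESÚS" can find either document.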
Dĵ ΝιΓΞΗΛψΚ
  • I'm not sure why this was downvoted, but a text index with a language is the way to go here for dealing with accented characters. – Pete Garafano Jul 12 '19 at 13:49
  • Good answer here https://stackoverflow.com/questions/39145020/manually-supplying-arguments-to-a-mongodb-query-to-support-collation-feature-fo/44898444#44898444 – Lorenzo Mar 06 '20 at 20:57