MongoDB recommends using indexes with a collation (a locale plus strength 1 or 2) for case-insensitive searches.
This seems to work fine with an equality filter, but not with a Contains(text) filter.
So, for example, if I have a collection of TDocument, I can create an index like this:
// Reuse a single collation so the index and the queries always use the same one
public static class CollationOptions
{
    public static readonly Collation CaseInsensitiveCollation = new Collation("en", strength: CollationStrength.Primary);
}
var indexOptions =
    new CreateIndexOptions
    {
        Collation = CollationOptions.CaseInsensitiveCollation
    };
Expression<Func<TDocument, object>> myIndex = x => x.Name;
var combined =
    new List<IndexKeysDefinition<TDocument>>
    {
        // I could combine several keys or have just one
        Builders<TDocument>.IndexKeys.Ascending(myIndex)
    };
var indexDefinition = Builders<TDocument>.IndexKeys.Combine(combined);
var createIndexModel = new CreateIndexModel<TDocument>(indexDefinition, indexOptions);
var collection = _magicRetrieverService.GetCollection<TDocument>(); // it's a standard IMongoCollection<TDocument>
await collection.Indexes.CreateOneAsync(createIndexModel);
Then, when searching like this:
Expression<Func<TDocument, bool>> filter = x => x.Name == "foo";
var findOptions =
    new FindOptions<TDocument, TDocument>
    {
        AllowPartialResults = false,
        //Sort = sortDefinition, // irrelevant
        Limit = 100,
        Collation = CollationOptions.CaseInsensitiveCollation
    };
var queryResult = await collection.FindAsync(filter, findOptions, cancellationToken);
var results = await queryResult.ToListAsync(cancellationToken: cancellationToken);
The query then finds all documents whose name is foo, Foo, FOO, or fOO.
However, if I modify the filter to use Contains so that it also finds a document named "this is FOO also", the case-insensitive search does not work as expected. It only matches documents whose name contains exactly the text foo.
Expression<Func<TDocument, bool>> filter = x => x.Name.Contains("foo");
Why isn't the collation on the index and in the find options taking effect here, so that all documents containing foo are returned regardless of case?
I'm using MongoDB.Driver 2.12.4 and mongo:4.0 running as a container:
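One way to compare the two cases would be to render each filter to the BSON the driver sends. The sketch below assumes a hypothetical Person class standing in for TDocument and a hypothetical FilterInspector helper; neither is part of my real code. As far as I can tell, with this driver version the equality predicate renders as a plain field match, while Contains renders as a regular expression condition, and the MongoDB documentation describes $regex as not collation-aware, which would be consistent with what I'm seeing.

```csharp
using System;
using System.Linq.Expressions;
using MongoDB.Bson;
using MongoDB.Bson.Serialization;
using MongoDB.Driver;

// Hypothetical document type, standing in for TDocument.
public class Person
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, for illustration only.
public static class FilterInspector
{
    // Renders a LINQ predicate to the BSON filter document the driver would send.
    public static BsonDocument Render<T>(Expression<Func<T, bool>> predicate)
    {
        FilterDefinition<T> filter = Builders<T>.Filter.Where(predicate);
        var registry = BsonSerializer.SerializerRegistry;
        return filter.Render(registry.GetSerializer<T>(), registry);
    }
}

public static class Program
{
    public static void Main()
    {
        // Equality: a plain value match, which the collation can apply to.
        Console.WriteLine(FilterInspector.Render<Person>(x => x.Name == "foo"));

        // Contains: translated to a regular expression on Name.
        Console.WriteLine(FilterInspector.Render<Person>(x => x.Name.Contains("foo")));
    }
}
```

No server connection is needed for this, since rendering happens entirely on the client.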
docker run --name mongodb -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=dummy -p 27017:27017 -d mongo:4.0
Of course, the following filter works fine, but I thought I could handle case-insensitive searches without having to explicitly convert anything.
Expression<Func<TDocument, bool>> filter = x => x.Name.ToLower().Contains("foo".ToLower());
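Another variant of the same workaround, sketched under the same assumptions (a hypothetical Person type in place of TDocument, and a hypothetical NameFilters helper), is to build the regular expression filter explicitly with the case-insensitive "i" option. It does not involve the collation at all, and an unanchored case-insensitive regex is unlikely to use the index efficiently, so it is not what I was hoping for either.

```csharp
using System.Text.RegularExpressions;
using MongoDB.Bson;
using MongoDB.Driver;

// Hypothetical document type, standing in for TDocument.
public class Person
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, for illustration only.
public static class NameFilters
{
    // Builds a "name contains text, ignoring case" filter as an explicit regex.
    public static FilterDefinition<Person> NameContainsIgnoreCase(string text)
    {
        // Escape the input so characters such as '.' or '*' are matched literally.
        var pattern = new BsonRegularExpression(Regex.Escape(text), "i");
        return Builders<Person>.Filter.Regex(x => x.Name, pattern);
    }
}
```

The resulting filter can be passed to collection.FindAsync in place of the LINQ expression.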
Related questions but not the same: