32

What is the easiest way to filter elements with LINQ through the Where method ignoring accentuation and case?

So far, I've been able to ignore Casing by calling methods on the properties, which I dont think is a good idea because it calls the same method for every element (right?).

So here's what I got so far:

var result = from p in People
             where p.Name.ToUpper().Contains(filter.ToUpper())
             select p;

Please tell me if this is a good practice, and the easiest way to ignore accentuation.

Smur
  • 3,075
  • 7
  • 28
  • 46

7 Answers7

61

To ignore case and accents (diacritics) you can first define an extension method like this:

    public static string RemoveDiacritics(this String s)
    {
        String normalizedString = s.Normalize(NormalizationForm.FormD);
        StringBuilder stringBuilder = new StringBuilder();

        for (int i = 0; i < normalizedString.Length; i++)
        {
            Char c = normalizedString[i];
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                stringBuilder.Append(c);
        }

        return stringBuilder.ToString();
    }

(Modified from Ignoring accented letters in string comparison)

Now you can run your query:

string queryText = filter.ToUpper().RemoveDiacritics();

var result = from p in People
         where p.Name.ToUpper().RemoveDiacritics() == queryText
         select p;

This is fine if you are just iterating over a collection in C#, but if you are using LINQ to SQL it is preferable to avoid non-standard methods (including extension methods) in your LINQ query. This is because your code cannot be converted into valid SQL and hence run on SQL Server with all its lovely performance optimization.

Since there doesn't seem to be a standard way of ignoring accents within LINQ to SQL, in this case I would suggest changing the field type that you want to search to be case- and accent-insensitive (CI_AI).

With your example:

ALTER TABLE People ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI

Your query should now ignore accentuation and case.

Note that you will need to temporarily remove any unique constraints on the field before running the above query, e.g.

ALTER TABLE People DROP CONSTRAINT UQ_People_Name

Now your LINQ query would simply be:

var result = from p in People
         where p.Name == filter
         select p;

See related question here.

Community
  • 1
  • 1
Dunc
  • 18,404
  • 6
  • 86
  • 103
  • 1
    Awesome. Mind if I blog about this? – Smur Nov 16 '11 at 13:33
  • 1
    Great answer. You actually answered the question before telling that the real solution is probably another approach. Too many only do the latter. – Niels Brinch May 30 '12 at 11:58
  • 5
    Beware, changing collation of a column should be done carefully. It could cause collation mismatch and could change the semantics of other queries. – Frédéric Aug 30 '13 at 07:49
2

For accents, you can enumerate all of them (here for French language) if you can't update your db schema nor fetch the entire list in RAM:

var result = from p in People
             where p.Name.ToLower()
                .Replace("à", "a")
                .Replace("â", "a")
                .Replace("ä", "a")
                .Replace("ç", "c")
                .Replace("é", "e")
                .Replace("è", "e")
                .Replace("ê", "e")
                .Replace("ë", "e")
                .Replace("î", "i")
                .Replace("ï", "i")
                .Replace("ô", "o")
                .Replace("ù", "u")
                .Replace("û", "u")
                .Replace("ü", "u").Contains(RemoveDiacritics(filter.ToLower()))
             select p;
                
Pepe Alvarez
  • 1,218
  • 1
  • 9
  • 15
1

Change collate:

ALTER TABLE dbo.MyTable 
ALTER COLUMN CharCol varchar(10)**COLLATE Latin1_General_CI_AS** NOT NULL;
Harry
  • 87,580
  • 25
  • 202
  • 214
0

Following Dunc's solution of changing the collation of the whole database, here is a full tutorial that deals with indexes, keys etc.:

https://www.codeproject.com/Articles/302405/The-Easy-way-of-changing-Collation-of-all-Database

(Just make sure to read all the comments first.)

LePatay
  • 172
  • 1
  • 8
  • 1
    Since this is not a new answer but rather an update to the already accepted answer, do you think it needs to be posted as an answer? IMO, a comment would be more appropriate. – Eduard Malakhov Mar 14 '18 at 13:06
  • I'm below 50 reputation, I can't comment on other users' posts! Otherwise it would have been my pleasure :) – LePatay Mar 14 '18 at 13:11
0

If you use Linq-to-Entities, you could:

1. Create an SQL Function to remove the diacritics, by applying to the input string the collation SQL_Latin1_General_CP1253_CI_AI, for example:

CREATE FUNCTION [dbo].[RemoveDiacritics] (
@input varchar(max)
)   RETURNS varchar(max)

AS BEGIN
DECLARE @result VARCHAR(max);

select @result = @input collate SQL_Latin1_General_CP1253_CI_AI

return @result
END

2. Add it in the DB context (in this case ApplicationDbContext) by mapping it with the attribute DbFunction, for example:

 public class ApplicationDbContext : IdentityDbContext<CustomIdentityUser>
    {
        [DbFunction("RemoveDiacritics", "dbo")]
        public static string RemoveDiacritics(string input)
        {
            throw new NotImplementedException("This method can only be used with LINQ.");
        }

        public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options)
            : base(options)
        {
        }
}

3. Use it in LINQ query, for example:

var query = await db.Users.Where(a => ApplicationDbContext.RemoveDiacritics(a.Name).Contains(ApplicationDbContext.RemoveDiacritics(filter))).tolListAsync();

being filter the string you want to search, in this case in the column name of the table Users of the DB.

Dharman
  • 30,962
  • 25
  • 85
  • 135
João Neto
  • 51
  • 3
0

As of Entity Framework Core 5.0 you can now alter the collation of the query on the fly for Linq to SQL.

So for your example, if I wanted to ignore both case and accents I would do something like:

(Note that we cannot use contains but we can use the SQL 'like' operator)

var result = from p in People
             where EF.Functions.Like(EF.Functions.Collate(p.Name, "Latin1_General_CI_AI"), $"%{filter}%")
             select p;

Latin1_General_CI_AI is Case insensitive (CI) and Accent insensitive (AI)

More information here on EF collations and case sensitivity for EF:

https://learn.microsoft.com/en-us/ef/core/miscellaneous/collations-and-case-sensitivity#explicit-collation-in-a-query

Eamon
  • 51
  • 1
  • 4
0

Here is some code that allows comparison ignoring accentuation:

Ignoring accented letters in string comparison

I will have the decency of not copying the code, so that the author can get rep for his answer. Now, answering your question:

You'd get that piece of code and use it like this:

var result = from p in People
             where p.Name.ToUpper().Contains(RemoveDiacritics(filter.ToUpper()))
             select p;

You even turn that code into an extension method. I have :)

Community
  • 1
  • 1
Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
  • 2
    How does this remove the accents from the value on the *SQL* side? As is, won't it only compare the upper-cased SQL value with the non-accented (upper-cased) C# value? – Kirk Woll Sep 14 '11 at 15:16
  • 1
    @Kirk - You're correct, but I'm not sure it's possible to do it on the SQL side. I was about to post this same solution. – Justin Morgan - On strike Sep 14 '11 at 15:18
  • 1
    @Adrian The method you showed me does remove the accentuation, but I cannot call it on the SQL side because LINQ does not find a translation to the DB, as pointed by Kirk and Justin. Would you guys say that there's a way of making it work without having to touch the database? – Smur Sep 14 '11 at 17:55
  • 1
    @Felipe, I'm pretty sure this is impossible with straight Linq-To-Sql. However, you could write a stored procedure (or TVF) that integrates the [collating suggestions described elsewhere](http://stackoverflow.com/questions/2461522/how-do-i-perform-an-accent-insensitive-compare-e-with-e-e-e-and-e-in-sql-serv). You can then [add a method call to this SP in your datacontext](http://msdn.microsoft.com/en-us/library/bb384574.aspx). – Kirk Woll Sep 14 '11 at 20:00
  • 1
    @Kirk I see, I'd then have to create a 'method' for that directly on my SQL Server Database. Seems pretty straightforward. – Smur Sep 16 '11 at 13:29
  • 1
    This obviously won't work with **LINQ-to-Sql** or **LINQ-to-Entities**. – QuantumHive Nov 15 '16 at 13:45