18

I have a database table that contains names with accented characters. Like ä and so on.

I need to get all records using EF4 from a table that contains some substring regardless of accents.

So the following code:

myEntities.Items.Where(i => i.Name.Contains("a")); 

should return all items with a name containing a, but also all items containing ä, â and so on. Is this possible?

Øyvind Bråthen
  • 59,338
  • 27
  • 124
  • 151
  • 2
    What database are you using? You might have to set a accent-insensitive collation order on that column.. – stuartd Jun 15 '11 at 12:05
  • Can you use this method in EF? http://stackoverflow.com/questions/2461522/how-do-i-perform-an-accent-insensitive-compare-e-with-e-e-e-and-e-in-sql-serv – stuartd Jun 15 '11 at 12:10
  • @Stuart Dunkeld - That did the trick. I changed the collation for the Name column in the database, and then it worked like a charm. If you post that as an answer, I'll upvote and accept it. – Øyvind Bråthen Jun 15 '11 at 12:23

5 Answers5

15

If you set an accent-insensitive collation order on the Name column then the queries should work as required.

stuartd
  • 70,509
  • 14
  • 132
  • 163
  • How would you change the collation of a specific column by code? – Alejandro del Río Jan 17 '13 at 22:53
  • 1
    @AlejandrodelRío see http://blog.sqlauthority.com/2008/12/20/sql-server-change-collation-of-database-column-t-sql-script/ for an example – stuartd Jan 18 '13 at 10:27
  • This can't be done when you already have created the table and it has data inside of it. At least in sql server 2008 R2. When you create a new table or a new database there's no problem at all. My question was about entity framework anyway. Thanks. – Alejandro del Río Jan 22 '13 at 21:20
  • 2
    You [can change](https://msdn.microsoft.com/en-us/library/ms190920.aspx) the column collation as long it is not referenced by a FK, Index, Checks, Computed Column or Distribution statistics. One example of SQL would be `alter table [dbo].[Table] ALTER COLUMN [ColumnName] nvarchar(MAX) COLLATE SQL_Latin1_General_CP1_CI_AI;` – Reuel Ribeiro Aug 14 '16 at 01:55
  • Wouldn't a collation remove the accents upon insertion already? – Thomas Weller Jan 15 '20 at 22:38
  • @ThomasWeller the collation just affects how the data is queried, it does not modify inserted data - a case insensitive collation does not modify the case of inserted data, after all – stuartd Jan 16 '20 at 11:23
11

Setting an accent-insensitive collation will fix the problem.

You can change the collation for a column in SQL Server and Azure database with the next query.

ALTER TABLE TableName
ALTER COLUMN ColumnName NVARCHAR (100)
COLLATE SQL_LATIN1_GENERAL_CP1_CI_AI NOT NULL

SQL_LATIN1_GENERAL_CP1_CI_AI is the collation where LATIN1_GENERAL is English (United States), CP1 is code page 1252, CI is case-insensitive, and AI is accent-insensitive.

Roberto Orozco
  • 579
  • 7
  • 11
0

I know that is not so clean solution, but after reading this I tried something like this:

var query = this.DataContext.Users.SqlQuery(string.Format("SELECT *  FROM dbo.Users WHERE LastName like '%{0}%' COLLATE Latin1_general_CI_AI", parameters.SearchTerm));

After that you are still able to call methods on 'query' object like Count, OrderBy, Skip etc.

Crono
  • 10,211
  • 6
  • 43
  • 75
jgasiorowski
  • 1,033
  • 10
  • 20
0

You could create an SQL Function to remove the diacritics, by applying to the input string the collation SQL_Latin1_General_CP1253_CI_AI, like so:

CREATE FUNCTION [dbo].[RemoveDiacritics] (
@input varchar(max)
)   RETURNS varchar(max)

AS BEGIN
DECLARE @result VARCHAR(max);

select @result = @input collate SQL_Latin1_General_CP1253_CI_AI

return @result
END

Then add it in the DB context (in this case ApplicationDbContext) by mapping it with the attribute DbFunction:

public class ApplicationDbContext : IdentityDbContext<CustomIdentityUser>
    {
        [DbFunction("RemoveDiacritics", "dbo")]
        public static string RemoveDiacritics(string input)
        {
            throw new NotImplementedException("This method can only be used with LINQ.");
        }

        public ApplicationDbContext(DbContextOptions<ApplicationDbContext> options)
            : base(options)
        {
        }
}

And Use it in LINQ query, for example:

var query = await db.Users.Where(a => ApplicationDbContext.RemoveDiacritics(a.Name).Contains(ApplicationDbContext.RemoveDiacritics(filter))).tolListAsync();
João Neto
  • 51
  • 3
-3

Accent-insensitive Collation as Stuart Dunkeld suggested is definitely the best solution ...

But maybe good to know:

Michael Kaplan once posted about stripping diacritics:

static string RemoveDiacritics(string stIn)
{
    string stFormD = stIn.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    for(int ich = 0; ich < stFormD.Length; ich++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        if(uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[ich]);
        }
    }

    return(sb.ToString().Normalize(NormalizationForm.FormC));
}

Source

So your code would be:

myEntities.Items.Where(i => RemoveDiacritics(i.Name).Contains("a")); 
Community
  • 1
  • 1
ordag
  • 2,497
  • 5
  • 26
  • 35
  • 3
    Your example will not work because `RemoveDiacritics` will have to run on the SQL server. – Ladislav Mrnka Jun 15 '11 at 12:51
  • As @Ladislav said, this will strip accent from the *input* string (that will probably already be without accent) and check towards the database that still regards accents. This will actually make it worse, since an input of â will be converted to a, and not even find the â in the database even if the user actually bother to write the accents when performing the search. But.. good to know that there is a way to strip accents of a string in C# should I ever need to do that ;) – Øyvind Bråthen Jun 15 '11 at 12:56
  • @Øyvind Knobloch-Bråthen - I thought your input string would be `"a"`, and `i.Name` the one in the db, but yeah this sould only be one way to strip those if theres more trouble with accents, etc =D – ordag Jun 15 '11 at 13:20