7

I need to get all the results where the text contains a particular word ignoring all accents.

Now I have the following:

filtered = result.Where(p => p.@string.ToString().ToUpper().Contains(word));

Or a simplified version:

filtered = result.Where(p => p.ToUpper().Contains(word));

How can I make the "Contains" statement ignore the accents?

Thanks in advance

Ingrid
  • How is accent represented in the `word` value? – Andrei Sep 28 '15 at 16:44
  • http://stackoverflow.com/questions/444798/case-insensitive-containsstring – M.kazem Akhgary Sep 28 '15 at 16:49
  • Please state the flavour of Linq you are using in your question. – Aron Sep 28 '15 at 16:58
  • He is using LinqToObjects because `.ToString().ToUpper()` would fail on EF and Linq2Sql (80% sure). – Scott Chamberlain Sep 28 '15 at 16:59
  • Here you can find a detailed answer to your question: http://stackoverflow.com/questions/444798/case-insensitive-containsstring – Marko Krizmanic Sep 28 '15 at 17:00
  • If you mean all the different Latin letters with various accent marks, you could make a dictionary linking each ASCII letter with each possible accented permutation. For example, "āăąáâãäå" could all be translated to "a" and then used in a Linq query. (This is just a few of the possibilities for that letter). This would be quite a job, though, and I'm not sure this is what you are even looking for. – Ric Gaudet Sep 28 '15 at 17:03
  • http://stackoverflow.com/questions/359827/ignoring-accented-letters-in-string-comparison – Ric Gaudet Sep 28 '15 at 17:22
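
The mapping idea from the comments does not need a hand-built dictionary: Unicode decomposition can split each accented letter into a base letter plus combining marks, and the marks can then be dropped. The following is only a sketch of that approach, not code from this thread; the class name `Diacritics` and the method name `Remove` are made up for illustration.

```csharp
using System.Globalization;
using System.Text;

public static class Diacritics
{
    // Hypothetical helper: returns `text` with combining accent marks removed.
    public static string Remove(string text)
    {
        // FormD splits "é" into "e" followed by U+0301 (combining acute accent).
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing marks (the accents themselves).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left into the usual composed form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With both sides normalized this way, the original query could be written as `result.Where(p => Diacritics.Remove(p).ToUpper().Contains(Diacritics.Remove(word).ToUpper()))`, assuming `result` is an in-memory sequence of strings.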

2 Answers

28

Borrowing a similar solution from here:

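using System.Globalization;
using System.Linq;

string[] result = {"hello there", "héllo there","goodbye"};

string word = "héllo";

// IgnoreNonSpace makes the comparison ignore combining (non-spacing) marks such as accents.
var compareInfo = CultureInfo.InvariantCulture.CompareInfo;

var filtered = result.Where(
      p => compareInfo.IndexOf(p, word, CompareOptions.IgnoreNonSpace) > -1);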
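
The question also upper-cases the text before comparing, so it may be useful to ignore case and accents in one call. A minimal sketch, assuming LINQ to Objects and the invariant culture; the class `AccentSearch` and its two methods are hypothetical names, not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class AccentSearch
{
    // Hypothetical helper: true when `text` contains `word`,
    // ignoring both letter case and combining accent marks.
    public static bool ContainsIgnoringAccents(string text, string word)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        const CompareOptions options =
            CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

        // IndexOf returns -1 when no match is found under the given options.
        return compareInfo.IndexOf(text, word, options) >= 0;
    }

    // Hypothetical helper: filters an in-memory sequence with the check above.
    public static List<string> Filter(IEnumerable<string> items, string word) =>
        items.Where(p => ContainsIgnoringAccents(p, word)).ToList();
}
```

For example, `AccentSearch.Filter(result, "HELLO")` should keep both "hello there" and "héllo there", which removes the need for the `ToUpper()` call in the question.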
D Stanley
  • It works perfectly! Thanks. – Ingrid Sep 29 '15 at 07:53
  • The simplest way is to change the collation of the column you are using to one that corresponds to the data. That way, you don't need to do anything in your code. This is the SQL command to do it: `ALTER TABLE table_name_here ALTER COLUMN column_name_here [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI` – Sterling Diaz Nov 13 '17 at 22:55
  • @SterlingDiaz True, that would be simple, but it can have dramatic side effects outside the scope of the question. Searches, sorting, and other comparisons can be negatively affected. – D Stanley Nov 13 '17 at 23:26
  • If there are accents, for example, the collation should be Latin, because English does not have them. It's a database design/datatype/collation problem more than a LINQ issue. – Sterling Diaz Nov 14 '17 at 03:19
  • Unfortunately, this solution isn't suitable, at least for Spanish, where `ñ` isn't an accented `n` but a separate letter of the alphabet. See [this answer](https://stackoverflow.com/a/47488633/1014048) for how to normalize Spanish strings. – Ivan Mir Nov 03 '18 at 22:14
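
One way to handle the Spanish case raised in the last comment is to strip accents manually and keep the tilde when it sits on an `n`. This is a sketch under that assumption and is not taken from the linked answer; `SpanishNormalizer` and `RemoveAccentsKeepEnye` are hypothetical names.

```csharp
using System.Globalization;
using System.Text;

public static class SpanishNormalizer
{
    private const char CombiningTilde = '\u0303';

    // Hypothetical helper: removes accent marks but leaves "ñ" / "Ñ" intact.
    public static string RemoveAccentsKeepEnye(string text)
    {
        // FormD turns "ñ" into "n" + U+0303 and "á" into "a" + U+0301.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        for (int i = 0; i < decomposed.Length; i++)
        {
            char c = decomposed[i];
            bool isMark =
                CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark;
            // Assumption: the tilde directly follows its base letter after decomposition.
            bool tildeOnN = c == CombiningTilde && i > 0
                && (decomposed[i - 1] == 'n' || decomposed[i - 1] == 'N');

            if (!isMark || tildeOnN)
            {
                builder.Append(c);
            }
        }

        // Recompose so that "n" + tilde becomes a single "ñ" again.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Both the text and the search word would need to go through the same method before an ordinary comparison such as `IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0`.
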
-1

You want to use the StringComparison.InvariantCultureIgnoreCase enum.

Source https://msdn.microsoft.com/en-us/library/system.stringcomparison(v=vs.110).aspx

filtered = result.Contains(word, StringComparison.InvariantCultureIgnoreCase);

However, this is only going to work with LINQ to Objects. If you are using LINQ to SQL, Entity Framework, or NHibernate, this will not work.

Aron
  • Letter case is generally not called "accents", and there is no `StringComparison` option that lets one ignore accents (assuming the OP is actually interested in that) – Alexei Levenkov Sep 29 '15 at 06:01
  • @AlexeiLevenkov Please look at the link provided. My point was to use the `InvariantCulture` part of the enum. You should also take a look at [the various different compares that are demonstrated here](https://msdn.microsoft.com/en-us/library/t4411bks(v=vs.110).aspx). – Aron Sep 29 '15 at 06:17
  • Aron, I'm not sure what you expect `InvariantCulture` to do for the comparison: none of the values of that enum deal with accents (they are all about sorting rules and case). Also, `Contains` does not take a `StringComparison` (despite everyone expecting it to); you probably mean `IndexOf`... Code samples: `"e".IndexOf("é", StringComparison.InvariantCultureIgnoreCase)` and `string.Compare("e","é", StringComparison.InvariantCultureIgnoreCase)` show that accented characters are considered different for these operations. – Alexei Levenkov Sep 29 '15 at 14:52