
I have two different strings (XXÈ and XXE). Is there any way to compare them using a collation? For this case it would be something like MySQL's utf8_general_ci, because I need the two strings to compare as equal. I've seen a few examples involving MSSQL or SQLite, but that would add an unnecessary dependency to my project. So my question is: is there any way to do this in pure .NET (especially C#)?

Update:

Let's take any decent SQL engine as an example. You can create a table and select a collation for it. In our case, XXÈ and XXE will be stored in the table with different binary representations (depending on the encoding), but when you search for XXE, the search will also match XXÈ.

My case is pretty similar. I have a text file with some strings in it (UTF-8). I want to display the values on screen (sorted, where the collation is again relatively important) and I want to let the user search for values. The collation used for the search will be an option.

dcg
  • have a look at [Encoding class](http://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx) – Selman Genç Oct 21 '14 at 09:57
  • @Selman22 - thank you. As far as I know, encoding only refers to the way a string is represented internally. I still need "E" to be displayed as "E" and "È" as "È", but in code I need "È" to compare equal to "E". Something like what SQL does (XXE will find XXÈ depending on the collation of the table). – dcg Oct 21 '14 at 11:25

1 Answer


You could use String.Normalize and a little bit of LINQ power:

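using System.Globalization;  // CharUnicodeInfo, UnicodeCategory
using System.Linq;           // Where, ToArray
using System.Text;           // NormalizationForm

string initial = "XXÈ";
// FormD decomposes "È" into "E" followed by a combining grave accent.
string normal = initial.Normalize(NormalizationForm.FormD);

// Drop the combining marks and keep only the base characters.
var withoutDiacritics = normal.Where(
    c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
string final = new string(withoutDiacritics.ToArray());
bool equals = "XXE".Equals(final); // true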

Reference: http://www.blackwasp.co.uk/RemoveDiacritics.aspx
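For the "compare, search and sort like a collation" part of the question, the framework's own culture-aware comparison may be closer to what is wanted than stripping characters by hand. The following is only a sketch: the class name `CollationSketch` and its method names are made up for illustration, and treating `IgnoreNonSpace | IgnoreCase` as an approximation of a `*_general_ci` collation is an assumption, not an exact equivalent of any SQL engine's rules.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class CollationSketch
{
    // Assumption: ignoring case and non-spacing (combining) marks roughly
    // approximates an accent- and case-insensitive SQL collation.
    private const CompareOptions AccentAndCaseInsensitive =
        CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    // Equality under the chosen culture's comparison rules.
    public static bool AreEqual(string a, string b, CultureInfo culture)
    {
        return culture.CompareInfo.Compare(a, b, AccentAndCaseInsensitive) == 0;
    }

    // Substring search under the same rules (for the "let the user search" case).
    public static bool Contains(string text, string term, CultureInfo culture)
    {
        return culture.CompareInfo.IndexOf(text, term, AccentAndCaseInsensitive) >= 0;
    }

    // In-place sort using the culture's ordering (for the "display sorted" case).
    public static void Sort(List<string> values, CultureInfo culture)
    {
        values.Sort((x, y) => culture.CompareInfo.Compare(x, y, AccentAndCaseInsensitive));
    }
}
```

Calling `CollationSketch.AreEqual("XXÈ", "XXE", CultureInfo.InvariantCulture)` should treat the two strings as equal, and passing a different `CultureInfo` (for example one chosen by the user) is how the "collation as an option" requirement could be modelled. The strings themselves are left untouched, so "È" is still displayed as "È".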

Tim Schmelter
  • I was thinking of something more automated, without having to manually clean diacritics and other marks out of the text myself. It is nearly impossible to do this in a way that covers all languages and situations. – dcg Oct 21 '14 at 11:23
  • @dcg: I don't understand why it's impossible. You wanted an approach in C#, so put this in a method (or even a string extension) with a meaningful name and use that instead of `==` or `Equals` where you need it. – Tim Schmelter Oct 21 '14 at 11:30
  • Not entirely impossible. But how would I handle Chinese/Japanese collations, where some characters look somewhat different even though they are close in meaning? – dcg Oct 21 '14 at 11:33
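
The suggestion in the comments to wrap the normalization approach in a string extension could be sketched roughly as follows. The names `DiacriticExtensions`, `RemoveDiacritics` and `EqualsIgnoringDiacritics` are hypothetical, and the sketch only strips combining marks, so it does nothing for the Chinese/Japanese variants raised in the last comment.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical extension class wrapping the Normalize + LINQ approach above.
public static class DiacriticExtensions
{
    // Returns the text with combining (non-spacing) marks removed.
    public static string RemoveDiacritics(this string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Decompose so that "È" becomes "E" + combining grave accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);

        var baseChars = decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);

        // Recompose whatever is left into the usual composed form.
        return new string(baseChars.ToArray()).Normalize(NormalizationForm.FormC);
    }

    // Ordinal (case-sensitive) equality after stripping diacritics from both sides.
    public static bool EqualsIgnoringDiacritics(this string left, string right)
    {
        if (left == null || right == null) return left == right;
        return string.Equals(
            left.RemoveDiacritics(), right.RemoveDiacritics(), StringComparison.Ordinal);
    }
}
```

With this in place, `"XXÈ".EqualsIgnoringDiacritics("XXE")` can be used where `==` or `Equals` would otherwise appear. The comparison stays case-sensitive; switching to `StringComparison.OrdinalIgnoreCase` would be one way to relax that.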