How to sort a list that contains letters with diacritic markings?

Words used in this example are made up.

Currently I get a list sorted like this:

  • báb
  • baz
  • bez

But I want to get a list that displays this:

  • baz
  • báb
  • bez

That is, a letter with a diacritic should be treated as a letter of its own, sorted after its unaccented counterpart. Is there a way to do this in C#?

Teysz

1 Answer

If you set the culture of the current thread to the language you want to sort by, this should work automagically (assuming you don't want a special customized sort order). Like this:

List<string> mylist;
....
Thread.CurrentThread.CurrentCulture = new CultureInfo("pl-PL");
mylist.Sort();

This should give you the list sorted according to the Polish culture settings.
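If you would rather not change the culture of the whole thread, a culture-specific comparer can be passed to the sort instead. The following is a minimal sketch, not a tested recommendation: the names CultureSortSketch and SortWithCulture are hypothetical, and the resulting order depends on the culture data available on the machine, so no particular output is claimed here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureSortSketch
{
    // Returns a sorted copy of the input, ordered by the rules of the
    // named culture, without touching Thread.CurrentThread.CurrentCulture.
    public static List<string> SortWithCulture(IEnumerable<string> words, string cultureName)
    {
        var sorted = new List<string>(words);

        // StringComparer.Create builds a comparer bound to one culture;
        // the second argument (false) keeps the comparison case-sensitive.
        var comparer = StringComparer.Create(CultureInfo.GetCultureInfo(cultureName), false);

        sorted.Sort(comparer);
        return sorted;
    }
}
```

Calling CultureSortSketch.SortWithCulture(mylist, "pl-PL") would then leave mylist and the thread's culture unchanged.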

Update: If the culture settings don't sort it the way you want then another option is to implement your own string comparer.

Update 2: String comparer example:

public class DiacriticStringComparer : IComparer<string>
{
    private static readonly HashSet<char> _Specials = new HashSet<char> { 'é', 'ń', 'ó', 'ú' };

    public int Compare(string x, string y)
    {
        // handle special cases first: x == null and/or y == null,  x.Equals(y)
        ...

        var lengthToCompare = Math.Min(x.Length, y.Length);
        for (int i = 0; i < lengthToCompare; ++i)
        {
            var cx = x[i];
            var cy = y[i];

            if (cx == cy) continue;

            if (_Specials.Contains(cx) || _Specials.Contains(cy))
            {
                // handle special diacritics comparison
                ...
            }
            else
            {
                // cx must be unequal to cy -> can only be larger or smaller
                return cx < cy ? -1 : 1;
            }
        }
        // once we are here the strings are equal up to lengthToCompare characters
        // we have already dealt with the strings being equal so now one must be shorter than the other
        return x.Length < y.Length ? -1 : 1;
    }
}

Disclaimer: I haven't tested it, but it should give you the general idea. Also note that char.CompareTo() compares the characters' encoded values rather than lexicographically, and the < and > operators on char do the same. If that ordering isn't good enough for the non-special characters, the worst case is converting cx and cy to strings and using the default string comparison.
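One possible way to fill in the special cases left open in the skeleton above is to give every character a numeric rank and place each accented letter directly after its base letter. This is only a sketch under stated assumptions: the class name DiacriticAfterBaseComparer and the BaseOf table are hypothetical, 'á' is added to the four letters mentioned so that the example from the question can be reproduced, the strings are assumed to use precomposed characters, and all other characters simply fall back to their encoded values (so uppercase sorts before lowercase).

```csharp
using System;
using System.Collections.Generic;

public sealed class DiacriticAfterBaseComparer : IComparer<string>
{
    // Maps each accented letter to the base letter it should follow.
    // 'á' is an assumption, added to cover the example in the question.
    private static readonly Dictionary<char, char> BaseOf = new Dictionary<char, char>
    {
        { 'á', 'a' }, { 'é', 'e' }, { 'ń', 'n' }, { 'ó', 'o' }, { 'ú', 'u' }
    };

    // Rank of one character: twice its encoded value, plus one for an
    // accented letter, which slots it in right after its base letter.
    private static int Rank(char c)
    {
        char baseChar;
        return BaseOf.TryGetValue(c, out baseChar) ? baseChar * 2 + 1 : c * 2;
    }

    public int Compare(string x, string y)
    {
        // Null handling: null sorts before any non-null string.
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int lengthToCompare = Math.Min(x.Length, y.Length);
        for (int i = 0; i < lengthToCompare; ++i)
        {
            int diff = Rank(x[i]) - Rank(y[i]);
            if (diff != 0) return diff < 0 ? -1 : 1;
        }

        // Equal up to the shorter length: the shorter string sorts first.
        return x.Length.CompareTo(y.Length);
    }
}
```

With this comparer, mylist.Sort(new DiacriticAfterBaseComparer()) on the list báb, baz, bez gives baz, báb, bez, because 'á' ranks just above 'a' and still below 'e'.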

ChrisWue
  • My simple test with Polish returns `báb, baz, bez` – Albin Sunnanbo Mar 27 '11 at 08:17
  • I want a culture that has the following letters: é, ń, ó & ú, and puts them after e, n, o & u respectively. I found Icelandic, which has 3 out of 4 letters, but I want one with all four letters. Does anyone know of such a culture? – Teysz Mar 28 '11 at 01:18
  • Could you give me an example of such a string comparer? – Teysz Mar 28 '11 at 11:02
  • @Mat I can't help but comment that "finding a culture whose sorting rules work" seems like using a trowel to flip pancakes. Possibly more information here http://stackoverflow.com/questions/359827/ignoring-accented-letters-in-string-comparison and you might want to search some more on this. I found that as the #1 hit on a Google search. – Shibumi Mar 28 '11 at 21:54
  • @Shibumi I don't want to remove the diacritics as described in the link you posted. I know that there is no such culture. That's why I asked for another method to do this, such as the one ChrisWue wrote as an answer. @ChrisWue Thank you for your help, I'm going to try it out. – Teysz Mar 29 '11 at 00:33
  • @Mat Yeah, I realize that, but you can use a method that removes them to 'prep' the strings for comparison so that you can sort them. See what I mean? – Shibumi Mar 29 '11 at 01:30
  • *4 years later* apparently I forgot to reply, so I will now. @Shibumi Go read the question again, it explicitly said "without removing diacritic". The letters with diacritics need to be ordered after the "normal" counterparts, so they are different and it needs to be sorted that way, not without diacritics. So I don't see what you meant, but anyways the problem was solved, and no diacritics were harmed (read removed) in the process. – Teysz Jun 04 '15 at 13:09
  • @Teysz Yikes. Going through your activity history? I didn't even remember this question *:)*. – Shibumi Jul 10 '15 at 04:02