62

I'm wondering what the correct way is to compare two characters, ignoring case, in a way that will work for all cultures. Also, is Comparer<char>.Default the best way to compare two characters without ignoring case? Does this work for surrogate pairs?

EDIT: Added sample IComparer<char> implementation

If this helps anyone, this is what I've decided to use:

public class CaseInsensitiveCharComparer : IComparer<char> {
    private readonly System.Globalization.CultureInfo ci;
    public CaseInsensitiveCharComparer(System.Globalization.CultureInfo ci) {
        this.ci = ci;
    }
    public CaseInsensitiveCharComparer()
        : this(System.Globalization.CultureInfo.CurrentCulture) { }
    public int Compare(char x, char y) {
        return Char.ToUpper(x, ci) - Char.ToUpper(y, ci);
    }
}

// Prints 3
Console.WriteLine("This is a test".CountChars('t', new CaseInsensitiveCharComparer()));
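
`CountChars` is not a framework method and its definition is not shown here. A minimal sketch of what such an extension method might look like, assuming it simply counts the characters for which the comparer returns 0 (the class name `StringCountExtensions` and the parameter names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class StringCountExtensions
{
    // Hypothetical helper: counts the chars in `source` that `comparer`
    // reports as equal to `target` (that is, Compare returns 0).
    public static int CountChars(this string source, char target, IComparer<char> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (comparer == null) throw new ArgumentNullException(nameof(comparer));

        int count = 0;
        foreach (char c in source)
        {
            if (comparer.Compare(c, target) == 0)
                count++;
        }
        return count;
    }
}
```

With a helper like this, the call above counts the 'T' in "This" and the two 't's in "test".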
Brett Ryan
  • ToUpper may convert the char to the correct upper case with respect to the current culture, but the lexical order returned is not correct. Possibly this is only supported in .NET for the string comparisons. – Holstebroe Nov 20 '13 at 14:44

9 Answers

102

It depends on what you mean by "work for all cultures". Would you want "i" and "I" to be equal even in Turkey?

You could use:

bool equal = char.ToUpperInvariant(x) == char.ToUpperInvariant(y);

... but I'm not sure whether that "works" according to all cultures by your understanding of "works".
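
To make the Turkish case concrete, here is a small sketch that runs the same upper-case comparison under an explicit culture. The class and method names are made up for illustration, and it assumes the runtime has Turkish culture data available (it will not behave this way in invariant-globalization mode):

```csharp
using System;
using System.Globalization;

public static class CultureCharDemo
{
    // Compares two chars after upper-casing both with the given culture's rules.
    public static bool EqualsIgnoreCase(char x, char y, CultureInfo culture)
    {
        return char.ToUpper(x, culture) == char.ToUpper(y, culture);
    }
}
```

Calling `CultureCharDemo.EqualsIgnoreCase('i', 'I', CultureInfo.InvariantCulture)` gives true. With `CultureInfo.GetCultureInfo("tr-TR")` the result should be false, because Turkish casing maps 'i' to 'İ' (U+0130) rather than to 'I'.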

Of course, you could convert both characters to strings and then perform whatever comparison you want on the strings. That is somewhat less efficient, but it gives you the full range of comparisons available in the framework:

bool equal = x.ToString().Equals(y.ToString(), 
                                 StringComparison.InvariantCultureIgnoreCase);

For surrogate pairs, a Comparer<char> isn't going to be feasible anyway, because you don't have a single char. You could create a Comparer<int> though.
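
A minimal sketch of the `Comparer<int>` idea, assuming each int is a Unicode code point and that delegating to a string comparison is acceptable. The class name is hypothetical, and the choice of `InvariantCultureIgnoreCase` is only one option:

```csharp
using System;
using System.Collections.Generic;

public sealed class CodePointIgnoreCaseComparer : IComparer<int>
{
    public int Compare(int x, int y)
    {
        // char.ConvertFromUtf32 turns a code point into a one- or two-char string;
        // it throws ArgumentOutOfRangeException for values that are not valid code points.
        string sx = char.ConvertFromUtf32(x);
        string sy = char.ConvertFromUtf32(y);
        return string.Compare(sx, sy, StringComparison.InvariantCultureIgnoreCase);
    }
}
```

The code points themselves can be read from a string with `char.ConvertToUtf32(text, index)`, which combines a surrogate pair at that index into a single int.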

Jon Skeet
  • Both of your examples are how I had thought of doing it, but I thought the framework might provide a better way that I didn't know existed. I was thinking in the context of the LINQ extension method for String.Contains(char, IEqualityComparer) – Brett Ryan Sep 08 '09 at 16:27
  • 3
    There's no framework method for this: string comparison is actually implemented using native methods, not by dropping down to a Comparer implementation. – Julian Birch Sep 09 '09 at 08:48
  • @TimSchmelter: No, managed to miss that for some reason. Added a quick note at the end. – Jon Skeet May 07 '15 at 07:44
  • This solution is not very efficient. If you have two strings of one billion characters of which the first character already differs, then `x.ToString()` will enumerate all characters, which is quite a waste, because one could have stopped comparing after the first character – Harald Coppoolse Jul 21 '22 at 19:25
  • @HaraldCoppoolse: Except that `x` is a single character here, in the context of this question. – Jon Skeet Jul 21 '22 at 21:22
15

Using the default culture (that is, the current culture, not the invariant one):

if (char.ToLower(ch1) == char.ToLower(ch2))
{  ....  }

Or specify a culture:

CultureInfo myCulture = ...;
if (char.ToLower(ch1, myCulture) == char.ToLower(ch2, myCulture))
{  ....  }
H H
3

As I understand it, there isn't really a way that will "work for all cultures". Either you want to compare characters for some kind of internal, non-displayed-to-the-user reason (in which case you should use the InvariantCulture), or you want to use the CurrentCulture of the user. Obviously, using the user's current culture will mean that you will get different results in different locales, but they will be consistent with what your users in those locales will expect.

Without knowing more about WHY you are comparing two characters, I can't really advise you on which one you should be using.

Jon Grant
  • Thanks Jon, it's a general question, I'm not versed well with unicode and thought I'd pose the question here. Consider the String.Contains(char, IEqualityComparer) extension method that LINQ provides, what would be the correct way to implement that being case-insensitive? – Brett Ryan Sep 08 '09 at 16:25
  • Again, it would really depend on what the data was and why you were comparing it. If you just wanted to sort things into some consistent order, for example, using any of the various Invariant comparisons would be fine. If you're responding to user input, you probably want to use the culture of that user to give them results they would expect. I'm not sure there is really a "one size fits all" answer. – Jon Grant Sep 08 '09 at 16:30
  • Do you think my Comparer implementation provided as an answer would be a correct approach? – Brett Ryan Sep 08 '09 at 16:51
1

I would recommend comparing uppercase, and if they don't match, then comparing lowercase, just in case the locale's uppercasing and lowercasing logic behave slightly differently.

Addendum

For example,

int CompareChar(char c1, char c2)
{
    // Compare the uppercase forms first.
    int dif = char.ToUpper(c1) - char.ToUpper(c2);

    // If they differ, fall back to comparing the lowercase forms.
    if (dif != 0)
        dif = char.ToLower(c1) - char.ToLower(c2);
    return dif;
}
David R Tribble
0

What I was thinking would be available within the runtime is something like the following:

public class CaseInsensitiveCharComparer : IComparer<char> {
    private readonly System.Globalization.CultureInfo ci;
    public CaseInsensitiveCharComparer(System.Globalization.CultureInfo ci) {
        this.ci = ci;
    }
    public CaseInsensitiveCharComparer()
        : this(System.Globalization.CultureInfo.CurrentCulture) { }
    public int Compare(char x, char y) {
        return Char.ToUpper(x, ci) - Char.ToUpper(y, ci);
    }
}

// Prints 3
Console.WriteLine("This is a test".CountChars('t', new CaseInsensitiveCharComparer()));
Brett Ryan
  • 1
    It's dangerous to assume that the char comparison by subtraction will continue to be correct in future CLR versions, so I would use `return Char.ToUpper(x, ci).CompareTo(Char.ToUpper(y, ci));` instead. – Matt Howells Nov 13 '09 at 16:14
  • @MattHowells I would argue that ... please see `char.CompareTo(char)`: `return (m_value-value);` –  May 27 '14 at 08:27
-1

You could try:

class Test {
    static int Compare(char t, char p) {
        return string.Compare(t.ToString(), p.ToString(), StringComparison.CurrentCultureIgnoreCase);
    }
}

I doubt this is the "optimal" way to do it, though, and I'm not sure of all the cases you need to be checking...

ahawker
-2

string.Compare("string a","STRING A",true)

It will work for every string

Sergio
  • 1
    Hi Sergio, I'm after a way to compare two char instances, not string instances. I'm looking for a Comparer implementation that ignores case. – Brett Ryan Sep 08 '09 at 16:19
  • 9
    This works great in English speaking countries. However, nobody in eastern Europe will ever use an application you write. – Jon Grant Sep 08 '09 at 16:22
  • 2
    @Jon Grant: I use this in my country (Portugal). Portuguese is a Latin-based language that has lots of "weird" characters like ã é à ç, and it works perfectly for me. – Sergio Sep 08 '09 at 16:33
-3

I know this is an old post, but things have changed since then.

The question above can be answered by using an extension. This would extend the char.Equals to allow for locality and case insensitivity.

In an extension class, add something such as:

internal static Boolean Equals(this Char src, Char ch, StringComparison comp)
{
    // Convert both chars to strings so the StringComparison overload can be used.
    return $"{src}".Equals($"{ch}", comp);
}

I'm currently at work, so can't check this, but it should work.

Andy

Andy
-4

You can pass true as the last argument for a case-insensitive match:

string.Compare(lowerCase, upperCase, true);
coder