155

In C#, what is the difference between ToUpper() and ToUpperInvariant()?

Can you give an example where the results might be different?

riQQ
Lill Lansey

6 Answers

176

ToUpper uses the current culture. ToUpperInvariant uses the invariant culture.

The canonical example is Turkey, where the upper case of "i" isn't "I" but the dotted "İ" (U+0130).

Sample code showing the difference:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
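        // Upper-case with the invariant culture: always "III".
        string invariant = "iii".ToUpperInvariant();
        // Switch the current thread to Turkish so the parameterless ToUpper() picks it up.
        CultureInfo turkey = new CultureInfo("tr-TR");
        Thread.CurrentThread.CurrentCulture = turkey;
        // Upper-case with the current (Turkish) culture: "İİİ".
        string cultured = "iii".ToUpper();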

        Font bigFont = new Font("Arial", 40);
        Form f = new Form {
            Controls = {
                new Label { Text = invariant, Location = new Point(20, 20),
                            Font = bigFont, AutoSize = true},
                new Label { Text = cultured, Location = new Point(20, 100),
                            Font = bigFont, AutoSize = true }
            }
        };        
        Application.Run(f);
    }
}

For more on Turkish, see this Turkey Test blog post.

I wouldn't be surprised to hear that there are various other capitalization issues around elided characters etc. This is just one example I know off the top of my head... partly because it bit me years ago in Java, where I was upper-casing a string and comparing it with "MAIL". That didn't work so well in Turkey...
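The "MAIL" comparison bug described above can be reproduced in C# with a small console sketch. The class and method names below (`MailComparison`, `FragileIsMail`, `RobustIsMail`) are hypothetical, made up for illustration, and the sketch assumes the runtime has Turkish culture data available (that is, it is not running in invariant-globalization mode).

```csharp
using System;
using System.Globalization;

public static class MailComparison
{
    // Hypothetical helper mimicking the fragile pattern:
    // upper-case with some culture, then compare with a literal.
    public static bool FragileIsMail(string input, CultureInfo culture) =>
        input.ToUpper(culture) == "MAIL";

    // Culture-independent alternative: an ordinal, case-insensitive comparison.
    public static bool RobustIsMail(string input) =>
        string.Equals(input, "MAIL", StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        var turkish = new CultureInfo("tr-TR");

        // True: the invariant culture maps "i" to "I".
        Console.WriteLine(FragileIsMail("mail", CultureInfo.InvariantCulture));
        // False: Turkish maps "i" to "İ", so the result is "MAİL".
        Console.WriteLine(FragileIsMail("mail", turkish));
        // True: no culture is involved in the comparison.
        Console.WriteLine(RobustIsMail("mail"));
    }
}
```

For protocol keywords, identifiers, and similar non-linguistic strings, an ordinal comparison or `ToUpperInvariant` avoids depending on whatever culture the user's machine happens to use.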

Jon Skeet
  • 53
    haha I read that thinking... "'Turkey' doesn't have a letter 'i' in it" – Jeff Mercado Aug 23 '10 at 17:57
  • 3
    It's almost 2019 and Visual Studio is suggesting `ımage` as a field name for `Image`, and Unity 3D is spamming an internal error to the console, `Unable to find key name that matches 'rıght'`, on an "English" Windows with Turkey regional settings for date and time. Looks like sometimes even Microsoft fails the Turkey test, and the PC's language isn't even Turkish, just lol. – Guney Ozsan Oct 28 '18 at 16:28
29

Jon's answer is perfect. I just wanted to add that ToUpperInvariant is the same as calling ToUpper(CultureInfo.InvariantCulture).

That makes Jon's example a little simpler:

using System;
using System.Drawing;
using System.Globalization;
using System.Threading;
using System.Windows.Forms;

public class Test
{
    [STAThread]
    static void Main()
    {
        string invariant = "iii".ToUpper(CultureInfo.InvariantCulture);
        string cultured = "iii".ToUpper(new CultureInfo("tr-TR"));

        Application.Run(new Form {
            Font = new Font("Times New Roman", 40),
            Controls = { 
                new Label { Text = invariant, Location = new Point(20, 20), AutoSize = true }, 
                new Label { Text = cultured, Location = new Point(20, 100), AutoSize = true }, 
            }
        });
    }
}

I also used Times New Roman because it's a cooler font.

I also set the Form's Font property instead of the two Label controls because the Font property is inherited.

And I reduced a few other lines just because I like compact (example, not production) code.

I really had nothing better to do at the moment.

Tergiver
  • 1
    The ToUpper method doesn't have any parameter overload for me. Did older versions have one? I don't get it. – Emil Apr 28 '17 at 00:38
  • I don't know, it's documented here: https://msdn.microsoft.com/en-us/library/system.string.toupper.aspx – Tergiver Apr 29 '17 at 10:10
24

String.ToUpper and String.ToLower can give different results in different cultures. The best-known example is Turkish, where converting the lowercase Latin "i" to uppercase results not in the capital Latin "I" but in the Turkish dotted "İ".

[Image: Capitalization of I depending on culture. Upper row: lowercase letters; lower row: uppercase letters.]

Because I found this confusing even with the picture above (source), I wrote a program (source code below) to see the exact output for the Turkish example:

# Lowercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - i (\u0069) | I (\u0049)     | İ (\u0130)   | i (\u0069)     | i (\u0069)
Turkish i - ı (\u0131) | ı (\u0131)     | I (\u0049)   | ı (\u0131)     | ı (\u0131)

# Uppercase letters
Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish
English i - I (\u0049) | I (\u0049)     | I (\u0049)   | i (\u0069)     | ı (\u0131)
Turkish i - İ (\u0130) | İ (\u0130)     | İ (\u0130)   | İ (\u0130)     | i (\u0069)

As you can see:

  1. Uppercasing lowercase letters and lowercasing uppercase letters give different results for the invariant culture and the Turkish culture.
  2. Uppercasing uppercase letters and lowercasing lowercase letters has no effect, no matter what the culture is.
  3. CultureInfo.InvariantCulture leaves the Turkish characters (ı and İ) as they are.
  4. ToUpper and ToLower are reversible: lowercasing a character after uppercasing it brings it back to its original form, as long as the same culture is used for both operations.
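The reversibility observation in point 4 can be checked with a short sketch. `CasingRoundTrip` and `RoundTrips` are hypothetical names introduced here for illustration, and the sketch assumes Turkish culture data is available on the machine.

```csharp
using System;
using System.Globalization;

public static class CasingRoundTrip
{
    // Upper-cases with one culture, lower-cases with another, and reports
    // whether the original character comes back.
    public static bool RoundTrips(char c, CultureInfo upperCulture, CultureInfo lowerCulture)
    {
        char upper = char.ToUpper(c, upperCulture);
        return char.ToLower(upper, lowerCulture) == c;
    }

    public static void Main()
    {
        var turkish = new CultureInfo("tr-TR");
        var invariant = CultureInfo.InvariantCulture;

        Console.WriteLine(RoundTrips('i', turkish, turkish));      // True:  i -> İ -> i
        Console.WriteLine(RoundTrips('\u0131', turkish, turkish)); // True:  ı -> I -> ı
        Console.WriteLine(RoundTrips('i', invariant, invariant));  // True:  i -> I -> i
        Console.WriteLine(RoundTrips('i', invariant, turkish));    // False: i -> I -> ı
    }
}
```

The last call mixes cultures, which is exactly the case the "same culture for both operations" condition rules out.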

According to MSDN, for Char.ToUpper and Char.ToLower Turkish and Azeri are the only affected cultures because they are the only ones with single-character casing differences. For strings, there might be more cultures affected.


Source code of a console application used to generate the output:

using System;
using System.Globalization;
using System.Linq;
using System.Text;

namespace TurkishI
{
    class Program
    {
        static void Main(string[] args)
        {
            var englishI = new UnicodeCharacter('\u0069', "English i");
            var turkishI = new UnicodeCharacter('\u0131', "Turkish i");

            Console.WriteLine("# Lowercase letters");
            Console.WriteLine("Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
            WriteUpperToConsole(englishI);
            WriteLowerToConsole(turkishI);

            Console.WriteLine("\n# Uppercase letters");
            var uppercaseEnglishI = new UnicodeCharacter('\u0049', "English i");
            var uppercaseTurkishI = new UnicodeCharacter('\u0130', "Turkish i");
            Console.WriteLine("Character              | UpperInvariant | UpperTurkish | LowerInvariant | LowerTurkish");
            WriteLowerToConsole(uppercaseEnglishI);
            WriteLowerToConsole(uppercaseTurkishI);

            Console.ReadKey();
        }

        static void WriteUpperToConsole(UnicodeCharacter character)
        {
            Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
                character.Description,
                character,
                character.UpperInvariant,
                character.UpperTurkish,
                character.LowerInvariant,
                character.LowerTurkish
            );
        }

        static void WriteLowerToConsole(UnicodeCharacter character)
        {
            Console.WriteLine("{0,-9} - {1,10} | {2,-14} | {3,-12} | {4,-14} | {5,-12}",
                character.Description,
                character,
                character.UpperInvariant,
                character.UpperTurkish,
                character.LowerInvariant,
                character.LowerTurkish
            );
        }
    }


    class UnicodeCharacter
    {
        public static readonly CultureInfo TurkishCulture = new CultureInfo("tr-TR");

        public char Character { get; }

        public string Description { get; }

        public UnicodeCharacter(char character) : this(character, string.Empty) {  }

        public UnicodeCharacter(char character, string description)
        {
            if (description == null) {
                throw new ArgumentNullException(nameof(description));
            }

            Character = character;
            Description = description;
        }

        public string EscapeSequence => ToUnicodeEscapeSequence(Character);

        public UnicodeCharacter LowerInvariant => new UnicodeCharacter(Char.ToLowerInvariant(Character));

        public UnicodeCharacter UpperInvariant => new UnicodeCharacter(Char.ToUpperInvariant(Character));

        public UnicodeCharacter LowerTurkish => new UnicodeCharacter(Char.ToLower(Character, TurkishCulture));

        public UnicodeCharacter UpperTurkish => new UnicodeCharacter(Char.ToUpper(Character, TurkishCulture));


        private static string ToUnicodeEscapeSequence(char character)
        {
            var bytes = Encoding.Unicode.GetBytes(new[] {character});
            var prefix = bytes.Length == 4 ? @"\U" : @"\u";
            var hex = BitConverter.ToString(bytes.Reverse().ToArray()).Replace("-", string.Empty);
            return $"{prefix}{hex}";
        }

        public override string ToString()
        {
            return $"{Character} ({EscapeSequence})";
        }
    }
}
d219
krzychu
  • The table of cases was very helpful. Thanks! – VoteCoffee Jun 29 '18 at 15:15
  • I would clearly say that this is total misdesign from Microsoft. If I make an english "i" uppercase an english "I" should come out ALWAYS. If I make a turkish "ı" uppercase a turkish "İ" should come out. Anything else does not make sense and produces a lot of problems. When I have a 100% english text and make it uppercase there should ALWAYS an english text come out without any turkish letters inside. I cannot understand how Microsoft made such a big design error. – Elmue Dec 05 '21 at 13:38
16

Start with MSDN

http://msdn.microsoft.com/en-us/library/system.string.toupperinvariant.aspx

The ToUpperInvariant method is equivalent to ToUpper(CultureInfo.InvariantCulture)

Just because a capital i is 'I' in English, doesn't always make it so.
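As a quick sanity check of the documented equivalence, here is a minimal sketch that compares the two calls on a few sample strings. The class name `InvariantEquivalence`, the helper `SameAsInvariant`, and the sample inputs are assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;

public static class InvariantEquivalence
{
    // Compares ToUpperInvariant() with the explicit invariant-culture overload.
    public static bool SameAsInvariant(string s) =>
        s.ToUpperInvariant() == s.ToUpper(CultureInfo.InvariantCulture);

    public static void Main()
    {
        foreach (var s in new[] { "iii", "mail", "title" })
        {
            // Prints True for each sample: both calls use the invariant culture's rules.
            Console.WriteLine($"{s}: {SameAsInvariant(s)}");
        }
    }
}
```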

CaffGeek
3

ToUpperInvariant uses the rules from the invariant culture

d219
taylonr
1

There is no difference in English. A difference can be found only in Turkish culture.

Stefanvds
  • 15
    And you're sure that Turkish is the only culture in the world that has different rules for upper-case than English? I find that hard to believe. – Joel Mueller Aug 23 '10 at 17:59
  • 4
    Turkish is the most often used example, but not the only one. And it's the language, not the culture that has four different I's. Still, +1 for Turkish. – Armstrongest Aug 23 '10 at 18:01
  • sure there must be some others. most ppl will never ever meet those languages in programming anyway – Stefanvds Aug 23 '10 at 18:11
  • 9
    Sure they will. Web Applications are open to the globe and it's good to set your parameters. What if you're operating on a legacy database that doesn't do unicode? What characters will you accept as a username? What if you have to put in Customer names into a Legacy ERP built on COBOL? Lots of cases where the culture is important. Not to mention, dates and numbers. 4.54 is written 4,54 in some languages. Pretending those other languages don't exist won't get you very far in the long run. – Armstrongest Aug 23 '10 at 18:30
  • obviously cultures are important for dates and numbers, i'm just telling most ppl will never meet the languages which have a different result in toUpper and toUpperInvariant. – Stefanvds Aug 24 '10 at 07:12
  • @Stefanvds FWIW I just ran into this issue converting musical degrees read in from "i" to "I". Not saying it happens often, but it *does* happen :) – Josh Noe Jul 03 '18 at 15:37
  • Spanish, second native language in the world (after chinese) has different upper-case, like África – Leandro Bardelli Jun 07 '21 at 15:15
  • @LeandroBardelli It seems that "á".ToUpperInvariant() == "Á" and "ñ".ToUpperInvariant() == "Ñ" so could you please explain what you mean when you say that Spanish has different uppercase. – Edminsson Oct 03 '21 at 09:23
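The check in the last comment can be run as a small console sketch. It uses `\u` escapes for the accented letters to avoid any source-encoding ambiguity; the class name `SpanishCasing`, the helper `InvariantMatchesSpanish`, and the choice of the `es-ES` culture are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class SpanishCasing
{
    // Reports whether the invariant culture and Spanish (Spain) agree
    // on the upper-case form of a string.
    public static bool InvariantMatchesSpanish(string s) =>
        s.ToUpperInvariant() == s.ToUpper(new CultureInfo("es-ES"));

    public static void Main()
    {
        Console.WriteLine("\u00E1".ToUpperInvariant() == "\u00C1"); // True: á -> Á
        Console.WriteLine("\u00F1".ToUpperInvariant() == "\u00D1"); // True: ñ -> Ñ
        Console.WriteLine(InvariantMatchesSpanish("\u00E1frica"));  // True: both give ÁFRICA
    }
}
```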