Converting superscript to Unicode in C#

Question

How can I convert a superscript number to Unicode in C#? I have many superscript numbers that I want to convert, so I would use a loop, but I can't figure out how to convert them to Unicode.

Example superscript numbers: ⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ¹⁰ ¹¹ ¹² ¹³ ¹⁴ ¹⁵ ¹⁶

Thanks

Where are these “superscript numbers” currently stored? A database? A file? A string in your program? — Wearwolf, Apr 04 '18 at 16:16
They will be taken from a website and stored in a string. I will check whether the string contains superscripts, so I thought the best way was to convert them to Unicode and compare them with another string array that contains the Unicode characters for all superscripts (up to 40). — Taqwa, Apr 04 '18 at 16:18
`"¹⁶"` is a two-character string, and both characters are part of Unicode. Do you have that string in C# already and do you mean converting that to the two-character string `"16"`, or is your question about getting data in an unknown encoding to end up as the correct characters in C#? "Converting to Unicode" is not really a meaningful operation. "Encoding in UTF-8" could be. — Jeroen Mostert, Apr 04 '18 at 16:20
I just want to check if the string I take from the website contains superscripts in it. — Taqwa, Apr 04 '18 at 16:23
Do you just need a string, which you already have? Just put double quotes around the numbers. Strings in .NET are made of two-byte units, just like Unicode. — jdweng, Apr 04 '18 at 16:26
Strings in C# are always Unicode (UTF-16) so if you have the superscript numbers in a C# string they are already Unicode. If the C# string you get by reading these values doesn't contain what you expect then it's likely an encoding issue. — Wearwolf, Apr 04 '18 at 16:27
You could simply use `String.IndexOfAny`, or a regex. Superscript digits are no different from other characters. — Jeroen Mostert, Apr 04 '18 at 16:27
Possible duplicate of [How to find the unicode of the subscript alphabet?](https://stackoverflow.com/questions/17908593/how-to-find-the-unicode-of-the-subscript-alphabet) — Heretic Monkey, Apr 04 '18 at 17:34

score 1 · Accepted Answer · answered Apr 04 '18 at 17:12

C# strings are always Unicode (UTF-16) so if you can load the text without issue it is already Unicode. If you aren't getting the text you expect then you need to look into encodings and how you are reading the text.

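Based on "Unicode subscripts and superscripts", the superscript digits aren't in a contiguous block, which makes them awkward to detect with a single range check: ¹, ² and ³ are at U+00B9, U+00B2 and U+00B3, while ⁰ is at U+2070 and ⁴ to ⁹ are at U+2074 to U+2079. The easiest way to see if you have a superscript is therefore to use a switch statement.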

    static bool IsSuperscript(char c)
    {
        switch(c)
        {
            case '⁰':
            case '¹':
            case '²':
            case '³':
            case '⁴':
            case '⁵':
            case '⁶':
            case '⁷':
            case '⁸':
            case '⁹':
                return true;
            default:
                return false;
        }
    }
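As a cross-check on the switch above, the same ten characters can also be matched by code point. This is only a sketch: the class and method names (`SuperscriptCodePoints`, `IsSuperscriptDigit`) are made up for illustration. It also shows why a single range such as U+2070 to U+2079 would not be enough, since U+2071 is the superscript letter ⁱ rather than a digit.

```csharp
public static class SuperscriptCodePoints
{
    // Hypothetical helper: true only for the ten superscript digits.
    public static bool IsSuperscriptDigit(char c)
    {
        return c == '\u00B9'                      // ¹ (Latin-1 Supplement)
            || c == '\u00B2' || c == '\u00B3'     // ² and ³ (Latin-1 Supplement)
            || c == '\u2070'                      // ⁰
            || (c >= '\u2074' && c <= '\u2079');  // ⁴ to ⁹
    }
}
```

Whether this reads better than the switch is a matter of taste; the switch has the advantage of showing the actual characters.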

Then to see if a string contains only superscript characters you just need to loop through it.

    static bool IsSuperscript(string s)
    {
        foreach(var c in s)
        {
            if(!IsSuperscript(c))
            {
                return false;
            }
        }

        return true;
    }
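Note that the method above returns true only when every character is a superscript (and also for an empty string). If the goal is to check whether a string contains a superscript anywhere, as mentioned in the comments, `String.IndexOfAny` may be enough. A minimal sketch, where `SuperscriptSearch` and `ContainsSuperscript` are hypothetical names:

```csharp
public static class SuperscriptSearch
{
    // The ten superscript digits, listed explicitly because they are not contiguous.
    private static readonly char[] SuperscriptDigits =
        { '⁰', '¹', '²', '³', '⁴', '⁵', '⁶', '⁷', '⁸', '⁹' };

    // Hypothetical helper: true if at least one superscript digit occurs anywhere in s.
    public static bool ContainsSuperscript(string s)
    {
        // IndexOfAny returns -1 when none of the characters is found.
        return s.IndexOfAny(SuperscriptDigits) >= 0;
    }
}
```

For example, `SuperscriptSearch.ContainsSuperscript("see note ¹²")` would be true, while a string with only ordinary digits would give false.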

If you want to convert a superscript character into a normal number character you can use a similar switch statement.

    static bool TryNormalizeSuperscript(char superC, out char c)
    {
        bool result = true;
        switch (superC)
        {
            case '⁰':
                c = '0';
                break;
            case '¹':
                c = '1';
                break;
            case '²':
                c = '2';
                break;
            case '³':
                c = '3';
                break;
            case '⁴':
                c = '4';
                break;
            case '⁵':
                c = '5';
                break;
            case '⁶':
                c = '6';
                break;
            case '⁷':
                c = '7';
                break;
            case '⁸':
                c = '8';
                break;
            case '⁹':
                c = '9';
                break;
            default:
                c = '\0';
                result = false;
                break;
        }

        return result;
    }

Then loop over the string, converting one character at a time:

    // Requires "using System.Text;" at the top of the file for StringBuilder.
    static string NormalizeSuperscript(string s)
    {
        var sb = new StringBuilder();
        foreach (var superC in s)
        {
            if(TryNormalizeSuperscript(superC, out char c))
            {
                sb.Append(c);
            }
            else
            {
                break;
            }
        }

        return sb.ToString();
    }

Note that this loop stops at the first non-superscript character it finds. Depending on your use case that may need to change.
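If stopping at the first non-superscript character is not what you want, one possible variation is to convert superscript digits where they occur and copy everything else through unchanged. This is a sketch under that assumption, not part of the code above; `SuperscriptText` and `ReplaceSuperscripts` are hypothetical names.

```csharp
using System.Text;

public static class SuperscriptText
{
    // Hypothetical helper: maps superscript digits to ASCII digits and copies
    // every other character through unchanged instead of stopping.
    public static string ReplaceSuperscripts(string s)
    {
        // The index of each character in this table equals its digit value.
        const string supers = "⁰¹²³⁴⁵⁶⁷⁸⁹";
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            int digit = supers.IndexOf(c);  // -1 if c is not a superscript digit
            sb.Append(digit >= 0 ? (char)('0' + digit) : c);
        }
        return sb.ToString();
    }
}
```

With this variant, mixed input such as `"x² + y¹⁰"` becomes `"x2 + y10"` rather than an empty string.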

Example usage:

    static void Main(string[] args)
    {
        Console.OutputEncoding = System.Text.Encoding.Unicode;
        var superscripts = "⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ¹⁰ ¹¹ ¹² ¹³ ¹⁴ ¹⁵ ¹⁶ 17 18 19 XX XXI XXII XXIII XXIV";
        foreach(var superscript in superscripts.Split(' '))
        {
            Console.WriteLine($"{superscript} ({IsSuperscript(superscript)}) -> {NormalizeSuperscript(superscript)}");
        }
    }

Outputs:

    ⁰ (True) -> 0
    ¹ (True) -> 1
    ² (True) -> 2
    ³ (True) -> 3
    ⁴ (True) -> 4
    ⁵ (True) -> 5
    ⁶ (True) -> 6
    ⁷ (True) -> 7
    ⁸ (True) -> 8
    ⁹ (True) -> 9
    ¹⁰ (True) -> 10
    ¹¹ (True) -> 11
    ¹² (True) -> 12
    ¹³ (True) -> 13
    ¹⁴ (True) -> 14
    ¹⁵ (True) -> 15
    ¹⁶ (True) -> 16
    17 (False) ->
    18 (False) ->
    19 (False) ->
    XX (False) ->
    XXI (False) ->
    XXII (False) ->
    XXIII (False) ->
    XXIV (False) ->

Note that setting `Console.OutputEncoding = System.Text.Encoding.Unicode;` is required to get the console to show the correct characters. I also had to experiment with console fonts to get things to display correctly.
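If the console still refuses to render the glyphs, one way to confirm that the string holds the expected characters is to print their code points instead. A small sketch, with `CodePointDump` and `Describe` as hypothetical names:

```csharp
using System.Linq;

public static class CodePointDump
{
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    public static string Describe(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For `"¹⁶"` this gives `U+00B9 U+2076`, which only uses ASCII and so does not depend on the console font or encoding.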
