On my local machine, this:

Trace.Warn("\u012b");

Outputs this (which is wrong):

ī

Yet on another machine (e.g. here: http://www.volatileread.com/UtilityLibrary/SnippetCompiler), this:

Console.WriteLine("\u012b");

Outputs:

i

What's happening?

EDIT: I'm using this function, taken from the question "Any libraries to convert number Pinyin to Pinyin with tone markings?":

// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public static string ConvertNumericalPinYinToAccented(string input)
{
    Dictionary<int, string> PinyinToneMark = new Dictionary<int, string>
    {
        {0, "aoeiuv\u00fc"},
        {1, "\u0101\u014d\u0113\u012b\u016b\u01d6\u01d6"},
        {2, "\u00e1\u00f3\u00e9\u00ed\u00fa\u01d8\u01d8"},
        {3, "\u01ce\u01d2\u011b\u01d0\u01d4\u01da\u01da"},
        {4, "\u00e0\u00f2\u00e8\u00ec\u00f9\u01dc\u01dc"}
    };

    string[] words = input.Split(' ');
    string accented = "";
    string t = "";
    foreach (string pinyin in words)
    {
        foreach (char c in pinyin)
        {
            if (c >= 'a' && c <= 'z')
            {
                t += c;
            }
            else if (c == ':')
            {
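                // "u:" is the numbered-Pinyin spelling of ü: swap the trailing 'u' for 'ü'.
                // Drop only the last character (Length - 1); Length - 2 would also eat the preceding letter.
                if (t.Length > 0 && t[t.Length - 1] == 'u')
                {
                    t = t.Substring(0, t.Length - 1) + "\u00fc";
                }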
            }
            else
            {
                if (c >= '0' && c <= '5')
                {
                    int tone = (int)Char.GetNumericValue(c) % 5;

                    if (tone != 0)
                    {
                        Match match = Regex.Match(t, "[aoeiuv\u00fc]+");
                        if (!match.Success)
                        {
                            t += c;
                        }
                        else if (match.Groups[0].Length == 1)
                        {
                            t = t.Substring(0, match.Groups[0].Index) +
                                PinyinToneMark[tone][PinyinToneMark[0].IndexOf(match.Groups[0].Value[0])]
                                + t.Substring(match.Groups[0].Index + match.Groups[0].Length);
                        }
                        else
                        {
                            if (t.Contains("a"))
                            {
                                t = t.Replace("a", PinyinToneMark[tone][0].ToString());
                            }
                            else if (t.Contains("o"))
                            {
                                t = t.Replace("o", PinyinToneMark[tone][1].ToString());
                            }
                            else if (t.Contains("e"))
                            {
                                t = t.Replace("e", PinyinToneMark[tone][2].ToString());
                            }
                            else if (t.Contains("ui"))
                            {
                                t = t.Replace("i", PinyinToneMark[tone][3].ToString());
                            }
                            else if (t.Contains("iu"))
                            {
                                t = t.Replace("u", PinyinToneMark[tone][4].ToString());
                            }
                            else
                            {
                                t += "!";
                            }
                        }
                    }
                }
                accented += t;
                t = "";
            }
        }
        accented += t + " ";
    }
    accented = accented.TrimEnd();
    return accented;
}

E.g.: ConvertNumericalPinYinToAccented("ba2itia1n");

Working version: http://volatileread.com/utilitylibrary/snippetcompiler?id=22734

Cornwell
  • What you call *wrong* just is not wrong: U+012B is "Latin small letter I with macron", and that's what it looks like. Consoles don't normally have a font that can display this glyph, so you get the "closest" one. – Hans Passant Jul 16 '15 at 18:42

1 Answer

The answer at this link might be useful to you: https://superuser.com/questions/412986/unicode-support-between-different-os-and-browsers

How a Unicode character is rendered depends on the browser you use and on the OS the server is running on. Given that, it is normal for small differences to appear.
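
As a rough illustration of the point above, here is a minimal sketch (not part of the original answer) that separates the character's bytes from how a console shows it. The class name `ConsoleEncodingDemo` and the helper `Utf8Hex` are hypothetical names chosen for this example. It assumes a console application; `Console.OutputEncoding` only affects console output, not ASP.NET tracing, and whether the glyph actually appears still depends on the console font.

```csharp
using System;
using System.Text;

public static class ConsoleEncodingDemo
{
    // Hypothetical helper: returns the UTF-8 bytes of a string as hex,
    // which is the same on every machine regardless of fonts or consoles.
    public static string Utf8Hex(string s)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(s));
    }

    public static void Main()
    {
        // The encoded bytes of U+012B do not vary between machines.
        Console.WriteLine(Utf8Hex("\u012b")); // prints "C4-AB"

        // Ask the console to emit UTF-8 rather than its default code page.
        // The displayed glyph still depends on the font the console uses.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("\u012b");
    }
}
```

If the hex output matches on both machines, the string itself is identical and the difference lies purely in how each environment displays it.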

Daneau
  • I thought it was a standardized format, not open to different interpretations. How can I "force" my server to follow one interpretation? – Cornwell Jul 16 '15 at 16:46
  • @Cornwell same character can be displayed in many ways/different glyphs (including "no character found" shown as square). Depending on your needs showing hex value of character/sequence (possibly after normalization :) ) may be the solution. – Alexei Levenkov Jul 16 '15 at 17:02
  • @AlexeiLevenkov How would I go about normalizing it? Please see my edit – Cornwell Jul 16 '15 at 17:08
  • @Cornwell http://stackoverflow.com/questions/3288114/what-does-nets-string-normalize-do ... Looking at the sample, you probably need to spend some time reading the posts at http://www.siao2.com/ (or generally include "Kaplan" in your search queries). – Alexei Levenkov Jul 16 '15 at 18:38
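
The suggestion in the comments above (show the hex value of each character, possibly after normalization) could be sketched roughly as follows. This is an illustrative example, not code from the thread: the class name `CodePointDump` and the method `ToHex` are hypothetical, and the dump works on UTF-16 code units, which is sufficient for the BMP characters used in Pinyin.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CodePointDump
{
    // Formats each UTF-16 code unit as U+XXXX so a string's content can be
    // compared across machines without relying on any font or console.
    public static string ToHex(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        string precomposed = "\u012b"; // single code point: i with macron
        string decomposed = "i\u0304"; // 'i' followed by a combining macron

        Console.WriteLine(ToHex(precomposed)); // U+012B
        Console.WriteLine(ToHex(decomposed));  // U+0069 U+0304

        // Form C composes the pair into one code point; Form D splits it again.
        Console.WriteLine(ToHex(decomposed.Normalize(NormalizationForm.FormC)));  // U+012B
        Console.WriteLine(ToHex(precomposed.Normalize(NormalizationForm.FormD))); // U+0069 U+0304
    }
}
```

Logging the output of `ToHex` on both machines shows whether the strings really differ or only their rendering does.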