When I use this code:

var ri = new RegionInfo("us");
var nativeName = ri.NativeName;   // ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ

why is nativeName then the string "ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ" (in Cherokee)?

If I change to new RegionInfo("US") (only difference, capital US), I get instead "United States".

I do know the preferred usage of RegionInfo is to give a specific culture info string such as:

new RegionInfo("en-US")
new RegionInfo("chr-Cher-US")

and so on, and that works. But why is Cherokee preferred over English only if I use lower-case us?


(Seen on Windows 10 (version 1803 "April 2018 Update"), .NET Framework 4.7.2.)


Update: This is not consistent, even on the same machine. For example, I opened PowerShell many times, each time pasting [System.Globalization.RegionInfo]'US' into it. For a long period, all instances of PowerShell consistently give the same result, but after a while new instances give the opposite result. Here is a screenshot of two of the windows, one consistently showing one NativeName and the other consistently showing the opposite one. So the choice must be non-deterministic (there is no difference in casing between the two):

[Screenshot: PowerShell windows]

Jeppe Stig Nielsen
  • Could be a bug. The [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.globalization.regioninfo.-ctor?view=netframework-4.7.2#System_Globalization_RegionInfo__ctor_System_String_) says "Case is not significant." Of course, it also says, "You should provide the name of a specific culture rather than just a country/region name in the name parameter." – Heretic Monkey Nov 13 '18 at 16:35
  • Even with `US` I get `ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ` in Linqpad – Panagiotis Kanavos Nov 13 '18 at 16:35
  • To be fair, the Cherokee were in the US before it was the US :). – Heretic Monkey Nov 13 '18 at 16:37
  • From the [docs](https://learn.microsoft.com/en-us/dotnet/api/system.globalization.regioninfo.nativename?view=netframework-4.7.2): *We recommend that you use the culture name ... Therefore, creating the `RegionInfo` object with only a country/region name of US is not specific enough to distinguish the appropriate string.* – DavidG Nov 13 '18 at 16:44
  • @DavidG: Yeah, so is the fact that it's able to determine the appropriate region with uppercase US an accident? The main documentation of RegionInfo makes it clear that uppercase US works correctly. – BoltClock Nov 13 '18 at 16:47
  • Looks like [.NET just delegates to the OS](https://referencesource.microsoft.com/#mscorlib/system/globalization/cultureinfo.cs,ad25d10813fc68ee,references), so it's a Windows 10 thing. – Heretic Monkey Nov 13 '18 at 16:48
  • @BoltClock AFAICT the docs are pretty explicit to avoid the two-letter code. The [RegionInfo](https://docs.microsoft.com/en-us/dotnet/api/system.globalization.regioninfo?view=netframework-4.7.2#instantiating-a-regioninfo-object) class, the [`NativeName`](https://docs.microsoft.com/en-us/dotnet/api/system.globalization.regioninfo.nativename?view=netframework-4.7.2) property and the [constructor](https://docs.microsoft.com/en-us/dotnet/api/system.globalization.regioninfo.-ctor?view=netframework-4.7.2#System_Globalization_RegionInfo__ctor_System_String_). Is there another doc that needs editing? – DavidG Nov 13 '18 at 16:52
  • @DavidG: Those documents don't suggest anything like the behavior being undefined, unpredictable or unsupported when a two-letter code is provided. They're just advising developers to provide the culture name for best results and handwaving it otherwise. If this behavior is intentional or otherwise not a bug, there must be a reason for it... – BoltClock Nov 13 '18 at 16:57
  • @PanagiotisKanavos It seems quite random. On another machine, when in PowerShell I do `[System.Globalization.RegionInfo]'us'` and `[System.Globalization.RegionInfo]'US'`, it is opposite of what you report, `United States` in both cases. – Jeppe Stig Nielsen Nov 13 '18 at 17:15
  • I'm going to go with "easter egg." – 3Dave Nov 13 '18 at 21:52
  • This question should be migrated to https://history.stackexchange.com/ . (Or... wait.. what?) – Marco13 Nov 13 '18 at 22:31
  • @JeppeStigNielsen I updated my answer to add info about the caching it uses, which seems to affect the consistency. – Gabriel Luci Nov 14 '18 at 02:32

1 Answer

The first thing to note is that the constructor for RegionInfo finds the region by finding a culture used in that region. So it's looking for a language in that country, not just the country.

Reading through that source code, it seems like the difference in upper/lower case is because of how the lookups are done if no culture is specified with the region.

For example, it tries a couple of things first, and then it looks in a static list of regions. Because that lookup uses Dictionary.ContainsKey, it is a case-sensitive search: if you specify "US", it finds an entry, but "us" does not match.
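The difference between the two kinds of lookup can be seen in isolation. This is only a minimal sketch: `LookupSketch`, `RegionTable`, and its single entry are hypothetical stand-ins, not the framework's actual data or code.

```csharp
using System;
using System.Collections.Generic;

public static class LookupSketch
{
    // Hypothetical stand-in for the static region table (assumption, not real framework data).
    // A Dictionary<string, ...> created without a comparer compares keys ordinally,
    // so ContainsKey is case-sensitive.
    private static readonly Dictionary<string, string> RegionTable =
        new Dictionary<string, string> { { "US", "en-US" } };

    public static bool InStaticTable(string region)
    {
        return RegionTable.ContainsKey(region);
    }

    // The kind of comparison a case-insensitive search would use instead.
    public static bool MatchesIgnoringCase(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(InStaticTable("US"));              // True
        Console.WriteLine(InStaticTable("us"));              // False
        Console.WriteLine(MatchesIgnoringCase("us", "US"));  // True
    }
}
```

So a lower-case "us" falls through the table lookup and only matches in a later, case-insensitive step.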

Later, it searches through all the cultures (from CultureInfo.GetCultures(CultureTypes.SpecificCultures)) for the region you gave, but it does so in a case-insensitive way.

I can't confirm since I can't step through that code, but my guess is that, because it's going through the list in order, it will get to chr-Cher-US before it gets to en-US.
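One way to check that guess on a given machine is to list the specific cultures for the region in the order the framework returns them. This is a sketch under assumptions: `CultureOrderSketch` and `CulturesForRegion` are names made up here, and matching on `TwoLetterISORegionName` only approximates whatever comparison the framework does internally.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CultureOrderSketch
{
    // Returns the names of the specific cultures whose region matches, in enumeration order.
    public static List<string> CulturesForRegion(string region)
    {
        var matches = new List<string>();
        foreach (CultureInfo culture in CultureInfo.GetCultures(CultureTypes.SpecificCultures))
        {
            try
            {
                var info = new RegionInfo(culture.Name);
                if (string.Equals(info.TwoLetterISORegionName, region,
                                  StringComparison.OrdinalIgnoreCase))
                {
                    matches.Add(culture.Name);
                }
            }
            catch (ArgumentException)
            {
                // Skip cultures for which no RegionInfo can be created.
            }
        }
        return matches;
    }

    public static void Main()
    {
        // The order, and the set of cultures, depends on the OS and runtime.
        foreach (string name in CulturesForRegion("us"))
        {
            Console.WriteLine(name);
        }
    }
}
```

If chr-Cher-US is printed before en-US, that would be consistent with the first case-insensitive match being Cherokee.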

Why is it not consistent?

One of the comments said that LINQPad finds Cherokee even when using upper case. I don't know why this is. I was able to replicate that, but I also found that in Visual Studio, it's English when using "US" and Cherokee when using "us", like you describe. I also found that if I turn on "Use experimental Roslyn assemblies" in LINQPad, it returns English for both "US" and "us". So it may have something to do with the exact runtime version targeted, but I can't say for sure.

One thing that affects consistency is caching: the first thing that it will do when it does not get a complete match by culture + region is check a cache of already-found cultures. It lower-cases all the keys in that cache, so this cache is case-insensitive.

You can test this. We know that using "US" vs. "us" will yield different results, but try this in the same program:

var nativeNameus = new RegionInfo("us").NativeName;
var nativeNameUS = new RegionInfo("US").NativeName;

Then swap them and run it again:

var nativeNameUS = new RegionInfo("US").NativeName;
var nativeNameus = new RegionInfo("us").NativeName;

Within each run, both results will always be equal, because the culture found by the first call is cached and reused by the second.

It's possible that there is code outside of your code that calls the same methods and ends up caching a culture value, thereby changing the result you get when you do the same.
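The ordering effect can be modeled with a toy cache. This is not the framework's code: `RegionCacheSketch`, `Resolve`, and the hard-coded results are hypothetical, chosen only to mirror the behavior described above (case-sensitive first lookup, lower-cased cache keys).

```csharp
using System;
using System.Collections.Generic;

public sealed class RegionCacheSketch
{
    // Keys are lower-cased before use, so the cache is effectively case-insensitive.
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public string Resolve(string region)
    {
        string key = region.ToLowerInvariant();
        string cached;
        if (_cache.TryGetValue(key, out cached))
        {
            return cached; // a previous call with any casing decides the answer
        }

        // Hypothetical stand-in for the uncached paths: an exact "US" hits the
        // case-sensitive table, anything else falls through to the first match
        // in the culture list.
        string result = region == "US" ? "en-US" : "chr-Cher-US";
        _cache[key] = result;
        return result;
    }

    public static void Main()
    {
        var lowerFirst = new RegionCacheSketch();
        Console.WriteLine(lowerFirst.Resolve("us")); // chr-Cher-US
        Console.WriteLine(lowerFirst.Resolve("US")); // chr-Cher-US (served from the cache)

        var upperFirst = new RegionCacheSketch();
        Console.WriteLine(upperFirst.Resolve("US")); // en-US
        Console.WriteLine(upperFirst.Resolve("us")); // en-US (served from the cache)
    }
}
```

In this model, whichever casing is resolved first in a process determines what every later call returns.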

Conclusion

All that said, the docs actually say:

We recommend that you use the culture name—for example, "en-US" for English (United States)—to access the NativeName property.

So it is a bit of a moot point: you asked for a region, not a language. If you need a specific language, ask for that language, not just a region.

If you want to guarantee English, then either:

  1. Do as Microsoft recommends and specify the language with the region: "en-US", or
  2. Use the EnglishName property, which is English even when the NativeName is Cherokee. (DisplayName is also English, but only when the .NET Framework itself is the English-language version.)
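Both options can be sketched as follows. `RegionNames` and `EnglishNameOf` are names made up for this example; the exact strings printed can vary with the OS and runtime, so no output is shown for NativeName.

```csharp
using System;
using System.Globalization;

public static class RegionNames
{
    // Hypothetical helper: the English name for a culture or region name.
    public static string EnglishNameOf(string name)
    {
        return new RegionInfo(name).EnglishName;
    }

    public static void Main()
    {
        // Option 1: name the culture, so the language is not left to chance.
        var byCulture = new RegionInfo("en-US");
        Console.WriteLine(byCulture.NativeName);

        // Option 2: EnglishName does not depend on which culture was picked for the region.
        Console.WriteLine(EnglishNameOf("US"));
        Console.WriteLine(EnglishNameOf("us"));
    }
}
```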
Gabriel Luci
  • The reason why some people get `ᏌᏊ ᎢᏳᎾᎵᏍᏔᏅ ᏍᎦᏚᎩ` and some `United States` is most probably, because they target different .NET Framework versions. If you compile OP code against .NET Framework 3.5 or lower it will print `United States`. "chr-Cher-US" was probably added in later versions of .NET Framework and that's why "en-US" is found first in dictionary. – FCin Nov 13 '18 at 17:36
  • This comment in the RegionInfo constructor says it all `Note: We prefer that a region be created with a full culture name (ie: en-US) because otherwise the native strings won't be right.` – bastos.sergio Nov 13 '18 at 17:52
  • @FCin It is apparently very device dependent. On my machine I'm reliably getting only "United States" for both `us` and `US` on FW3.5, and only Cherokee for both `us` and `US` with FW4.0 and up. – GSerg Nov 13 '18 at 18:32
  • @GSerg Why do you say it's *device* dependent? Changing the framework version would make it framework version dependent. –  Nov 13 '18 at 18:36
  • @Amy Because I'm getting one result with e.g. FW4.5 and other people are getting a different result also with FW4.5. I'm not saying it is necessarily *device* dependent (as in, hardware), but it is evidently not limited to just .NET version. – GSerg Nov 13 '18 at 18:37
  • Sure, but ALL people involved are just using the wrong parameters.... A culture information is always in the format "en-us" or "fr-ca" or "sp-mx" etc... Just specifying a country leaves it up to the OS to decide what to spit back, depending on how the list of cultures for the selected country are sorted internally. The fact that it worked "reliably" before was just dumb luck that the result you were looking for was on top. Use culture AND country to avoid issues. If you just want a region-generic version, use the language code alone (for example "en" instead of "en-us"). – Drunken Code Monkey Nov 13 '18 at 21:18
  • I was giving this some extra thought (because I'm like that) and remembered I saw some caching going on in the code. So I did some tests and it does indeed affect the consistency of this. I've updated my answer. But my conclusion still stands: if you need a specific language, ask for that language. – Gabriel Luci Nov 14 '18 at 02:35
  • @bastos.sergio's comment is the only one that provides an actual lead. So it looks like the source does acknowledge that the value of NativeName is unsupported when only a region identifier is provided. – BoltClock Nov 14 '18 at 02:54
  • So this method should probably throw exception when only region is specified instead of returning "random" values. Looks like bad implementation to me – FCin Nov 14 '18 at 05:31