
When I run the following code in .NET Core 3.1, I get 6 as the return value.

// .NET Core 3.1
string s = "Hello\r\nworld!";
int idx = s.IndexOf("\n");
Console.WriteLine(idx);

Result:

6

But when I run this code in .NET 5.0, I get a different result. Why does this happen?

// .NET 5.0
string s = "Hello\r\nworld!";
int idx = s.IndexOf("\n");
Console.WriteLine(idx);

Result:

-1
iFarbod
Farhad Zamani
  • Take a look at [this issue](https://github.com/dotnet/runtime/issues/43736) – Riwen Nov 14 '20 at 12:03
  • Because of [globalization and ICU](https://learn.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu) – Arthur Attout Nov 14 '20 at 12:03
  • Is this example an exact copy of https://learn.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu ? I don't understand the question then, as that page explains why this happens and how to revert to the old behavior. – Ray Nov 14 '20 at 12:05
  • It all depends on the CultureInfo used. It is a well-documented discrepancy on the .NET Core GitHub repo. – bre_dev Nov 14 '20 at 12:21
  • I looked at your question and went "That's ridiculous. Clearly Farhad is doing something wrong." and then I read the issue Riwen linked. I foresee a lot of people updating to .NET 5 without being aware of this. Oh dear. – ProgrammingLlama Nov 14 '20 at 12:30
  • @John: Actually, using `IndexOf` without specifying a comparison has always been a time bomb, and it may fail even before .NET 5 if used with different regional settings. I wish I knew whose idea it was to default to the current culture instead of using ordinal comparison... The same goes for `ToString` and `Parse` without specifying a culture. Almost every floating-point formatting/parsing question on SO is related to this design decision. – György Kőszeg Nov 14 '20 at 12:54
  • @GyörgyKőszeg That's as may be, but I still foresee people getting caught out by it. – ProgrammingLlama Nov 14 '20 at 15:30

2 Answers


The comments and @Ray's answer contain the reason.

And though hacking the .csproj or runtimeconfig.json file may save the day, the real solution is to specify the comparison explicitly:

// this returns the expected result
int idx = s.IndexOf("\n", StringComparison.Ordinal);

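For some reason, IndexOf(string) defaults to a culture-sensitive comparison using the current culture, which can cause surprises even with earlier .NET versions when your app runs in an environment whose regional settings differ from yours. In this case, ICU's linguistic search treats the "\r\n" pair as a single unit, so a lone "\n" is not found.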
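A minimal sketch (a .NET 5+ console program using C# 9 top-level statements) that puts the culture-sensitive default next to the explicit alternatives. Only the ordinal results are fixed; what the CurrentCulture line prints depends on the runtime, the OS, and whether ICU or NLS is in use, so no expected value is claimed for it.

```csharp
using System;

string s = "Hello\r\nworld!";

// Culture-sensitive search: this is what s.IndexOf("\n") does by default.
// With ICU (e.g. .NET 5+ on recent Windows) this can be -1, because the
// "\r\n" pair is compared as a single unit.
int cultureIdx = s.IndexOf("\n", StringComparison.CurrentCulture);

// Ordinal search: compares UTF-16 code units, so it finds '\n' at index 6
// regardless of the OS, ICU/NLS mode, or regional settings.
int ordinalIdx = s.IndexOf("\n", StringComparison.Ordinal);

// The IndexOf(char) overload always performs an ordinal search, too.
int charIdx = s.IndexOf('\n');

Console.WriteLine($"CurrentCulture: {cultureIdx}");
Console.WriteLine($"Ordinal:        {ordinalIdx}");
Console.WriteLine($"char overload:  {charIdx}");
```

If you only need a single character, the char overload is the simplest way to avoid the culture question entirely.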

Using a culture-specific search is actually a very rare scenario (it can be valid in a browser, a book reader, or a UI search, for example), and it is much slower than an ordinal search.
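For the rare UI case, here is a sketch of what a deliberately culture-sensitive search might look like, next to an ordinal one for machine-facing text. `FindForUser` and `FindToken` are hypothetical helpers for illustration; how diacritics are matched under `CompareOptions.IgnoreNonSpace` depends on the culture data available at run time, so only the ordinal calls have a fixed result.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for a user-facing "find" box, one of the rare cases where a
// linguistic (culture-sensitive) search is wanted: ignore case and diacritics.
static int FindForUser(string text, string query) =>
    CultureInfo.CurrentCulture.CompareInfo.IndexOf(
        text, query, CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace);

// Hypothetical helper for machine-facing text (headers, file names, config keys):
// an ordinal search is faster and gives the same answer on every machine.
static int FindToken(string text, string token) =>
    text.IndexOf(token, StringComparison.OrdinalIgnoreCase);

Console.WriteLine(FindForUser("Café au lait", "CAFE"));
Console.WriteLine(FindToken("Content-Type: text/plain", "content-type"));
```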

The same issue applies to StartsWith/EndsWith/ToUpper/ToLower and even to the ToString and Parse methods of formattable types (especially floating-point types), as these also use the current culture by default, which can be the source of many gotchas. (Contains(string) is an exception: it performs an ordinal comparison.) Recent code analyzers (e.g. FxCop, ReSharper) can warn you if you don't specify a comparison or culture, and it is recommended to set a high severity for these warnings in production code.
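A sketch of culture-explicit counterparts for the APIs mentioned above, assuming the text is machine-readable (file names, headers, serialized numbers) rather than shown to users, where the current culture may be exactly what you want. The variable names and sample values are illustrative only.

```csharp
using System;
using System.Globalization;

// Prefix/suffix checks on identifiers: ordinal, optionally ignoring case.
string path = "README.md";
bool isMarkdown = path.EndsWith(".MD", StringComparison.OrdinalIgnoreCase);
bool hasPrefix = path.StartsWith("READ", StringComparison.Ordinal);

// Casing for protocol/identifier text: invariant rather than current culture
// (under a Turkish culture, "i".ToUpper() gives "İ", not "I").
string header = "content-type".ToUpperInvariant();

// Formatting and parsing numbers: the invariant culture always uses "." as the
// decimal separator, regardless of the machine's regional settings.
double price = 1234.5;
string text = price.ToString(CultureInfo.InvariantCulture);
double parsed = double.Parse("1234.5", CultureInfo.InvariantCulture);

Console.WriteLine($"{isMarkdown} {hasPrefix} {header} {text} {parsed.ToString(CultureInfo.InvariantCulture)}");
```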

György Kőszeg

Your sample code exactly matches the one posted on MSDN, which also describes why this happens and how to revert to the old behavior, as in these excerpts (bracketed note mine):

In the past, the .NET globalization APIs used different underlying libraries on different platforms. On Unix, the APIs used International Components for Unicode (ICU), and on Windows, they used National Language Support (NLS). [...] Behavior differences were evident in these areas:

  • Cultures and culture data
  • String casing
  • String sorting and searching
  • Sort keys
  • String normalization
  • Internationalized Domain Names (IDN) support
  • Time zone display name on Linux

To revert back to using NLS [relevant for Windows 10 May 2019 Update and later, where .NET 5 uses ICU by default], a developer can opt out of the ICU implementation. Applications can enable NLS mode in any of the following ways:

  • In the project file:

    <ItemGroup>
      <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
    </ItemGroup>
    
  • In the runtimeconfig.json file:

    {
      "runtimeOptions": {
        "configProperties": {
          "System.Globalization.UseNls": true
        }
      }
    }

  • By setting the environment variable DOTNET_SYSTEM_GLOBALIZATION_USENLS to true or 1.

    Note that System.Globalization.AppLocalIcu (environment variable DOTNET_SYSTEM_GLOBALIZATION_APPLOCALICU) is a different setting: it does not enable NLS, but makes the app load its own copy of ICU. Its value has the form <suffix>:<version> or <version>, where:

    <suffix>: Optional suffix of fewer than 36 characters in length, following the public ICU packaging conventions. When building a custom ICU, you can customize it to produce the lib names and exported symbol names to contain a suffix, for example, libicuucmyapp, where myapp is the suffix.

    <version>: A valid ICU version, for example, 67.1. This version is used to load the binaries and to get the exported symbols.
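To check which mode an app actually ended up in after applying one of the settings above, a sketch like the following may help. `IsIcuMode` is a hypothetical name; its body follows the heuristic shown in the Microsoft article on globalization and ICU and is not an official API, so treat the result as a diagnostic hint. The ordinal search gives 6 in either mode; with NLS enabled on Windows you would expect the culture-sensitive search to behave like the .NET Core 3.1 example in the question again.

```csharp
using System;
using System.Globalization;

// Heuristic adapted from Microsoft's "Globalization and ICU" article: it compares
// fields of the invariant culture's SortVersion the same way that article does.
static bool IsIcuMode()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    return version != 0 && version == sortVersion.FullVersion;
}

string s = "Hello\r\nworld!";
bool icu = IsIcuMode();
int cultureIdx = s.IndexOf("\n", StringComparison.CurrentCulture);
int ordinalIdx = s.IndexOf("\n", StringComparison.Ordinal);

// After opting into NLS on Windows, you would expect icu to be false here.
Console.WriteLine($"ICU: {icu}, culture IndexOf: {cultureIdx}, ordinal IndexOf: {ordinalIdx}");
```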

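For more and up-to-date information, please refer to the MSDN article on globalization and ICU: https://learn.microsoft.com/en-us/dotnet/standard/globalization-localization/globalization-icu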

However, I recommend reading György Kőszeg's answer as well, since you only have to worry about these details if you use inexact (culture-sensitive) string operations in the first place.

Ray