Given the documentation for string.StartsWith and this snippet (targeting .NET Core 2.x):

This method compares the value parameter to the substring at the beginning of this string that is the same length as value, and returns a value that indicates whether they are equal. To be equal, value must be an empty string (String.Empty), must be a reference to this same instance, or must match the beginning of this instance. This method performs a comparison using the specified casing and culture.
https://learn.microsoft.com/en-us/dotnet/api/system.string.startswith?view=netcore-2.1

using System;

class Program
{
    static void Main(string[] args)
    {
        var unicodeCtrl = "\u0000";
        var str = "x";
        Console.WriteLine($"Is it empty     => {unicodeCtrl == string.Empty}");
        Console.WriteLine($"Lengths         => {str.Length} {unicodeCtrl.Length}");
        Console.WriteLine($"Are they equal  => {str == unicodeCtrl}");
        Console.WriteLine($"Are they ref eq => {Object.ReferenceEquals(str, unicodeCtrl)}");
        Console.WriteLine($"Contains        => {str.Contains(unicodeCtrl)}");
        Console.WriteLine($"Starts with     => {str.StartsWith(unicodeCtrl)}");
    }
}

It produces the expected result on Windows:

Is it empty     => False
Lengths         => 1 1
Are they equal  => False
Are they ref eq => False
Contains        => False
Starts with     => False

But when run on Linux (via Docker), the result is:

Is it empty     => False
Lengths         => 1 1
Are they equal  => False
Are they ref eq => False
Contains        => False
Starts with     => True

Would you consider this a bug?
Platform-dependent behavior?

Please note that I'm not asking how to make it work (change to str.StartsWith(unicodeCtrl, StringComparison.OrdinalIgnoreCase)), but rather whether you believe this is intended/correct behavior.

Edit: I tried matching my local locale on Linux, but it made no difference. I tried the default C locale (en-US-POSIX) and pl_PL.UTF8.

wmz
  • You might want to report this on https://github.com/dotnet/coreclr – poke Sep 18 '18 at 22:39
  • Ouch, I posted the wrong version of the code, the one with `StringComparison.OrdinalIgnoreCase`; that version works. The one without it does not (snippet corrected). – wmz Sep 18 '18 at 22:45
  • Since it works with `OrdinalIgnoreCase`, your active culture settings would be very interesting to know for both machines you are testing this on. – poke Sep 18 '18 at 22:47
  • @poke My local culture is pl-PL; on Linux with a vanilla Dockerfile it was not reporting anything... I applied https://stackoverflow.com/questions/28405902/how-to-set-the-locale-inside-a-ubuntu-docker-container and it now reports `en-US-POSIX` (and StartsWith still returns true). – wmz Sep 18 '18 at 23:01
  • Both `"abc".StartsWith("\u0000abc\u0000")` and `"abc".EndsWith("\u0000abc\u0000")` return true, as does `"abc".StartsWith("a\u0000bc")`, while `"abc".StartsWith("a\u0000bx")` does not. Definitely odd, but it may be that the POSIX culture strips out null characters outside of the ordinal comparison case, to avoid terminating strings prematurely. Pure speculation, btw. – Jonathon Chase Sep 18 '18 at 23:10
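The results in the comment above line up with a simple model in which the culture-sensitive comparison gives U+0000 no weight at all. The sketch below is only that model, assumed for illustration and not the real collation algorithm: the hypothetical class `NulWeightModel` and its helpers `StartsWithIgnoringNul` / `EndsWithIgnoringNul` strip every NUL from both strings and then compare ordinally. On the inputs from the comment, and on the `"x"`/`"\u0000"` pair from the question, the model reproduces the reported True/False pattern, while a genuine `StringComparison.Ordinal` check returns False.

```csharp
using System;

static class NulWeightModel
{
    // Hypothetical model (an assumption, not ICU's actual algorithm):
    // remove every U+0000 from both strings, then compare ordinally.
    // string.Replace(string, string) itself performs an ordinal search.
    static string StripNul(string s) => s.Replace("\0", string.Empty);

    public static bool StartsWithIgnoringNul(string s, string prefix) =>
        StripNul(s).StartsWith(StripNul(prefix), StringComparison.Ordinal);

    public static bool EndsWithIgnoringNul(string s, string suffix) =>
        StripNul(s).EndsWith(StripNul(suffix), StringComparison.Ordinal);

    static void Main()
    {
        // Inputs taken from the question and the comment above.
        Console.WriteLine(StartsWithIgnoringNul("x", "\u0000"));            // True ("x" starts with "")
        Console.WriteLine(StartsWithIgnoringNul("abc", "\u0000abc\u0000")); // True
        Console.WriteLine(EndsWithIgnoringNul("abc", "\u0000abc\u0000"));   // True
        Console.WriteLine(StartsWithIgnoringNul("abc", "a\u0000bc"));       // True
        Console.WriteLine(StartsWithIgnoringNul("abc", "a\u0000bx"));       // False

        // A real ordinal comparison treats U+0000 as an ordinary character.
        Console.WriteLine("x".StartsWith("\u0000", StringComparison.Ordinal)); // False
    }
}
```

The model explains why the empty-after-stripping prefix `"\u0000"` matches anything, but it says nothing about how the platform's collation decides which characters are ignorable.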

1 Answer

This is a known difference between Windows and Linux/Unix: on Unix platforms, nulls have no "weight". The behavior of .NET here is By Design, to match platform expectations, rather than to provide consistency. If you want the nulls to "count", you'll have to use an ordinal comparison.

See here: https://github.com/dotnet/coreclr/issues/2051#issuecomment-277005422

And here: https://github.com/dotnet/corefx/pull/29935/files#diff-91724393075e1a7718d3521655506742R1399
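A minimal sketch of the ordinal alternative mentioned in the answer: `StringComparison.Ordinal` (or `OrdinalIgnoreCase`, as in the question) compares UTF-16 code units directly, so U+0000 counts like any other character and the result does not depend on the platform's collation data. The class name `OrdinalPrefixDemo` is illustrative. The culture-sensitive call is printed next to it only for comparison; its value is whatever the runtime's collation produces on the machine at hand (the question reports True on Linux and False on Windows).

```csharp
using System;

static class OrdinalPrefixDemo
{
    // Ordinal: compares UTF-16 code units one by one, so "\u0000" is never ignored.
    public static bool StartsWithOrdinal(string s, string prefix) =>
        s.StartsWith(prefix, StringComparison.Ordinal);

    static void Main()
    {
        var str = "x";
        var unicodeCtrl = "\u0000";

        // Same answer on every platform: False.
        Console.WriteLine($"Ordinal         => {StartsWithOrdinal(str, unicodeCtrl)}");

        // Culture-sensitive, which is what StartsWith(string) uses by default;
        // the value depends on the platform's collation data.
        Console.WriteLine($"Current culture => {str.StartsWith(unicodeCtrl, StringComparison.CurrentCulture)}");
    }
}
```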

jazzdelightsme
  • That's something I suspected. What throws me off are two things: 1. it works the same with unicodeCtrl = "\u0002"; 2. it does not seem to be universal: in Python, "xx".startswith(chr(0)) => False – wmz Sep 19 '18 at 06:02
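On the two points in this comment: Python's `str.startswith` compares code points directly, with no culture-aware collation, so its closest C# counterpart is `StringComparison.Ordinal` rather than the culture-sensitive default. The `\u0002` observation suggests that the collation used on Linux gives no weight to other control characters too, not just U+0000; that is an inference from the reported behavior, not something the linked issues state. The sketch below (the class name `ControlCharProbe` is illustrative) probes every C0 control character, U+0000 to U+001F, against `"xx"` with both comparisons. The ordinal result is False for all of them; the culture-sensitive result is machine dependent and is only printed, not predicted.

```csharp
using System;

static class ControlCharProbe
{
    // Ordinal check for a single-character prefix, like Python's str.startswith.
    public static bool OrdinalStartsWith(string s, char c) =>
        s.StartsWith(c.ToString(), StringComparison.Ordinal);

    static void Main()
    {
        const string str = "xx"; // same input as the Python check in the comment

        for (int code = 0x00; code <= 0x1F; code++)
        {
            char c = (char)code;
            bool ordinal = OrdinalStartsWith(str, c); // always False for "xx"
            bool culture = str.StartsWith(c.ToString(), StringComparison.CurrentCulture); // platform dependent
            Console.WriteLine($"U+{code:X4}  ordinal={ordinal}  culture={culture}");
        }
    }
}
```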