Consider this code fragment:

var sorted = new[] { "-1.0", "0.0", "1.0", "1.1", "2.0" }
    .OrderBy (s => s)
    .ToArray ();
Console.WriteLine (string.Join (", ", sorted));

On my system this prints

0.0, 1.0, -1.0, 1.1, 2.0

Given that the ASCII code of - is lower than the ASCII codes of the digits, I expected -1.0, 0.0, 1.0, 1.1, 2.0.

What really confuses me is why -1.0 ends up between 1.0 and 1.1. Those two start with the same char, so anything between them should start with a 1, too.

I vaguely suspect that a culture or locale setting affects this, but mine (a mixture of some German and a lot of English) should probably be no different from English or Invariant in the above case.

mafu
  • You can find the answer yourself by calling the comparer method directly (i.e. the default comparer for `string`). Have you bothered to investigate this in _any_ way? What have you done so far? Your question seems fairly lazy to me. Suspicions are no match for actually digging in and looking at what's going on. – Peter Duniho Mar 26 '17 at 19:27
  • Maybe [the documentation for `OrderBy`](https://msdn.microsoft.com/en-us/library/bb534966(v=vs.110).aspx) can help? – Uwe Keim Mar 26 '17 at 19:28
  • All of this is explained in the documentation. Look at what method gets called to do the ordering (`CompareTo`) and look at the remarks stated. Surely that should be the first course of action rather than asking on SO. – Jeroen Vannevel Mar 26 '17 at 19:30
  • @PeterDuniho if everyone was going to dig through the documentation or the code that causes unusual behaviour, this site would have a lot less traffic. – MrPaulch Mar 26 '17 at 20:22
  • @MrPaulch: _"this site would have a lot less traffic"_ -- Maybe, but so what? Traffic for traffic's sake is pointless. And frankly, Stack Overflow has way too much low-quality traffic, which makes it that much harder to search for and find answers, as well as to find the questions worth answering. – Peter Duniho Mar 27 '17 at 00:15
  • This oughta be a meta discussion. But consider the [most highly voted question](http://stackoverflow.com/questions/7074/what-is-the-difference-between-string-and-string-in-c) under the c# tag. A simple google search would lead to the documentation and the answer to the question. – MrPaulch Mar 27 '17 at 06:25

1 Answer

As per the MSDN documentation for String.Compare:

Notes to Callers: Character sets include ignorable characters. The Compare(String, Int32, String, Int32, Int32, CultureInfo, CompareOptions) method does not consider these characters when it performs a linguistic or culture-sensitive comparison. To recognize ignorable characters in your comparison, supply a value of CompareOptions.Ordinal or CompareOptions.OrdinalIgnoreCase for the options parameter.

If you pass StringComparer.Ordinal to OrderBy, it will work as expected.

        var sorted = new[] { "-1.0", "0.0", "1.0", "1.1", "2.0" }
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToArray();
        Console.WriteLine(string.Join(", ", sorted));

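So in the default culture-sensitive comparison the - carries almost no weight: "-1.0" and "1.0" are treated as nearly the same string, which is why "-1.0" lands right next to "1.0" instead of at the front.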
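One way to check this yourself is to call the comparers directly, as suggested in the comments on the question. The following is only a minimal sketch: the class and method names (HyphenCompareDemo, CompareCulture, CompareOrdinal) are made up for illustration, and the culture-sensitive results depend on the runtime and culture data in use, so no output is claimed for them.

```csharp
using System;

public static class HyphenCompareDemo
{
    // Culture-sensitive (word sort) comparison, normalized to -1, 0 or 1.
    // The result may differ between runtimes and cultures.
    public static int CompareCulture(string a, string b) =>
        Math.Sign(string.Compare(a, b, StringComparison.CurrentCulture));

    // Ordinal comparison: compares the UTF-16 code units one by one.
    public static int CompareOrdinal(string a, string b) =>
        Math.Sign(string.CompareOrdinal(a, b));

    public static void Main()
    {
        // Culture-sensitive: printed without an expected value on purpose.
        Console.WriteLine(CompareCulture("-1.0", "1.0"));
        Console.WriteLine(CompareCulture("-1.0", "1.1"));

        // Ordinal: '-' (U+002D) precedes every digit (U+0030 to U+0039).
        Console.WriteLine(CompareOrdinal("-1.0", "0.0")); // -1
        Console.WriteLine(CompareOrdinal("-1.0", "1.0")); // -1
    }
}
```

On the asker's system, the sorted output in the question implies that the first culture-sensitive call is positive and the second negative, i.e. "-1.0" sorts after "1.0" but before "1.1".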

Also, as per the Remarks section of the MSDN CompareOptions Enumeration page:

The .NET Framework uses three distinct ways of sorting: word sort, string sort, and ordinal sort. Word sort performs a culture-sensitive comparison of strings. Certain nonalphanumeric characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. String sort is similar to word sort, except that there are no special cases. Therefore, all nonalphanumeric symbols come before all alphanumeric characters. Ordinal sort compares strings based on the Unicode values of each element of the string.
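To see the three sort kinds side by side, you could sort the same array with different CompareOptions values. This is a hedged sketch, not the framework's own sample: SortKindsDemo and SortWith are hypothetical names, the invariant culture is an arbitrary choice, and only the ordinal result is stated because the word-sort and string-sort results depend on the runtime's culture data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SortKindsDemo
{
    // Sorts a copy of the input using the invariant culture's CompareInfo
    // with the given options (None = word sort, StringSort, or Ordinal).
    public static string[] SortWith(string[] input, CompareOptions options)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        var comparer = Comparer<string>.Create(
            (a, b) => compareInfo.Compare(a, b, options));
        return input.OrderBy(s => s, comparer).ToArray();
    }

    public static void Main()
    {
        var values = new[] { "-1.0", "0.0", "1.0", "1.1", "2.0" };

        // Word sort (the default): the hyphen may get a very small weight.
        Console.WriteLine(string.Join(", ", SortWith(values, CompareOptions.None)));

        // String sort: no special case for the hyphen.
        Console.WriteLine(string.Join(", ", SortWith(values, CompareOptions.StringSort)));

        // Ordinal sort: prints -1.0, 0.0, 1.0, 1.1, 2.0
        Console.WriteLine(string.Join(", ", SortWith(values, CompareOptions.Ordinal)));
    }
}
```

If you want culture-aware ordering for real text but still need symbols such as - to sort ahead of digits and letters, CompareOptions.StringSort is the option the quoted remarks point to; Ordinal is the simpler choice when the strings are just identifiers or data.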

Michal Ciechan
  • Additional reading where a hyphen explicitly gets called out: https://msdn.microsoft.com/en-us/library/35f0x18w(v=vs.110).aspx – Jeroen Vannevel Mar 26 '17 at 19:34
  • `For example, if the following code is run on the .NET Framework 4 or later, a comparison of "animal" with "ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.` Was this changed in Framework 4? I could not find a page documenting this. – mafu Mar 26 '17 at 19:40
  • Take a look at the following https://msdn.microsoft.com/en-us/library/system.globalization.compareoptions(v=vs.110).aspx – Michal Ciechan Mar 26 '17 at 19:50