1

Given a list that is sorted with OrderBy:

var lst = new []{"1","10","2","b","ab"};
var lst2 = lst.OrderBy(c => c);

Result of lst2:

1, 10, 2, ab, b

Why does String.Compare() not also take the length of the string into account when it compares? I would have thought that the result would be more like this:

1, 2, 10, b, ab

Because 10 (something (1) followed by something (0)) should come after 2 (something (2) followed by nothing).

Could anybody give a good reason for this?

Jens Kloster
  • While I can understand 2 before 10 (natural sort), I can't figure out why b is before ab. – Vladimir Apr 23 '13 at 08:46
  • You can do `var lst2 = lst.OrderBy(c => c.Length).ThenBy(c => c);` but it will not give what you want. – Johan Larsson Apr 23 '13 at 08:47
  • @VladimirFrolov: It's not. Copy/paste error probably. – Jon Apr 23 '13 at 08:48
  • @JensKloster: The behavior you want is called [natural order sorting](http://stackoverflow.com/questions/248603/natural-sort-order-in-c-sharp) and while it may be natural to humans, it is unnatural if you approach sorting mathematically. – Jon Apr 23 '13 at 08:50
  • @Jon There doesn't seem to be anything natural about putting `b` before `ab`... It would make using a dictionary fun. :) – Matthew Watson Apr 23 '13 at 08:57
  • @VladimirFrolov: The OP talks about taking the string length into account when sorting. length("b") < length("ab"), hence b comes before ab. – Daniel Hilgarth Apr 23 '13 at 09:00

3 Answers

10

If the world used your sorting algorithm, what would a phone book look like?

  • Anna
  • Berta
  • Annamarie
  • Beatrix

String comparison works by first comparing the first letter (or better: character); if those are equal, the second character is compared, and so on. It is not based on the length of the word.
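To make the phone book point concrete, here is a small sketch that sorts the same four names both ways. It assumes `StringComparer.Ordinal` as a stand-in for "character by character" (the default comparer is culture-sensitive), and the length-first variant is only one guess at the ordering the question has in mind.

```csharp
using System;
using System.Linq;

var names = new[] { "Anna", "Berta", "Annamarie", "Beatrix" };

// Character-by-character (ordinal) ordering, the way a phone book is laid out.
var phoneBook = names.OrderBy(n => n, StringComparer.Ordinal).ToArray();

// Length first, then character-by-character: shorter names always win.
var lengthFirst = names
    .OrderBy(n => n.Length)
    .ThenBy(n => n, StringComparer.Ordinal)
    .ToArray();

Console.WriteLine(string.Join(", ", phoneBook));   // Anna, Annamarie, Beatrix, Berta
Console.WriteLine(string.Join(", ", lengthFirst)); // Anna, Berta, Beatrix, Annamarie
```

With the length-first ordering, Anna and Annamarie end up at opposite ends of the list, which is exactly what makes such a phone book hard to use.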

Martin Mulder
6

A string is a sequence of characters.
Comparing strings is basically a lexicographic comparison of those sequences, i.e., the first characters of both strings are compared. Only if they are the same are the next characters compared, and so on.

When correctly aligning your list of unordered strings by their first character, this becomes obvious:

"1"
"10"
"2"
"b"
"ab"

After ordering, the result will be:

"1"
"10"
"2"
"ab"
"b"

Reasons:

  • "2" will come after "1", because '2' > '1'.
  • "2" will come after "10", because, again, '2' > '1'. The '0' in "10" is not taken into account, because comparing the first characters already gives an unambiguous result.
  • "ab" will come after "2", because 'a' > '2'.
  • "b" will come after "ab", because 'b' > 'a'. The 'b' in "ab" is not taken into account, because comparing the first characters already gives an unambiguous result.
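The reasoning above can be sketched as a hand-written comparison. `CompareCharByChar` is a hypothetical helper, not what the framework does internally: it compares UTF-16 code units ordinally, whereas the default comparer used by `OrderBy(c => c)` is culture-sensitive. For this particular list both should give the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares two strings one character at a time.
int CompareCharByChar(string x, string y)
{
    int shared = Math.Min(x.Length, y.Length);
    for (int i = 0; i < shared; i++)
    {
        if (x[i] != y[i])
        {
            // The first differing character decides; the rest is never inspected.
            return x[i].CompareTo(y[i]);
        }
    }
    // Length only breaks the tie when one string is a prefix of the other.
    return x.Length.CompareTo(y.Length);
}

var lst = new[] { "1", "10", "2", "b", "ab" };
var sorted = lst.OrderBy(s => s, Comparer<string>.Create(CompareCharByChar)).ToArray();

Console.WriteLine(string.Join(", ", sorted)); // 1, 10, 2, ab, b
```

Note that length does play a role, but only as a tie-breaker: "1" sorts before "10" because it is a prefix of it.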

If you want the numbers in the strings ordered the way you expect, you may want to look into "Natural Sort".
The ordering you expect for the other strings ("b" before "ab") does not follow any standard scheme, so you would probably have to build that yourself.
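A minimal sketch of what a natural sort comparison could look like, assuming ASCII digits only and ordinal comparison for the non-numeric parts. `Chunks` and `NaturalCompare` are hypothetical names, and this is not a framework API; real implementations handle more cases (culture, signs, decimals).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Text.RegularExpressions;

// Hypothetical helper: splits a string into runs of digits and runs of non-digits.
string[] Chunks(string s) =>
    Regex.Matches(s, "[0-9]+|[^0-9]+").Cast<Match>().Select(m => m.Value).ToArray();

// Hypothetical helper: digit runs compare by numeric value, everything else ordinally.
int NaturalCompare(string x, string y)
{
    string[] a = Chunks(x), b = Chunks(y);
    for (int i = 0; i < Math.Min(a.Length, b.Length); i++)
    {
        bool aNum = a[i][0] >= '0' && a[i][0] <= '9';
        bool bNum = b[i][0] >= '0' && b[i][0] <= '9';
        int c = aNum && bNum
            ? BigInteger.Parse(a[i]).CompareTo(BigInteger.Parse(b[i]))
            : string.CompareOrdinal(a[i], b[i]);
        if (c != 0) return c;
    }
    // All shared chunks are equal: the string with fewer chunks sorts first.
    return a.Length.CompareTo(b.Length);
}

var lst = new[] { "1", "10", "2", "b", "ab" };
var natural = lst.OrderBy(s => s, Comparer<string>.Create(NaturalCompare)).ToArray();

Console.WriteLine(string.Join(", ", natural)); // 1, 2, 10, ab, b
```

This puts 2 before 10, but "ab" still sorts before "b"; getting "b" first would need a custom rule on top of this.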

Daniel Hilgarth
0

Why would 10 be after 2?

If you sort the words x, y, xy in alphabetical order, the result is x, xy, y.

For numbers it's exactly the same, just with an extended set of characters.

ken2k