14

[This is the rewrite of a similar question I asked backwards... Sorry for the confusion!]

I'm confused about leading <space>s and the standard sort utility. Consider the contents of myfile:

a
 b
  a

Executing sort -t : myfile yields an unexpected result, at least to me:

a
  a
 b

Does that make sense? <space> should come either before a-z (as is the case in ASCII), or after. In the first case I would expect

  a
 b
a

while in the second case

a
 b
  a

Why, then, does sort seem to apply the -b option (ignore leading <space>s) even when it wasn't included? In fact, to be safe I added the -t option in order to have exactly one field in each line. (According to the POSIX standard, "A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator." sort myfile yields the same output, which is also unexpected.)

Thanks in advance!

ezequiel-garzon

2 Answers

15

It depends on the locale. With

LC_COLLATE=en_US.utf8 sort myfile

I get your unexpected result, and with

LC_COLLATE=C sort myfile

I get your expected result. Also see the related question "bash sort unusual order. Problem with spaces?".

(I don't know why sort handles -b and -t like this.)
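For anyone who wants to reproduce the byte-order half of this, here is a minimal sketch. It recreates the three-line file from the question in a temporary file (the variable names `tmpfile` and `c_sorted` are only illustrative) and sorts it with the C locale forced. Only the C case is shown because the en_US.utf8 result depends on which locales happen to be installed on the machine.

```shell
# Recreate the file from the question: "a", " b", "  a".
tmpfile=$(mktemp)
printf 'a\n b\n  a\n' > "$tmpfile"

# Force byte-value collation for this one command only.
# <space> is 0x20 and 'a' is 0x61, so lines with more leading
# spaces should come first.
c_sorted=$(LC_ALL=C sort "$tmpfile")
printf '%s\n' "$c_sorted"

# Clean up the temporary file.
rm -f "$tmpfile"
```

Running the same sort without the LC_ALL=C prefix shows whatever ordering the current environment's locale produces, which makes for an easy side-by-side comparison.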

David Andersson
  • @Ernest: Enough of this! I removed the humility. – David Andersson Aug 24 '11 at 00:07
  • Thank you! As to how sort works under en_US.UTF-8, I can't understand it either... The [collation chart](http://www.collation-charts.org/opensolaris/opensolaris.2008.05.en_US.UTF-8.html) for en_US.UTF-8 does not have a space between the A's and the B's... – ezequiel-garzon Aug 24 '11 at 06:30
  • Not only space. Try ".", "!", "?", "-", "(", ")", "+" and "~". They are also ignored in human (non-C) locales. Still not exactly like an implied -d option. – David Andersson Aug 24 '11 at 12:08
9
$ sort -t : foo
a
    a
  b
$ env LC_ALL=C sort -t: foo
    a
  b
a

From the man page: *** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.
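The warning names LC_ALL, while the other answer sets LC_COLLATE. A small sketch of how the two interact: when LC_ALL is set it takes precedence over LC_COLLATE, so the command below should give byte order even though LC_COLLATE asks for en_US.utf8. The variable name `all_wins` is only illustrative, and the input is the three-line file from the question fed through a pipe.

```shell
# LC_ALL, when set, overrides LC_COLLATE (and LANG) for collation.
# The en_US.utf8 value is deliberately overridden here, so the result
# should be byte order whether or not that locale is installed.
all_wins=$(printf 'a\n b\n  a\n' | LC_COLLATE=en_US.utf8 LC_ALL=C sort)
printf '%s\n' "$all_wins"
```

This is why setting LC_ALL=C is the more robust choice in scripts: it wins regardless of what LC_COLLATE or LANG are set to in the calling environment.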

Rob Parker