I want to sort a text file with the Linux sort command. The file looks like this:

v 1006
v10 1
v 1011

I would expect a result like this:

v 1006
v 1011
v10 1

However, whatever options I pass to sort, the v10 1 line stays in the middle. Why? I would understand v10 1 ending up either at the top or at the bottom (depending on whether the space character sorts lower or higher than 1), but why is it kept in the middle?

Karel Bílek

2 Answers

sort uses the system locale to determine the collation order of characters. My guess is that with your locale, it ignores whitespace.

$ cat foo.txt 
v 1006
v10 1
v 1011
$ LC_ALL=C sort foo.txt
v 1006
v 1011
v10 1
$ LC_ALL=en_US.utf8 sort foo.txt
v 1006
v10 1
v 1011
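
As a hedged aside, LC_ALL is not the only variable involved: LC_COLLATE selects the collation rules on its own, but LC_ALL takes precedence over it when both are set. The sketch below assumes a POSIX-style shell and reuses the three sample lines from the question; the variable name `collate_c` is only for illustration.

```shell
# Show the collation-related variables the current shell would pass to sort.
echo "LANG=${LANG-} LC_COLLATE=${LC_COLLATE-} LC_ALL=${LC_ALL-}"

# LC_ALL overrides LC_COLLATE, so clear it in a subshell before
# switching only the collation category to C.
collate_c=$(printf 'v 1006\nv10 1\nv 1011\n' | (unset LC_ALL; LC_COLLATE=C sort))
printf '%s\n' "$collate_c"
```

Setting only LC_COLLATE leaves the other locale categories, such as message language and character classification, as they were.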
Jo So
Tatu Lahtela
  • Just a small comment: you don't need the `;` in `LC_ALL=en_US.utf8 ; sort foo.txt`, and there is a difference in behavior. `SOME_VAR=foo some_command parameters` runs some_command with its parameters and with the environment variable SOME_VAR set to foo, but only for that command. If you run `echo $SOME_VAR` after returning to the shell, it still has its original value, not `foo`. – Carlos Campderrós May 06 '11 at 09:46
  • Oh. OK, I get it now. But.... what is the POINT in ignoring the whitespaces at all? – Karel Bílek May 06 '11 at 09:54
  • Why does -k seem to respect whitespace in LC_ALL=en_US.utf8? e.g. LC_ALL=en_US.utf8 sort -k1,1 foo.txt gives the same behavior as LC_ALL=C sort foo.txt – Featherlegs Oct 27 '16 at 16:49
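
To illustrate the scoping difference described in the first comment, here is a minimal sketch assuming a POSIX-style shell. SOME_VAR is the hypothetical variable name from that comment, and `inside`, `after`, and `child` are names made up for this example.

```shell
# Start from a clean state: SOME_VAR is not set and not exported.
unset SOME_VAR

# Prefix assignment: SOME_VAR exists only in the environment of this one command.
inside=$(SOME_VAR=foo sh -c 'printf "%s" "$SOME_VAR"')
after=${SOME_VAR-unset}
echo "inside the command: $inside"
echo "back in the shell:  $after"

# Assignment followed by ';' sets a variable in the current shell instead.
# It persists afterwards, and a child process only sees it if it is exported.
SOME_VAR=foo ; child=$(sh -c 'printf "%s" "${SOME_VAR-unset}"')
echo "shell: $SOME_VAR, child: $child"
```

With LC_ALL the `;` form can still appear to work, because LC_ALL is often already exported by the login environment, in which case the new value is passed on to sort and also stays in effect for every later command.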
Your locale influences how the lines are sorted. For example, I get this with my current locale:

% echo -e "v 1006\nv10 1\nv 1011" | sort
v 1006
v10 1
v 1011

But with C locale I get this:

% echo -e "v 1006\nv10 1\nv 1011" | LC_ALL=C sort
v 1006
v 1011
v10 1

I'm not really sure why it behaves that way. Setting LC_ALL=C is pretty much equivalent to turning off all locale-specific processing and going back to byte-level comparison (yes, I'm skipping the details).
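
To make the byte-level view concrete, the following sketch dumps the sample lines as hex with `od`. It assumes ASCII/UTF-8 input and a POSIX-style shell; the variable name `second_bytes` is only for illustration.

```shell
# Dump each sample line as hex bytes; in the C locale, sort compares these values directly.
for line in 'v 1006' 'v10 1' 'v 1011'; do
  printf '%-8s' "$line"
  printf '%s' "$line" | od -An -tx1
done

# The second byte decides the order: space is 0x20, the digit 1 is 0x31.
second_bytes=$(printf ' 1' | od -An -tx1 | tr -d ' \n')
echo "$second_bytes"
```

Because 0x20 is lower than 0x31, both lines starting with "v " come before "v10 1" when compared byte by byte.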

Why other locale settings skip the space is harder to explain, though. If anyone can explain it, that would be good :)

viraptor