
I have 2 text files:

$ cat /tmp/test1
"AAP" bar
"AEM" bar
"AA" bar
"AEO" bar
"A" bar

$ cat /tmp/test2
"AEM" foo
"AAP" foo
"A" foo
"AEO" foo
"AA" foo

I want to sort them:

$ sort /tmp/test1
"AA" bar
"AAP" bar
"A" bar              <-- "A" is in position 3
"AEM" bar
"AEO" bar

$ sort /tmp/test2
"AA" foo
"AAP" foo
"AEM" foo
"AEO" foo
"A" foo              <-- "A" is in position 5

Why does "A" end up in position 3 in /tmp/test1 and in position 5 in /tmp/test2?

My expectation is that each character per column will be compared.

As such, when comparing column 3, 'A', 'E' and '"' will be compared against each other, and this would be the ultimate determinant in the final sort order of this test data.

Clearly my expectation is wrong, so how does sort work, if not in the way I expected?

Is there a command-line option to sort, or some other utility, that I can use to get the sort order I want?

Steve Lorimer

1 Answer


By default, sort compares whole lines, and it does so in a locale-specific manner; some locales ignore certain characters (the quote and space in your case). To see what is going on, try the --debug option, i.e. compare and contrast:

sort --debug /tmp/test[12]
LC_ALL=C sort --debug /tmp/test[12]
sort --debug -k1,1 /tmp/test[12]
LC_ALL=C sort --debug -k1,1 /tmp/test[12]

BTW, you can add the -s option to disable the last-resort comparison, which simplifies the --debug output.
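
As a minimal sketch of the byte-wise ordering the question expects, the snippet below recreates the contents of /tmp/test1 in a temporary file and sorts it with LC_ALL=C. The temporary file and the variable names (tmp, sorted) are illustrative assumptions, not part of the original commands.

```shell
# Recreate the question's first file in a temporary location (hypothetical path).
tmp=$(mktemp)
printf '%s\n' '"AAP" bar' '"AEM" bar' '"AA" bar' '"AEO" bar' '"A" bar' > "$tmp"

# LC_ALL=C makes sort compare raw bytes, so the closing quote (0x22)
# sorts before 'A' (0x41) and 'E' (0x45) in the third column.
sorted=$(LC_ALL=C sort "$tmp")
printf '%s\n' "$sorted"

# Clean up the temporary file.
rm -f "$tmp"
```

With byte-wise comparison, "A" sorts first in both files, followed by "AA", "AAP", "AEM" and "AEO", because no characters are ignored.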

pixelbeat